2024-08-17 12:55:56,821 INFO [train_multi_KD3.py:1187] (0/4) Training started
2024-08-17 12:55:56,830 INFO [train_multi_KD3.py:1197] (0/4) Device: cuda:0
2024-08-17 12:55:56,835 INFO [train_multi_KD3.py:1212] (0/4) Using dtype=torch.bfloat16
2024-08-17 12:55:56,835 INFO [train_multi_KD3.py:1214] (0/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'e400fa3b456faf8afe0ee5bfe572946b4921a3db', 'k2-git-date': 'Sat Jul 15 04:21:50 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.9', 'icefall-git-branch': 'multi_KD_with_wenet', 'icefall-git-sha1': '0d2af1df-clean', 'icefall-git-date': 'Wed Aug 14 17:27:16 2024', 'icefall-path': '/xy/mnt/yangxiaoyu/workspace/icefall_multi_KD', 'k2-path': '/root/anaconda3/lib/python3.9/site-packages/k2/__init__.py', 'lhotse-path': '/root/anaconda3/lib/python3.9/site-packages/lhotse/__init__.py', 'hostname': 'NGK_xiaoyu'}, 'world_size': 4, 'master_port': 13440, 'tensorboard': True, 'num_epochs': 35, 'start_epoch': 1, 'start_batch': 332000, 'exp_dir': PosixPath('multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'stop_early': True, 'use_fp16': False, 'use_bf16': True, 'share_asr': True, 'beats_loss_scale': 1.0, 'ecapa_loss_scale': 10.0, 'whisper_loss_scale': 1.0, 'whisper_cb_loss_scale': 0.01, 'repeat_librispeech': 5, 'repeat_wenetspeech': 0, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'speaker_input_idx': 2, 'whisper_dim': 1280, 'use_task_id': True, 'num_codebooks': 32, 'mvq_kd_layer_idx': -1, 'use_subsampled_output': True, 'delta_t': 6, 'full_libri': True, 'mini_libri': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_librispeech': True, 'use_wenetspeech': False, 'use_audioset': True, 'audioset_subset': 'unbalanced', 'use_voxceleb': True, 'voxceleb_subset': 'vox2', 'use_fma': False, 'fma_subset': 'large', 'manifest_dir': PosixPath('data/fbank_LSVoxAs_with_whisper_large-v3_with_taskID'), 'max_duration': 1500, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 1, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'enable_musan': False, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'large-v3', 'use_mert': False, 'blank_id': 0, 'vocab_size': 500, 'dtype': torch.bfloat16, 'use_amp': True}
2024-08-17 12:55:56,835 INFO [train_multi_KD3.py:1216] (0/4) About to create model
2024-08-17 12:55:57,198 INFO [model_shift.py:142] (0/4) Delta_t: 6 when computing the distillation loss
2024-08-17 12:55:57,203 INFO [train_multi_KD3.py:1220] (0/4) Number of model parameters: 66484678
2024-08-17 12:55:57,713 INFO [checkpoint.py:112] (0/4) Loading checkpoint from multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-332000.pt
2024-08-17 12:55:58,475 INFO [checkpoint.py:131] (0/4) Loading averaged model
2024-08-17 12:55:59,856 INFO [train_multi_KD3.py:1235] (0/4) Using DDP
2024-08-17 12:56:01,426 INFO [train_multi_KD3.py:1247] (0/4) Loading optimizer state dict
2024-08-17 12:56:01,729 INFO [train_multi_KD3.py:1255] (0/4) Loading scheduler state dict
2024-08-17 12:56:01,730 INFO [kd_datamodule.py:690] (0/4) About to get train 960 cuts
2024-08-17 12:56:01,782 INFO [train_multi_KD3.py:1306] (0/4) Getting audioset cuts
2024-08-17 12:56:01,782 INFO [kd_datamodule.py:900] (0/4) About to get the audioset cuts for KD.
2024-08-17 12:56:01,804 INFO [kd_datamodule.py:869] (0/4) About to get the voxceleb cuts.
2024-08-17 12:56:01,808 INFO [kd_datamodule.py:880] (0/4) Adding voxceleb2 cuts.
2024-08-17 12:56:01,815 INFO [train_multi_KD3.py:1320] (0/4) Using mux to combine Librispeech: True, WenetSpeech: False, audioset: True and voxceleb: True
2024-08-17 12:56:09,801 INFO [train_multi_KD3.py:1322] (0/4) Using mux to combine [CutSet(len=1406195) [underlying data type: ], CutSet(len=1904746) [underlying data type: ], CutSet(len=1187704) [underlying data type: ]]
2024-08-17 12:56:09,801 INFO [train_multi_KD3.py:1323] (0/4) Using weights: [1406195, 1904746, 1187704]
2024-08-17 12:56:09,801 INFO [train_multi_KD3.py:1332] (0/4) CutSet(len=4498645) [underlying data type: ]
2024-08-17 12:56:09,802 INFO [kd_datamodule.py:449] (0/4) Disable MUSAN
2024-08-17 12:56:09,802 INFO [kd_datamodule.py:489] (0/4) Disable SpecAugment
2024-08-17 12:56:09,802 INFO [kd_datamodule.py:491] (0/4) About to create train dataset
2024-08-17 12:56:09,807 INFO [kd_datamodule.py:528] (0/4) Using SimpleCutSampler
2024-08-17 12:56:09,808 INFO [kd_datamodule.py:536] (0/4) About to create train dataloader
2024-08-17 12:56:09,808 INFO [kd_datamodule.py:539] (0/4) Loading sampler state dict
2024-08-17 12:57:14,896 INFO [kd_datamodule.py:763] (0/4) About to get dev-clean cuts
2024-08-17 12:57:14,897 INFO [kd_datamodule.py:781] (0/4) About to get dev-other cuts
2024-08-17 12:57:14,898 INFO [kd_datamodule.py:570] (0/4) About to create dev dataset
2024-08-17 12:57:15,140 INFO [kd_datamodule.py:591] (0/4) About to create dev dataloader
2024-08-17 12:57:15,140 INFO [kd_datamodule.py:840] (0/4) About to get the test set of voxceleb1 set.
2024-08-17 12:57:15,141 INFO [kd_datamodule.py:570] (0/4) About to create dev dataset
2024-08-17 12:57:15,339 INFO [kd_datamodule.py:591] (0/4) About to create dev dataloader
2024-08-17 12:57:15,339 INFO [kd_datamodule.py:912] (0/4) About to get the audioset eval cuts.
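The "Using mux" entries above combine the three training CutSets (LibriSpeech, AudioSet, VoxCeleb) into one stream, sampling each source in proportion to its length. In lhotse this is `CutSet.mux(..., weights=...)`; the toy generator below is only a sketch of that sampling idea (the function name `mux` and the example streams are illustrative, not icefall code):

```python
import random

def mux(streams, weights, seed=42):
    """Randomly interleave several iterables, choosing the next source
    in proportion to its weight (a toy stand-in for lhotse's CutSet.mux)."""
    rng = random.Random(seed)
    iters = [iter(s) for s in streams]
    weights = list(weights)
    while iters:
        i = rng.choices(range(len(iters)), weights=weights)[0]
        try:
            yield next(iters[i])
        except StopIteration:
            # Source exhausted: drop it and keep sampling the rest.
            del iters[i], weights[i]

# Weighting each stream by its own length (as the log does with
# [1406195, 1904746, 1187704]) draws cuts roughly uniformly over
# the combined pool.
combined = list(mux([["ls"] * 3, ["as"] * 2, ["vox"] * 1], [3, 2, 1]))
assert sorted(combined) == ["as", "as", "ls", "ls", "ls", "vox"]
```

Weighting by length explains why the resulting `CutSet(len=4498645)` behaves like a uniform shuffle of all three corpora.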
2024-08-17 12:57:15,342 INFO [kd_datamodule.py:570] (0/4) About to create dev dataset
2024-08-17 12:57:15,775 INFO [kd_datamodule.py:591] (0/4) About to create dev dataloader
2024-08-17 12:57:15,775 INFO [train_multi_KD3.py:1412] (0/4) ['ASR_libri', 'SV_voxceleb1', 'AT_audioset']
2024-08-17 12:57:15,775 INFO [train_multi_KD3.py:1416] (0/4) Loading grad scaler state dict
2024-08-17 12:57:30,439 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 0, loss[loss=0.08765, beats_loss=0.0132, ecapa_loss=0.0001541, whisper_loss=0.07291, over 22282.00 frames. ], tot_loss[loss=0.08765, beats_loss=0.0132, ecapa_loss=0.0001541, whisper_loss=0.07291, over 22282.00 frames. ], batch size: 93, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 12:57:30,440 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss
2024-08-17 12:58:09,343 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on ASR_libri: loss=0.2516, beats_loss=0, ecapa_loss=0.0005218, whisper_loss=0.2464, over 922467.00 frames.
2024-08-17 12:58:23,208 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on SV_voxceleb1: loss=0.004106, beats_loss=0, ecapa_loss=0.0004106, whisper_loss=0, over 939242.00 frames.
2024-08-17 13:00:20,830 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on AT_audioset: loss=0.02324, beats_loss=0.02324, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-17 13:00:20,832 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB
2024-08-17 13:00:21,662 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.69 vs. limit=15.0
2024-08-17 13:00:46,068 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 11 from Vox, 25 from AS
2024-08-17 13:00:53,555 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 21 from LS+wenet, 13 from Vox, 20 from AS
2024-08-17 13:00:57,595 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 10 from Vox, 28 from AS
2024-08-17 13:01:51,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3320500.0, ans=0.0
2024-08-17 13:01:51,825 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 50, loss[loss=0.09277, beats_loss=0.01292, ecapa_loss=0.0001217, whisper_loss=0.07863, over 14990.00 frames. ], tot_loss[loss=0.09837, beats_loss=0.01073, ecapa_loss=0.0001418, whisper_loss=0.08621, over 874296.46 frames. ], batch size: 59, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:01:55,788 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 14 from LS+wenet, 27 from Vox, 32 from AS
2024-08-17 13:01:59,158 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.57 vs. limit=15.0
2024-08-17 13:02:06,516 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 25 from Vox, 33 from AS
2024-08-17 13:02:08,246 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 18 from LS+wenet, 20 from Vox, 37 from AS
2024-08-17 13:02:43,422 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=15.0
2024-08-17 13:02:47,208 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 from AS
2024-08-17 13:02:48,181 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.300e+01 2.566e+01 2.925e+01 4.524e+01, threshold=5.133e+01, percent-clipped=0.0
2024-08-17 13:02:58,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3320900.0, ans=0.125
2024-08-17 13:02:59,444 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 23 from Vox, 31 from AS
2024-08-17 13:03:08,938 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 100, loss[loss=0.04718, beats_loss=0.01215, ecapa_loss=0.0001155, whisper_loss=0.03388, over 14179.00 frames. ], tot_loss[loss=0.09828, beats_loss=0.01077, ecapa_loss=0.000144, whisper_loss=0.08607, over 1518711.93 frames. ], batch size: 56, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:03:17,450 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 20 from Vox, 32 from AS
2024-08-17 13:03:34,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3321100.0, ans=0.0
2024-08-17 13:03:34,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3321100.0, ans=0.025
2024-08-17 13:03:42,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3321200.0, ans=0.125
2024-08-17 13:03:51,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3321200.0, ans=0.125
2024-08-17 13:04:00,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3321300.0, ans=0.07
2024-08-17 13:04:24,698 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 150, loss[loss=0.07653, beats_loss=0.01008, ecapa_loss=0.0001613, whisper_loss=0.06483, over 18511.00 frames. ], tot_loss[loss=0.09969, beats_loss=0.01066, ecapa_loss=0.0001452, whisper_loss=0.08758, over 1991909.51 frames. ], batch size: 75, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:04:26,403 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 18 from Vox, 40 from AS
2024-08-17 13:04:50,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3321600.0, ans=0.125
2024-08-17 13:04:52,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3321600.0, ans=0.1
2024-08-17 13:05:17,985 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.722e+01 2.279e+01 2.554e+01 2.913e+01 4.090e+01, threshold=5.109e+01, percent-clipped=0.0
2024-08-17 13:05:21,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=3321800.0, ans=0.02
2024-08-17 13:05:24,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3321900.0, ans=0.125
2024-08-17 13:05:28,685 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.45 vs. limit=22.5
2024-08-17 13:05:38,140 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 200, loss[loss=0.09617, beats_loss=0.01205, ecapa_loss=0.00013, whisper_loss=0.08283, over 22201.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01065, ecapa_loss=0.0001453, whisper_loss=0.08851, over 2410091.64 frames. ], batch size: 89, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:06:01,091 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 25 from LS+wenet, 16 from Vox, 23 from AS
2024-08-17 13:06:11,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3322200.0, ans=0.125
2024-08-17 13:06:20,623 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.87 vs. limit=22.5
2024-08-17 13:06:24,022 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.78 vs. limit=5.0
2024-08-17 13:06:36,617 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.14 vs. limit=15.0
2024-08-17 13:06:44,613 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 17 from Vox, 18 from AS
2024-08-17 13:06:50,755 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 250, loss[loss=0.1165, beats_loss=0.009329, ecapa_loss=0.0001531, whisper_loss=0.1056, over 23231.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01058, ecapa_loss=0.0001455, whisper_loss=0.08977, over 2743632.56 frames. ], batch size: 88, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:06:56,369 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 24 from Vox, 28 from AS
2024-08-17 13:07:02,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3322500.0, ans=0.0
2024-08-17 13:07:15,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3322600.0, ans=0.04949747468305833
2024-08-17 13:07:19,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3322700.0, ans=0.0
2024-08-17 13:07:28,760 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 10 from Vox, 25 from AS
2024-08-17 13:07:31,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3322800.0, ans=0.125
2024-08-17 13:07:40,461 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.294e+01 2.563e+01 2.958e+01 9.042e+01, threshold=5.127e+01, percent-clipped=1.0
2024-08-17 13:07:52,617 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 18 from Vox, 34 from AS
2024-08-17 13:07:58,767 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 300, loss[loss=0.08886, beats_loss=0.01172, ecapa_loss=0.0001448, whisper_loss=0.07569, over 22168.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0106, ecapa_loss=0.0001461, whisper_loss=0.08966, over 2964383.05 frames. ], batch size: 93, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:08:04,440 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 17 from LS+wenet, 25 from Vox, 26 from AS
2024-08-17 13:08:06,068 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-17 13:08:29,591 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 23 from LS+wenet, 14 from Vox, 17 from AS
2024-08-17 13:08:57,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3323400.0, ans=0.1
2024-08-17 13:08:59,937 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0
2024-08-17 13:09:07,457 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 350, loss[loss=0.08869, beats_loss=0.009193, ecapa_loss=0.0001399, whisper_loss=0.0781, over 14262.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01057, ecapa_loss=0.0001453, whisper_loss=0.09007, over 3150993.83 frames. ], batch size: 54, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:09:10,648 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 37 from LS+wenet, 17 from Vox, 40 from AS
2024-08-17 13:09:12,070 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 28 from Vox, 35 from AS
2024-08-17 13:09:12,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3323500.0, ans=0.125
2024-08-17 13:09:17,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3323500.0, ans=0.125
2024-08-17 13:09:18,303 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 19 from LS+wenet, 29 from Vox, 29 from AS
2024-08-17 13:09:41,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3323700.0, ans=0.125
2024-08-17 13:09:45,357 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 18 from LS+wenet, 28 from Vox, 38 from AS
2024-08-17 13:09:45,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3323700.0, ans=0.1
2024-08-17 13:09:46,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3323800.0, ans=0.125
2024-08-17 13:09:47,091 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.96 vs. limit=15.0
2024-08-17 13:09:56,212 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.195e+01 2.374e+01 2.744e+01 6.242e+01, threshold=4.747e+01, percent-clipped=1.0
2024-08-17 13:09:58,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3323800.0, ans=0.125
2024-08-17 13:10:01,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3323900.0, ans=0.2
2024-08-17 13:10:15,325 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 400, loss[loss=0.116, beats_loss=0.01084, ecapa_loss=0.0001475, whisper_loss=0.1037, over 23764.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01066, ecapa_loss=0.0001459, whisper_loss=0.08996, over 3315334.67 frames. ], batch size: 92, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:10:44,754 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 9 from LS+wenet, 19 from Vox, 39 from AS
2024-08-17 13:11:02,010 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.913e-02
2024-08-17 13:11:10,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3324400.0, ans=0.125
2024-08-17 13:11:22,438 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 450, loss[loss=0.103, beats_loss=0.01257, ecapa_loss=0.000127, whisper_loss=0.0892, over 17300.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01067, ecapa_loss=0.0001469, whisper_loss=0.08967, over 3447142.33 frames. ], batch size: 69, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:11:39,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3324600.0, ans=0.1
2024-08-17 13:11:43,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3324600.0, ans=0.0
2024-08-17 13:11:45,582 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 25 from Vox, 34 from AS
2024-08-17 13:11:46,761 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 from AS
2024-08-17 13:11:59,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3324700.0, ans=0.1
2024-08-17 13:12:10,455 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.395e+01 2.662e+01 2.983e+01 2.736e+02, threshold=5.325e+01, percent-clipped=1.0
2024-08-17 13:12:16,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3324900.0, ans=0.025
2024-08-17 13:12:30,306 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 500, loss[loss=0.1021, beats_loss=0.01012, ecapa_loss=0.0001206, whisper_loss=0.09078, over 16700.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01062, ecapa_loss=0.0001466, whisper_loss=0.08913, over 3520640.66 frames. ], batch size: 65, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:12:34,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3325000.0, ans=0.125
2024-08-17 13:12:47,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3325100.0, ans=0.125
2024-08-17 13:13:04,780 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 from AS
2024-08-17 13:13:06,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3325200.0, ans=0.0
2024-08-17 13:13:07,780 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.22 vs. limit=12.0
2024-08-17 13:13:38,885 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 550, loss[loss=0.0895, beats_loss=0.01246, ecapa_loss=0.0001214, whisper_loss=0.07583, over 18478.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01058, ecapa_loss=0.0001466, whisper_loss=0.09022, over 3630652.29 frames. ], batch size: 73, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:13:48,403 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 from AS
2024-08-17 13:14:03,541 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.01 vs. limit=15.0
2024-08-17 13:14:11,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3325700.0, ans=0.0
2024-08-17 13:14:12,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=3325700.0, ans=0.02
2024-08-17 13:14:13,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3325700.0, ans=0.1
2024-08-17 13:14:25,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3325800.0, ans=0.0
2024-08-17 13:14:33,507 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.40 vs. limit=15.0
2024-08-17 13:14:34,072 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.469e+01 2.694e+01 3.140e+01 4.946e+01, threshold=5.388e+01, percent-clipped=0.0
2024-08-17 13:14:41,139 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 from AS
2024-08-17 13:14:41,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3325900.0, ans=0.125
2024-08-17 13:14:51,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3325900.0, ans=0.1
2024-08-17 13:14:53,918 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 600, loss[loss=0.09209, beats_loss=0.0107, ecapa_loss=0.0001552, whisper_loss=0.07984, over 22439.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01049, ecapa_loss=0.0001488, whisper_loss=0.09118, over 3699425.43 frames. ], batch size: 92, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:14:54,038 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 26 from Vox, 33 from AS
2024-08-17 13:15:44,820 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 from AS
2024-08-17 13:15:56,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3326400.0, ans=0.0
2024-08-17 13:16:07,373 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 650, loss[loss=0.1181, beats_loss=0.007654, ecapa_loss=0.0001712, whisper_loss=0.1087, over 17282.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01052, ecapa_loss=0.000148, whisper_loss=0.09089, over 3784139.52 frames. ], batch size: 67, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:16:10,784 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 17 from Vox, 28 from AS
2024-08-17 13:16:29,900 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 from AS
2024-08-17 13:16:47,841 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 from AS
2024-08-17 13:16:59,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3326800.0, ans=0.04949747468305833
2024-08-17 13:17:00,526 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.302e+01 2.585e+01 2.989e+01 8.771e+01, threshold=5.171e+01, percent-clipped=2.0
2024-08-17 13:17:20,460 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 700, loss[loss=0.106, beats_loss=0.01159, ecapa_loss=0.0001734, whisper_loss=0.09269, over 21165.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01044, ecapa_loss=0.000149, whisper_loss=0.09156, over 3785397.22 frames. ], batch size: 90, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:17:23,943 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0
2024-08-17 13:17:31,970 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.78 vs. limit=22.5
2024-08-17 13:17:36,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.66 vs. limit=15.0
2024-08-17 13:17:48,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3327200.0, ans=0.2
2024-08-17 13:18:07,909 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 18 from Vox, 34 from AS
2024-08-17 13:18:10,027 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.77 vs. limit=15.0
2024-08-17 13:18:31,707 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 750, loss[loss=0.1017, beats_loss=0.01244, ecapa_loss=0.0001071, whisper_loss=0.08817, over 23150.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01046, ecapa_loss=0.0001497, whisper_loss=0.09118, over 3811825.49 frames. ], batch size: 86, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:18:31,897 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 20 from Vox, 34 from AS
2024-08-17 13:18:35,806 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 18 from Vox, 35 from AS
2024-08-17 13:18:50,798 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 25 from Vox, 30 from AS
2024-08-17 13:19:00,739 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 21 from LS+wenet, 22 from Vox, 49 from AS
2024-08-17 13:19:24,261 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.330e+01 2.505e+01 2.763e+01 4.149e+01, threshold=5.011e+01, percent-clipped=0.0
2024-08-17 13:19:24,705 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.025e+01
2024-08-17 13:19:34,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3327900.0, ans=0.1
2024-08-17 13:19:38,388 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.15 vs. limit=10.0
2024-08-17 13:19:40,384 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 from AS
2024-08-17 13:19:44,774 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 800, loss[loss=0.09368, beats_loss=0.01169, ecapa_loss=0.000155, whisper_loss=0.08044, over 17844.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01058, ecapa_loss=0.0001489, whisper_loss=0.09108, over 3847594.65 frames. ], batch size: 70, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:19:53,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3328000.0, ans=0.0
2024-08-17 13:19:55,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3328000.0, ans=0.125
2024-08-17 13:19:56,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3328000.0, ans=0.025
2024-08-17 13:20:05,398 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 16 from Vox, 34 from AS
2024-08-17 13:20:55,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3328500.0, ans=0.0
2024-08-17 13:20:56,733 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 850, loss[loss=0.09565, beats_loss=0.01116, ecapa_loss=0.0001367, whisper_loss=0.08312, over 14262.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01054, ecapa_loss=0.0001482, whisper_loss=0.09125, over 3851717.08 frames. ], batch size: 55, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:21:03,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3328500.0, ans=0.1
2024-08-17 13:21:06,619 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 from AS
2024-08-17 13:21:21,679 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 19 from Vox, 34 from AS
2024-08-17 13:21:23,010 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 18 from Vox, 20 from AS
2024-08-17 13:21:23,352 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-17 13:21:28,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3328700.0, ans=0.0
2024-08-17 13:21:32,472 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.97 vs. limit=15.0
2024-08-17 13:21:33,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3328700.0, ans=10.0
2024-08-17 13:21:37,703 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 from AS
2024-08-17 13:21:49,446 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.262e+01 2.529e+01 2.778e+01 3.755e+01, threshold=5.059e+01, percent-clipped=0.0
2024-08-17 13:22:08,930 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 900, loss[loss=0.1074, beats_loss=0.01017, ecapa_loss=0.0001491, whisper_loss=0.09576, over 22312.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01053, ecapa_loss=0.0001483, whisper_loss=0.09118, over 3856016.40 frames. ], batch size: 91, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:22:13,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3329000.0, ans=0.125
2024-08-17 13:22:43,975 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.59 vs. limit=15.0
2024-08-17 13:22:50,347 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.35 vs. limit=22.5
2024-08-17 13:22:51,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3329300.0, ans=0.09899494936611666
2024-08-17 13:22:51,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3329300.0, ans=0.125
2024-08-17 13:22:58,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3329300.0, ans=0.1
2024-08-17 13:23:00,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3329300.0, ans=0.0
2024-08-17 13:23:01,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3329300.0, ans=0.125
2024-08-17 13:23:05,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3329400.0, ans=0.2
2024-08-17 13:23:14,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3329400.0, ans=0.05
2024-08-17 13:23:17,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3329400.0, ans=0.125
2024-08-17 13:23:20,462 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 950, loss[loss=0.09686, beats_loss=0.0112, ecapa_loss=0.0001552, whisper_loss=0.08411, over 21307.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0105, ecapa_loss=0.0001486, whisper_loss=0.09124, over 3828990.87 frames.
], batch size: 89, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:24:02,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3329800.0, ans=0.125 2024-08-17 13:24:12,387 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.272e+01 2.519e+01 2.767e+01 4.304e+01, threshold=5.037e+01, percent-clipped=0.0 2024-08-17 13:24:19,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3329900.0, ans=0.1 2024-08-17 13:24:19,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3329900.0, ans=0.2 2024-08-17 13:24:24,277 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-17 13:24:24,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3329900.0, ans=0.125 2024-08-17 13:24:32,111 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1000, loss[loss=0.1138, beats_loss=0.01134, ecapa_loss=0.0001251, whisper_loss=0.1012, over 23084.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.0001484, whisper_loss=0.09093, over 3809587.99 frames. ], batch size: 89, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:24:42,027 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 13:24:46,083 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
10 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-17 13:25:14,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3330100.0, ans=0.1 2024-08-17 13:25:26,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3330200.0, ans=0.125 2024-08-17 13:25:30,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3330200.0, ans=0.125 2024-08-17 13:25:39,071 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-17 13:25:40,645 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-17 13:25:42,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3330300.0, ans=0.2 2024-08-17 13:25:44,634 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-17 13:25:47,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3330300.0, ans=0.0 2024-08-17 13:25:48,303 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.15 vs. limit=22.5 2024-08-17 13:26:06,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3330500.0, ans=0.125 2024-08-17 13:26:07,199 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1050, loss[loss=0.09298, beats_loss=0.01258, ecapa_loss=0.0001423, whisper_loss=0.07898, over 20818.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01051, ecapa_loss=0.0001479, whisper_loss=0.09062, over 3836007.86 frames. 
], batch size: 88, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:26:08,528 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-17 13:26:11,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3330500.0, ans=0.125 2024-08-17 13:26:21,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.32 vs. limit=6.0 2024-08-17 13:26:41,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3330700.0, ans=0.125 2024-08-17 13:26:58,823 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.292e+01 2.531e+01 2.835e+01 4.393e+01, threshold=5.062e+01, percent-clipped=0.0 2024-08-17 13:27:11,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3330900.0, ans=0.125 2024-08-17 13:27:18,566 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1100, loss[loss=0.08613, beats_loss=0.008256, ecapa_loss=0.0001552, whisper_loss=0.07632, over 18346.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01043, ecapa_loss=0.0001485, whisper_loss=0.09073, over 3850860.74 frames. ], batch size: 72, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:27:24,489 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.80 vs. limit=15.0 2024-08-17 13:27:44,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3331100.0, ans=0.125 2024-08-17 13:27:45,883 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
25 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-17 13:27:59,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3331200.0, ans=0.125 2024-08-17 13:28:04,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3331300.0, ans=0.2 2024-08-17 13:28:26,966 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.12 vs. limit=15.0 2024-08-17 13:28:32,230 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1150, loss[loss=0.1047, beats_loss=0.00969, ecapa_loss=0.0001434, whisper_loss=0.09358, over 17324.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0106, ecapa_loss=0.0001484, whisper_loss=0.08968, over 3857913.19 frames. ], batch size: 65, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:28:41,222 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 27 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-17 13:29:00,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3331600.0, ans=0.125 2024-08-17 13:29:10,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3331700.0, ans=0.0 2024-08-17 13:29:14,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3331700.0, ans=0.0 2024-08-17 13:29:17,977 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.08 vs. 
limit=15.0 2024-08-17 13:29:20,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3331800.0, ans=0.04949747468305833 2024-08-17 13:29:22,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3331800.0, ans=0.0 2024-08-17 13:29:26,565 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.418e+01 2.645e+01 2.995e+01 9.791e+01, threshold=5.290e+01, percent-clipped=1.0 2024-08-17 13:29:37,599 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-17 13:29:39,003 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 24 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-17 13:29:44,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3331900.0, ans=0.125 2024-08-17 13:29:46,894 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1200, loss[loss=0.09762, beats_loss=0.01114, ecapa_loss=0.0001701, whisper_loss=0.08478, over 21149.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01046, ecapa_loss=0.00015, whisper_loss=0.09099, over 3872923.94 frames. ], batch size: 89, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:29:48,763 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-17 13:30:01,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3332100.0, ans=0.125 2024-08-17 13:30:22,379 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=19.30 vs. 
limit=22.5 2024-08-17 13:30:38,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3332300.0, ans=0.0 2024-08-17 13:30:42,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3332300.0, ans=0.2 2024-08-17 13:30:44,950 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08344534784555435, model_norm_threshold=52.90373992919922 2024-08-17 13:30:45,116 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.conv_module2.depthwise_conv.causal_conv.weight with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.267e+04, grad_sumsq=4.529e+05, orig_rms_sq=1.825e-01 2024-08-17 13:30:45,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3332400.0, ans=0.04949747468305833 2024-08-17 13:30:58,797 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 13:31:01,514 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1250, loss[loss=0.1098, beats_loss=0.008445, ecapa_loss=0.0001531, whisper_loss=0.0998, over 21107.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01061, ecapa_loss=0.0001497, whisper_loss=0.09137, over 3877097.22 frames. ], batch size: 81, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:31:18,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3332600.0, ans=0.125 2024-08-17 13:31:24,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3332600.0, ans=0.0 2024-08-17 13:31:48,279 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
14 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-17 13:31:53,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3332800.0, ans=0.1 2024-08-17 13:31:54,689 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.269e+01 2.597e+01 2.986e+01 6.340e+02, threshold=5.193e+01, percent-clipped=3.0 2024-08-17 13:31:58,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3332800.0, ans=0.0 2024-08-17 13:32:01,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3332900.0, ans=0.0 2024-08-17 13:32:02,104 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-17 13:32:06,713 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 13:32:15,204 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1300, loss[loss=0.1013, beats_loss=0.009653, ecapa_loss=0.0001368, whisper_loss=0.09023, over 18771.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01068, ecapa_loss=0.00015, whisper_loss=0.09033, over 3867990.60 frames. ], batch size: 74, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:32:20,834 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.45 vs. limit=10.0 2024-08-17 13:32:23,907 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-17 13:32:27,081 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-17 13:32:28,525 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
21 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-17 13:32:30,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3333100.0, ans=0.125 2024-08-17 13:32:33,276 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 28 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-17 13:32:43,528 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 21 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-17 13:32:46,774 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-17 13:33:05,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3333300.0, ans=0.125 2024-08-17 13:33:21,027 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2024-08-17 13:33:28,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3333400.0, ans=0.125 2024-08-17 13:33:34,373 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1350, loss[loss=0.1223, beats_loss=0.009541, ecapa_loss=0.0001429, whisper_loss=0.1113, over 22732.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01061, ecapa_loss=0.0001512, whisper_loss=0.09099, over 3897653.92 frames. ], batch size: 88, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:33:34,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3333500.0, ans=0.0 2024-08-17 13:33:36,153 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
15 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-17 13:33:39,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3333500.0, ans=0.125 2024-08-17 13:33:49,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3333600.0, ans=0.0 2024-08-17 13:34:02,336 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-17 13:34:03,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3333700.0, ans=0.125 2024-08-17 13:34:23,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3333800.0, ans=0.125 2024-08-17 13:34:24,725 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.323e+01 2.578e+01 2.886e+01 4.482e+01, threshold=5.156e+01, percent-clipped=0.0 2024-08-17 13:34:31,468 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.42 vs. limit=15.0 2024-08-17 13:34:36,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3333900.0, ans=0.125 2024-08-17 13:34:44,324 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
32 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-17 13:34:45,545 WARNING [optim.py:496] (0/4) Scaling gradients by 0.028498075902462006, model_norm_threshold=51.5612907409668 2024-08-17 13:34:45,710 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.470e+05, grad_sumsq=7.470e+05, orig_rms_sq=1.000e+00 2024-08-17 13:34:45,734 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1400, loss[loss=0.1295, beats_loss=0.01857, ecapa_loss=0.0001751, whisper_loss=0.1091, over 22389.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01069, ecapa_loss=0.0001503, whisper_loss=0.09066, over 3875023.11 frames. ], batch size: 89, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:34:47,788 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.58 vs. limit=22.5 2024-08-17 13:34:52,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3334000.0, ans=0.0 2024-08-17 13:35:17,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3334200.0, ans=0.0 2024-08-17 13:35:17,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3334200.0, ans=0.125 2024-08-17 13:35:21,393 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-17 13:35:28,650 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.518e-03 2024-08-17 13:35:47,417 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.68 vs. 
limit=10.0 2024-08-17 13:35:57,240 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1450, loss[loss=0.09455, beats_loss=0.01164, ecapa_loss=0.0001702, whisper_loss=0.08121, over 13875.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01071, ecapa_loss=0.0001501, whisper_loss=0.09034, over 3897251.25 frames. ], batch size: 57, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:36:08,837 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-17 13:36:22,709 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 18 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-17 13:36:32,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3334700.0, ans=0.1 2024-08-17 13:36:34,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3334700.0, ans=0.1 2024-08-17 13:36:41,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3334800.0, ans=0.05 2024-08-17 13:36:45,720 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 15 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-17 13:36:49,418 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.311e+01 2.535e+01 2.723e+01 1.809e+03, threshold=5.071e+01, percent-clipped=1.0 2024-08-17 13:37:10,440 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1500, loss[loss=0.1045, beats_loss=0.009491, ecapa_loss=0.0001282, whisper_loss=0.09376, over 20093.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01067, ecapa_loss=0.0001489, whisper_loss=0.09054, over 3924408.46 frames. ], batch size: 79, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:37:19,310 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
18 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-17 13:37:21,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3335000.0, ans=0.0 2024-08-17 13:37:27,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3335100.0, ans=0.125 2024-08-17 13:37:34,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3335100.0, ans=0.0 2024-08-17 13:37:47,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3335200.0, ans=0.0 2024-08-17 13:37:49,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3335200.0, ans=0.04949747468305833 2024-08-17 13:38:04,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3335300.0, ans=0.1 2024-08-17 13:38:07,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3335300.0, ans=0.07 2024-08-17 13:38:26,843 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1550, loss[loss=0.1134, beats_loss=0.009732, ecapa_loss=0.0001336, whisper_loss=0.1023, over 14986.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01067, ecapa_loss=0.0001491, whisper_loss=0.09088, over 3941373.49 frames. ], batch size: 56, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:38:29,514 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.291e+00 2024-08-17 13:38:31,182 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
23 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-17 13:39:20,481 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.357e+01 2.589e+01 2.903e+01 1.401e+02, threshold=5.177e+01, percent-clipped=4.0 2024-08-17 13:39:21,940 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-17 13:39:32,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3335900.0, ans=0.0 2024-08-17 13:39:40,418 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1600, loss[loss=0.08873, beats_loss=0.009794, ecapa_loss=0.0001861, whisper_loss=0.07707, over 21584.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01061, ecapa_loss=0.0001511, whisper_loss=0.09034, over 3928241.10 frames. ], batch size: 91, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:39:45,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3336000.0, ans=0.125 2024-08-17 13:39:50,458 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0 2024-08-17 13:39:53,118 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-17 13:40:40,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3336400.0, ans=0.2 2024-08-17 13:40:55,411 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1650, loss[loss=0.09286, beats_loss=0.01148, ecapa_loss=0.0001903, whisper_loss=0.07948, over 18609.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01058, ecapa_loss=0.0001509, whisper_loss=0.09044, over 3902050.98 frames. ], batch size: 79, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:41:00,554 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
29 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-17 13:41:02,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3336500.0, ans=0.05 2024-08-17 13:41:03,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3336500.0, ans=0.125 2024-08-17 13:41:04,685 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 27 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-17 13:41:07,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3336500.0, ans=0.1 2024-08-17 13:41:12,654 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-17 13:41:31,216 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-17 13:41:42,849 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=8.118e-03 2024-08-17 13:41:49,987 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.16 vs. limit=15.0 2024-08-17 13:41:50,389 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.288e+01 2.508e+01 2.798e+01 4.258e+01, threshold=5.017e+01, percent-clipped=0.0 2024-08-17 13:42:11,743 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1700, loss[loss=0.09554, beats_loss=0.01107, ecapa_loss=0.0001511, whisper_loss=0.08296, over 15743.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01065, ecapa_loss=0.0001511, whisper_loss=0.09042, over 3900596.80 frames. 
], batch size: 67, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:42:12,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3337000.0, ans=0.125 2024-08-17 13:42:13,419 WARNING [optim.py:496] (0/4) Scaling gradients by 0.06336042284965515, model_norm_threshold=50.16535949707031 2024-08-17 13:42:13,584 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.458e+04, grad_sumsq=5.458e+04, orig_rms_sq=1.000e+00 2024-08-17 13:42:52,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3337200.0, ans=0.0 2024-08-17 13:43:16,162 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 23 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-17 13:43:26,296 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1750, loss[loss=0.0944, beats_loss=0.01213, ecapa_loss=0.0001604, whisper_loss=0.08066, over 21206.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01071, ecapa_loss=0.0001494, whisper_loss=0.0903, over 3899684.35 frames. ], batch size: 90, lr: 2.67e-03, grad_scale: 1.152921504606847e+18 2024-08-17 13:43:43,549 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.10 vs. limit=22.5 2024-08-17 13:43:46,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3337600.0, ans=0.0 2024-08-17 13:43:51,251 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-17 13:44:03,860 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-17 13:44:16,919 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
26 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-17 13:44:22,182 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.382e+01 2.704e+01 3.004e+01 7.917e+02, threshold=5.409e+01, percent-clipped=2.0 2024-08-17 13:44:26,212 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-17 13:44:32,349 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-17 13:44:34,044 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-17 13:44:34,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3337900.0, ans=0.125 2024-08-17 13:44:42,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3338000.0, ans=0.0 2024-08-17 13:44:43,190 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1800, loss[loss=0.08955, beats_loss=0.01175, ecapa_loss=0.0001719, whisper_loss=0.07608, over 18699.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01066, ecapa_loss=0.0001483, whisper_loss=0.09091, over 3906499.67 frames. ], batch size: 79, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:44:48,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3338000.0, ans=0.05 2024-08-17 13:45:02,434 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 25 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-17 13:45:12,296 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.59 vs. 
limit=15.0 2024-08-17 13:45:17,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3338200.0, ans=0.0 2024-08-17 13:45:25,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3338200.0, ans=0.1 2024-08-17 13:45:30,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3338300.0, ans=0.125 2024-08-17 13:45:37,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3338300.0, ans=0.2 2024-08-17 13:45:43,912 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-17 13:45:55,744 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.42 vs. limit=22.5 2024-08-17 13:45:58,097 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1850, loss[loss=0.1212, beats_loss=0.009065, ecapa_loss=0.0001141, whisper_loss=0.111, over 22765.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01063, ecapa_loss=0.0001481, whisper_loss=0.09085, over 3886241.64 frames. 
], batch size: 81, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:46:00,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3338500.0, ans=0.125 2024-08-17 13:46:14,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3338600.0, ans=0.1 2024-08-17 13:46:20,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3338600.0, ans=0.0 2024-08-17 13:46:23,652 WARNING [optim.py:496] (0/4) Scaling gradients by 0.028131451457738876, model_norm_threshold=54.08749008178711 2024-08-17 13:46:23,818 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.28, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.044e+06, grad_sumsq=1.044e+06, orig_rms_sq=1.000e+00 2024-08-17 13:46:23,986 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
25 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-17 13:46:34,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3338700.0, ans=0.125 2024-08-17 13:46:41,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3338800.0, ans=0.5 2024-08-17 13:46:53,322 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.709e+01 2.302e+01 2.591e+01 3.149e+01 1.923e+03, threshold=5.181e+01, percent-clipped=2.0 2024-08-17 13:47:03,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3338900.0, ans=0.125 2024-08-17 13:47:07,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3338900.0, ans=0.2 2024-08-17 13:47:13,464 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1900, loss[loss=0.1049, beats_loss=0.01016, ecapa_loss=0.0001536, whisper_loss=0.09317, over 21306.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.0001498, whisper_loss=0.09087, over 3908122.24 frames. ], batch size: 87, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:47:19,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3339000.0, ans=0.0 2024-08-17 13:47:38,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3339100.0, ans=0.125 2024-08-17 13:47:47,915 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.69 vs. limit=15.0 2024-08-17 13:47:58,291 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
34 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-17 13:48:26,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3339400.0, ans=0.0 2024-08-17 13:48:35,080 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 1950, loss[loss=0.09744, beats_loss=0.01169, ecapa_loss=0.0001788, whisper_loss=0.08396, over 22778.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01054, ecapa_loss=0.000149, whisper_loss=0.09104, over 3898545.51 frames. ], batch size: 95, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:48:56,762 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-17 13:49:03,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3339600.0, ans=0.125 2024-08-17 13:49:23,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3339800.0, ans=0.125 2024-08-17 13:49:30,928 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-17 13:49:36,196 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.407e+01 2.684e+01 3.018e+01 1.325e+02, threshold=5.369e+01, percent-clipped=1.0 2024-08-17 13:49:42,556 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-17 13:49:46,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3339900.0, ans=0.1 2024-08-17 13:49:56,224 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2000, loss[loss=0.1135, beats_loss=0.009284, ecapa_loss=0.0001222, whisper_loss=0.103, over 19287.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.000147, whisper_loss=0.09064, over 3851938.73 frames. 
], batch size: 70, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:50:39,798 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-17 13:51:17,313 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2050, loss[loss=0.101, beats_loss=0.01316, ecapa_loss=0.0001519, whisper_loss=0.08634, over 19483.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01062, ecapa_loss=0.0001476, whisper_loss=0.09048, over 3883763.67 frames. ], batch size: 81, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:51:54,077 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 25 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-17 13:51:57,687 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 20 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-17 13:52:07,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3340600.0, ans=10.0 2024-08-17 13:52:10,638 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 19 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-17 13:52:21,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3340700.0, ans=0.125 2024-08-17 13:52:42,254 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.326e+01 2.537e+01 2.750e+01 3.562e+02, threshold=5.074e+01, percent-clipped=2.0 2024-08-17 13:52:42,415 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-17 13:53:03,927 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2100, loss[loss=0.09246, beats_loss=0.01028, ecapa_loss=0.0001058, whisper_loss=0.08112, over 15778.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001483, whisper_loss=0.09078, over 3869475.62 frames. 
], batch size: 60, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:53:05,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3341000.0, ans=0.0 2024-08-17 13:53:39,937 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.83 vs. limit=12.0 2024-08-17 13:53:42,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3341200.0, ans=0.125 2024-08-17 13:53:49,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3341300.0, ans=0.2 2024-08-17 13:53:52,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3341300.0, ans=0.0 2024-08-17 13:54:04,765 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.15 vs. limit=10.0 2024-08-17 13:54:12,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3341400.0, ans=0.125 2024-08-17 13:54:18,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3341400.0, ans=0.0 2024-08-17 13:54:21,250 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2150, loss[loss=0.0992, beats_loss=0.01079, ecapa_loss=0.0001556, whisper_loss=0.08685, over 20838.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0106, ecapa_loss=0.000149, whisper_loss=0.09042, over 3874460.73 frames. 
], batch size: 87, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:54:21,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3341500.0, ans=0.1 2024-08-17 13:54:38,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3341600.0, ans=0.125 2024-08-17 13:54:49,350 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-17 13:54:51,702 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.57 vs. limit=15.0 2024-08-17 13:55:16,218 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.587e+01 2.285e+01 2.479e+01 2.772e+01 4.505e+01, threshold=4.957e+01, percent-clipped=0.0 2024-08-17 13:55:16,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3341800.0, ans=0.125 2024-08-17 13:55:27,424 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.76 vs. limit=10.0 2024-08-17 13:55:29,524 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-17 13:55:35,073 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2200, loss[loss=0.1042, beats_loss=0.01083, ecapa_loss=0.000151, whisper_loss=0.09183, over 22458.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01071, ecapa_loss=0.0001476, whisper_loss=0.08938, over 3894864.63 frames. ], batch size: 92, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:55:52,089 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
31 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-17 13:56:32,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3342400.0, ans=0.125 2024-08-17 13:56:47,756 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2250, loss[loss=0.1169, beats_loss=0.009188, ecapa_loss=0.0001821, whisper_loss=0.1059, over 16274.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01063, ecapa_loss=0.000147, whisper_loss=0.08978, over 3880755.62 frames. ], batch size: 64, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:56:47,927 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 32 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-17 13:56:49,944 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.53 vs. limit=22.5 2024-08-17 13:57:02,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3342600.0, ans=0.1 2024-08-17 13:57:20,251 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-17 13:57:23,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3342700.0, ans=0.125 2024-08-17 13:57:35,653 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
22 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-17 13:57:38,161 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.294e+01 2.494e+01 2.818e+01 3.738e+02, threshold=4.988e+01, percent-clipped=1.0 2024-08-17 13:57:38,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3342800.0, ans=0.1 2024-08-17 13:57:40,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3342800.0, ans=0.1 2024-08-17 13:57:44,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3342900.0, ans=0.0 2024-08-17 13:57:57,075 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2300, loss[loss=0.105, beats_loss=0.011, ecapa_loss=0.0001512, whisper_loss=0.09252, over 22726.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01062, ecapa_loss=0.0001476, whisper_loss=0.08977, over 3885647.32 frames. ], batch size: 94, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:57:57,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3343000.0, ans=0.0 2024-08-17 13:57:58,371 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-17 13:58:07,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3343000.0, ans=0.125 2024-08-17 13:58:10,393 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-17 13:58:11,989 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-17 13:58:18,812 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
25 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-17 13:58:21,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3343100.0, ans=0.125 2024-08-17 13:58:25,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3343200.0, ans=0.0 2024-08-17 13:58:30,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3343200.0, ans=0.125 2024-08-17 13:59:02,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3343400.0, ans=0.125 2024-08-17 13:59:08,546 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2350, loss[loss=0.1139, beats_loss=0.009745, ecapa_loss=0.0001537, whisper_loss=0.1027, over 22187.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01056, ecapa_loss=0.00015, whisper_loss=0.09014, over 3914562.74 frames. ], batch size: 88, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:59:14,584 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-17 13:59:19,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3343500.0, ans=0.125 2024-08-17 13:59:44,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3343700.0, ans=0.125 2024-08-17 13:59:47,001 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.63 vs. limit=15.0 2024-08-17 13:59:50,891 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-17 13:59:53,753 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
27 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-17 14:00:00,891 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.280e+01 2.527e+01 2.757e+01 5.363e+01, threshold=5.054e+01, percent-clipped=1.0 2024-08-17 14:00:01,072 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 20 from LS+wenet, 30 from Vox, 44 fro AS 2024-08-17 14:00:02,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3343800.0, ans=0.0 2024-08-17 14:00:15,460 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.07 vs. limit=22.5 2024-08-17 14:00:19,286 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2400, loss[loss=0.1157, beats_loss=0.008751, ecapa_loss=0.0001511, whisper_loss=0.1054, over 24022.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01064, ecapa_loss=0.0001491, whisper_loss=0.08991, over 3945453.35 frames. ], batch size: 95, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:00:40,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3344100.0, ans=0.125 2024-08-17 14:00:43,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3344100.0, ans=0.1 2024-08-17 14:00:44,618 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=4.459e-02 2024-08-17 14:00:44,848 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.52 vs. limit=15.0 2024-08-17 14:00:55,726 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2024-08-17 14:01:16,091 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
22 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-17 14:01:25,537 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2450, loss[loss=0.09145, beats_loss=0.01194, ecapa_loss=0.0001239, whisper_loss=0.07828, over 22322.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01059, ecapa_loss=0.0001493, whisper_loss=0.09003, over 3905250.76 frames. ], batch size: 89, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:01:27,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.78 vs. limit=22.5 2024-08-17 14:01:34,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3344500.0, ans=0.0 2024-08-17 14:01:38,969 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-17 14:01:54,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3344700.0, ans=0.125 2024-08-17 14:02:06,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3344800.0, ans=0.0 2024-08-17 14:02:12,770 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 18 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-17 14:02:14,121 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 11 from Vox, 35 fro AS 2024-08-17 14:02:16,797 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.328e+01 2.680e+01 2.976e+01 4.831e+01, threshold=5.360e+01, percent-clipped=0.0 2024-08-17 14:02:19,942 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-17 14:02:28,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3344900.0, ans=0.0 2024-08-17 14:02:30,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3344900.0, ans=0.125 2024-08-17 14:02:35,421 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2500, loss[loss=0.1065, beats_loss=0.009132, ecapa_loss=0.0001663, whisper_loss=0.09575, over 22697.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01059, ecapa_loss=0.0001496, whisper_loss=0.08963, over 3899676.90 frames. ], batch size: 91, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:02:35,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3345000.0, ans=0.0 2024-08-17 14:03:15,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3345300.0, ans=0.0 2024-08-17 14:03:18,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3345300.0, ans=0.05 2024-08-17 14:03:19,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3345300.0, ans=0.09899494936611666 2024-08-17 14:03:36,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3345400.0, ans=0.1 2024-08-17 14:03:43,017 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2550, loss[loss=0.1022, beats_loss=0.009366, ecapa_loss=0.0001489, whisper_loss=0.09139, over 21098.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01062, ecapa_loss=0.0001492, whisper_loss=0.08929, over 3909884.64 frames. 
], batch size: 83, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:03:43,178 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 26 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-17 14:03:47,804 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.92 vs. limit=10.0 2024-08-17 14:03:49,297 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.67 vs. limit=12.0 2024-08-17 14:04:33,878 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 13 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-17 14:04:35,530 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.361e+01 2.576e+01 2.935e+01 4.539e+01, threshold=5.152e+01, percent-clipped=0.0 2024-08-17 14:04:44,284 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-17 14:04:47,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3345900.0, ans=0.2 2024-08-17 14:04:54,886 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2600, loss[loss=0.1016, beats_loss=0.009775, ecapa_loss=0.0001862, whisper_loss=0.08998, over 21838.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0106, ecapa_loss=0.0001501, whisper_loss=0.08959, over 3910253.60 frames. ], batch size: 89, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:05:08,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3346100.0, ans=0.0 2024-08-17 14:05:12,230 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 20 from LS+wenet, 17 from Vox, 56 fro AS 2024-08-17 14:05:14,556 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
28 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-17 14:05:31,857 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.72 vs. limit=15.0 2024-08-17 14:05:32,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3346300.0, ans=0.125 2024-08-17 14:05:33,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3346300.0, ans=0.5 2024-08-17 14:05:52,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3346400.0, ans=0.125 2024-08-17 14:05:56,490 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-17 14:06:01,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3346500.0, ans=0.1 2024-08-17 14:06:02,624 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2650, loss[loss=0.1039, beats_loss=0.01057, ecapa_loss=0.0001483, whisper_loss=0.09187, over 22027.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01059, ecapa_loss=0.0001493, whisper_loss=0.08961, over 3891478.31 frames. ], batch size: 90, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:06:23,830 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-17 14:06:34,289 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
26 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-17 14:06:38,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3346700.0, ans=0.0 2024-08-17 14:06:52,930 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.671e+01 2.232e+01 2.460e+01 2.794e+01 3.969e+01, threshold=4.921e+01, percent-clipped=0.0 2024-08-17 14:06:55,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3346800.0, ans=0.2 2024-08-17 14:07:11,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3346900.0, ans=0.0 2024-08-17 14:07:13,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3347000.0, ans=0.1 2024-08-17 14:07:13,879 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2700, loss[loss=0.1123, beats_loss=0.01042, ecapa_loss=0.0001842, whisper_loss=0.1, over 20946.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01062, ecapa_loss=0.0001489, whisper_loss=0.08995, over 3900554.13 frames. ], batch size: 87, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:07:16,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3347000.0, ans=0.125 2024-08-17 14:07:40,829 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 29 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-17 14:07:43,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3347100.0, ans=0.125 2024-08-17 14:07:47,538 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.92 vs. limit=15.0 2024-08-17 14:07:58,283 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
22 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-17 14:08:06,761 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 21 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-17 14:08:12,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3347300.0, ans=0.2 2024-08-17 14:08:20,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3347400.0, ans=0.0 2024-08-17 14:08:21,742 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-17 14:08:24,209 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=15.0 2024-08-17 14:08:26,114 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 12 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-17 14:08:30,358 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2750, loss[loss=0.09215, beats_loss=0.01209, ecapa_loss=0.0001534, whisper_loss=0.07853, over 18027.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01065, ecapa_loss=0.0001489, whisper_loss=0.09025, over 3902040.73 frames. ], batch size: 72, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:08:36,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3347500.0, ans=0.2 2024-08-17 14:09:23,889 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.299e+01 2.509e+01 2.759e+01 4.042e+01, threshold=5.018e+01, percent-clipped=0.0 2024-08-17 14:09:42,980 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2800, loss[loss=0.107, beats_loss=0.008178, ecapa_loss=0.0001716, whisper_loss=0.09709, over 16110.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0106, ecapa_loss=0.0001488, whisper_loss=0.09049, over 3891822.25 frames. 
], batch size: 64, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:10:04,722 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-17 14:10:14,034 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.506e-02 2024-08-17 14:10:15,958 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-17 14:10:19,429 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-17 14:10:28,302 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-17 14:10:28,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3348300.0, ans=0.125 2024-08-17 14:10:40,430 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=15.0 2024-08-17 14:10:45,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3348400.0, ans=0.0 2024-08-17 14:10:49,601 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-17 14:10:58,377 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2850, loss[loss=0.09859, beats_loss=0.008186, ecapa_loss=0.000194, whisper_loss=0.08846, over 17419.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.0001497, whisper_loss=0.0902, over 3882729.09 frames. ], batch size: 70, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:11:13,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3348600.0, ans=0.05 2024-08-17 14:11:19,763 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
17 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-17 14:11:22,678 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-17 14:11:29,830 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-17 14:11:30,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3348700.0, ans=0.125 2024-08-17 14:11:56,341 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.404e+01 2.584e+01 2.839e+01 1.572e+02, threshold=5.168e+01, percent-clipped=1.0 2024-08-17 14:12:17,623 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2900, loss[loss=0.09169, beats_loss=0.008271, ecapa_loss=0.0002082, whisper_loss=0.08134, over 17063.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01053, ecapa_loss=0.0001496, whisper_loss=0.09054, over 3883929.27 frames. ], batch size: 68, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:12:33,660 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-17 14:12:43,396 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-17 14:13:18,148 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2024-08-17 14:13:28,193 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 2950, loss[loss=0.1049, beats_loss=0.01296, ecapa_loss=0.0001326, whisper_loss=0.09058, over 23630.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01053, ecapa_loss=0.0001492, whisper_loss=0.09137, over 3929864.68 frames. ], batch size: 95, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:13:43,854 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
21 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-17 14:13:47,678 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-17 14:13:49,071 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-17 14:13:49,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3349600.0, ans=0.125 2024-08-17 14:13:58,155 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 19 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-17 14:14:06,556 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-17 14:14:08,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3349800.0, ans=0.0 2024-08-17 14:14:09,562 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.64 vs. limit=22.5 2024-08-17 14:14:15,799 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.281e+01 2.587e+01 2.940e+01 5.139e+01, threshold=5.174e+01, percent-clipped=0.0 2024-08-17 14:14:18,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3349900.0, ans=0.0 2024-08-17 14:14:25,271 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 18 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-17 14:14:27,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3349900.0, ans=0.0 2024-08-17 14:14:32,936 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3000, loss[loss=0.1044, beats_loss=0.007727, ecapa_loss=0.000143, whisper_loss=0.09529, over 18255.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01048, ecapa_loss=0.0001503, whisper_loss=0.09136, over 3924839.13 frames. 
], batch size: 70, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:14:32,937 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-17 14:15:10,856 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on ASR_libri: loss=0.251, beats_loss=0, ecapa_loss=0.0005243, whisper_loss=0.2458, over 922467.00 frames. 2024-08-17 14:15:29,043 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on SV_voxceleb1: loss=0.004133, beats_loss=0, ecapa_loss=0.0004133, whisper_loss=0, over 939242.00 frames. 2024-08-17 14:17:17,936 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on AT_audioset: loss=0.02323, beats_loss=0.02323, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-17 14:17:17,940 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-17 14:17:33,436 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-17 14:17:35,169 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-17 14:17:41,338 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-17 14:18:05,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3350300.0, ans=0.07 2024-08-17 14:18:22,865 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 13 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-17 14:18:23,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3350400.0, ans=0.125 2024-08-17 14:18:26,893 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3050, loss[loss=0.1046, beats_loss=0.01031, ecapa_loss=0.0001493, whisper_loss=0.09279, over 14021.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01053, ecapa_loss=0.0001498, whisper_loss=0.09107, over 3910512.15 frames. 
], batch size: 54, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:18:27,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3350500.0, ans=0.0 2024-08-17 14:18:39,869 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-17 14:18:45,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3350600.0, ans=0.1 2024-08-17 14:19:15,621 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-17 14:19:17,886 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.330e+01 2.532e+01 2.837e+01 4.705e+01, threshold=5.064e+01, percent-clipped=0.0 2024-08-17 14:19:33,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3351000.0, ans=0.125 2024-08-17 14:19:34,186 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3100, loss[loss=0.09487, beats_loss=0.01201, ecapa_loss=0.0001263, whisper_loss=0.0816, over 23678.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01051, ecapa_loss=0.0001503, whisper_loss=0.0915, over 3938302.55 frames. ], batch size: 95, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:19:34,716 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.32 vs. 
limit=15.0 2024-08-17 14:19:39,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3351000.0, ans=0.2 2024-08-17 14:19:59,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3351200.0, ans=0.1 2024-08-17 14:20:01,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3351200.0, ans=0.125 2024-08-17 14:20:13,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3351300.0, ans=0.5 2024-08-17 14:20:17,887 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.76 vs. limit=15.0 2024-08-17 14:20:22,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3351300.0, ans=0.125 2024-08-17 14:20:37,353 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3150, loss[loss=0.1181, beats_loss=0.009937, ecapa_loss=0.0001541, whisper_loss=0.1067, over 18529.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01049, ecapa_loss=0.0001493, whisper_loss=0.09195, over 3926370.74 frames. ], batch size: 74, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:20:39,212 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.79 vs. limit=15.0 2024-08-17 14:20:47,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3351500.0, ans=0.1 2024-08-17 14:20:51,165 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.100e+00 2024-08-17 14:21:05,610 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
24 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-17 14:21:07,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3351700.0, ans=0.125 2024-08-17 14:21:08,075 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 40 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-17 14:21:23,117 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.388e+01 2.692e+01 3.115e+01 4.630e+01, threshold=5.383e+01, percent-clipped=0.0 2024-08-17 14:21:32,110 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-17 14:21:39,581 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3200, loss[loss=0.09107, beats_loss=0.009929, ecapa_loss=0.0001944, whisper_loss=0.07919, over 20804.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01047, ecapa_loss=0.000149, whisper_loss=0.09172, over 3958065.68 frames. ], batch size: 89, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:21:49,097 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0 2024-08-17 14:21:58,208 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-17 14:22:09,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3352200.0, ans=0.125 2024-08-17 14:22:11,737 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 29 from Vox, 22 fro AS 2024-08-17 14:22:17,344 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.51 vs. limit=15.0 2024-08-17 14:22:21,697 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
27 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-17 14:22:40,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3352500.0, ans=0.09899494936611666 2024-08-17 14:22:41,331 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3250, loss[loss=0.1278, beats_loss=0.008668, ecapa_loss=0.0001481, whisper_loss=0.1176, over 22308.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01045, ecapa_loss=0.0001496, whisper_loss=0.09204, over 3962018.92 frames. ], batch size: 86, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:22:45,392 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-17 14:22:45,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3352500.0, ans=0.125 2024-08-17 14:22:45,953 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=22.5 2024-08-17 14:22:55,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3352600.0, ans=0.125 2024-08-17 14:23:06,308 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-17 14:23:13,678 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-17 14:23:16,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3352700.0, ans=0.0 2024-08-17 14:23:20,098 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-17 14:23:21,269 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
25 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-17 14:23:27,094 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.470e+01 2.720e+01 3.088e+01 1.006e+02, threshold=5.440e+01, percent-clipped=1.0 2024-08-17 14:23:30,573 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0725674256682396, model_norm_threshold=54.39925765991211 2024-08-17 14:23:30,732 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.19, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.092e+05, grad_sumsq=1.076e+07, orig_rms_sq=1.015e-02 2024-08-17 14:23:31,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3352900.0, ans=0.0 2024-08-17 14:23:41,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3352900.0, ans=0.125 2024-08-17 14:23:43,721 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3300, loss[loss=0.107, beats_loss=0.009317, ecapa_loss=0.0001792, whisper_loss=0.09587, over 20144.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01069, ecapa_loss=0.0001504, whisper_loss=0.09108, over 3924809.24 frames. ], batch size: 82, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:24:04,338 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-17 14:24:11,321 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.48 vs. 
limit=22.5 2024-08-17 14:24:19,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3353300.0, ans=0.1 2024-08-17 14:24:37,849 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=15.0 2024-08-17 14:24:45,386 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3350, loss[loss=0.1135, beats_loss=0.008797, ecapa_loss=0.000175, whisper_loss=0.103, over 19937.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01071, ecapa_loss=0.0001498, whisper_loss=0.0907, over 3915160.19 frames. ], batch size: 84, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:24:53,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3353500.0, ans=0.0 2024-08-17 14:24:58,993 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.18 vs. limit=10.0 2024-08-17 14:25:10,499 WARNING [optim.py:496] (0/4) Scaling gradients by 0.057717882096767426, model_norm_threshold=54.39925765991211 2024-08-17 14:25:10,662 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.125e+05, grad_sumsq=1.125e+05, orig_rms_sq=1.000e+00 2024-08-17 14:25:18,178 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-17 14:25:18,595 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.11 vs. 
limit=15.0 2024-08-17 14:25:23,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3353800.0, ans=0.95 2024-08-17 14:25:28,084 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 24 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-17 14:25:31,356 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.365e+01 2.660e+01 2.915e+01 9.425e+02, threshold=5.320e+01, percent-clipped=5.0 2024-08-17 14:25:37,479 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.34 vs. limit=22.5 2024-08-17 14:25:39,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3353900.0, ans=0.2 2024-08-17 14:25:41,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3353900.0, ans=0.125 2024-08-17 14:25:47,686 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3400, loss[loss=0.1034, beats_loss=0.01193, ecapa_loss=0.0001187, whisper_loss=0.09025, over 19417.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0106, ecapa_loss=0.0001506, whisper_loss=0.09071, over 3898132.29 frames. ], batch size: 76, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:25:59,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3354100.0, ans=0.1 2024-08-17 14:26:07,321 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
20 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-17 14:26:12,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3354200.0, ans=0.125 2024-08-17 14:26:20,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3354200.0, ans=0.2 2024-08-17 14:26:26,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3354300.0, ans=0.125 2024-08-17 14:26:33,600 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-17 14:26:50,035 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3450, loss[loss=0.1167, beats_loss=0.007709, ecapa_loss=0.0001617, whisper_loss=0.1074, over 15550.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001519, whisper_loss=0.09032, over 3863521.23 frames. ], batch size: 61, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:26:52,110 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.17 vs. limit=15.0 2024-08-17 14:27:01,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3354500.0, ans=0.2 2024-08-17 14:27:12,447 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 21 from LS+wenet, 10 from Vox, 23 fro AS 2024-08-17 14:27:16,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3354700.0, ans=0.125 2024-08-17 14:27:18,067 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.00 vs. 
limit=12.0 2024-08-17 14:27:30,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3354800.0, ans=0.0 2024-08-17 14:27:37,060 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.241e+01 2.529e+01 2.817e+01 6.166e+01, threshold=5.057e+01, percent-clipped=1.0 2024-08-17 14:27:47,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3354900.0, ans=0.015 2024-08-17 14:27:49,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3354900.0, ans=0.0 2024-08-17 14:27:52,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3355000.0, ans=0.0 2024-08-17 14:27:52,952 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3500, loss[loss=0.1081, beats_loss=0.009087, ecapa_loss=0.0001342, whisper_loss=0.09766, over 17791.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01052, ecapa_loss=0.0001511, whisper_loss=0.08986, over 3831105.86 frames. ], batch size: 68, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:28:03,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.60 vs. limit=15.0 2024-08-17 14:28:21,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3355200.0, ans=0.125 2024-08-17 14:28:23,333 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 35 from Vox, 30 fro AS 2024-08-17 14:28:24,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3355200.0, ans=0.1 2024-08-17 14:28:36,574 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
24 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-17 14:28:36,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3355300.0, ans=0.125 2024-08-17 14:28:50,902 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 14 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-17 14:28:51,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3355400.0, ans=0.025 2024-08-17 14:28:53,166 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3550, loss[loss=0.115, beats_loss=0.01051, ecapa_loss=0.0001447, whisper_loss=0.103, over 24418.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01057, ecapa_loss=0.0001503, whisper_loss=0.08977, over 3849736.79 frames. ], batch size: 95, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:28:58,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3355500.0, ans=0.125 2024-08-17 14:29:02,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3355500.0, ans=0.125 2024-08-17 14:29:14,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=3355600.0, ans=15.0 2024-08-17 14:29:16,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3355700.0, ans=0.0 2024-08-17 14:29:18,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3355700.0, ans=0.125 2024-08-17 14:29:19,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3355700.0, ans=0.125 2024-08-17 14:29:38,580 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 
1.868e+01 2.236e+01 2.510e+01 2.747e+01 4.772e+01, threshold=5.021e+01, percent-clipped=0.0 2024-08-17 14:29:38,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3355800.0, ans=0.1 2024-08-17 14:29:54,608 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3600, loss[loss=0.09317, beats_loss=0.01161, ecapa_loss=0.0001254, whisper_loss=0.08031, over 23295.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001493, whisper_loss=0.09001, over 3884837.52 frames. ], batch size: 92, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:29:54,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3356000.0, ans=0.2 2024-08-17 14:30:08,155 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-17 14:30:21,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3356200.0, ans=0.0 2024-08-17 14:30:22,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3356200.0, ans=0.125 2024-08-17 14:30:36,490 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.35 vs. limit=15.0 2024-08-17 14:30:39,878 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 25 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-17 14:30:52,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3356400.0, ans=0.125 2024-08-17 14:30:56,142 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3650, loss[loss=0.1083, beats_loss=0.01189, ecapa_loss=0.0001214, whisper_loss=0.09521, over 21755.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01064, ecapa_loss=0.0001477, whisper_loss=0.08993, over 3861740.87 frames. ], batch size: 87, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:31:00,046 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-17 14:31:11,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3356600.0, ans=0.2 2024-08-17 14:31:22,792 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-17 14:31:23,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3356700.0, ans=0.125 2024-08-17 14:31:25,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3356700.0, ans=0.1 2024-08-17 14:31:29,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3356700.0, ans=0.125 2024-08-17 14:31:31,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3356800.0, ans=10.0 2024-08-17 14:31:40,880 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.623e+01 2.172e+01 2.480e+01 2.940e+01 4.712e+01, threshold=4.960e+01, percent-clipped=0.0 2024-08-17 14:31:41,044 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-17 14:31:41,960 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.92 vs. 
limit=6.0 2024-08-17 14:31:45,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3356900.0, ans=0.1 2024-08-17 14:31:48,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3356900.0, ans=0.125 2024-08-17 14:31:56,902 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3700, loss[loss=0.1089, beats_loss=0.008507, ecapa_loss=0.0001617, whisper_loss=0.09877, over 19682.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01067, ecapa_loss=0.0001474, whisper_loss=0.0902, over 3899218.69 frames. ], batch size: 78, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:32:04,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3357000.0, ans=0.0 2024-08-17 14:32:05,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3357000.0, ans=0.125 2024-08-17 14:32:09,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3357100.0, ans=0.125 2024-08-17 14:32:23,124 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 14:32:23,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3357200.0, ans=0.5 2024-08-17 14:32:25,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3357200.0, ans=0.125 2024-08-17 14:32:25,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3357200.0, ans=0.125 2024-08-17 14:32:26,662 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
23 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-17 14:32:36,626 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 14 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-17 14:32:40,038 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 20 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-17 14:32:44,645 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.72 vs. limit=15.0 2024-08-17 14:32:53,195 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.67 vs. limit=22.5 2024-08-17 14:32:58,712 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3750, loss[loss=0.1083, beats_loss=0.009146, ecapa_loss=0.0001279, whisper_loss=0.09791, over 19682.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01064, ecapa_loss=0.0001468, whisper_loss=0.08992, over 3903306.00 frames. ], batch size: 71, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:33:01,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3357500.0, ans=0.125 2024-08-17 14:33:11,321 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-17 14:33:15,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3357600.0, ans=0.125 2024-08-17 14:33:27,321 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-17 14:33:33,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3357700.0, ans=0.0 2024-08-17 14:33:44,595 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.285e+01 2.536e+01 2.853e+01 4.848e+01, threshold=5.071e+01, percent-clipped=0.0 2024-08-17 14:34:00,720 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3800, loss[loss=0.12, beats_loss=0.008571, ecapa_loss=0.0001537, whisper_loss=0.1099, over 21670.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01064, ecapa_loss=0.0001467, whisper_loss=0.08995, over 3926858.32 frames. ], batch size: 86, lr: 2.67e-03, grad_scale: 1.152921504606847e+18 2024-08-17 14:34:02,335 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.21 vs. limit=10.0 2024-08-17 14:34:04,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3358000.0, ans=0.0 2024-08-17 14:34:24,301 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.03 vs. limit=10.0 2024-08-17 14:34:29,876 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-17 14:34:32,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3358200.0, ans=0.1 2024-08-17 14:34:40,665 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.00 vs. limit=10.0 2024-08-17 14:34:52,486 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.67 vs. 
limit=15.0 2024-08-17 14:34:57,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3358400.0, ans=0.0 2024-08-17 14:34:58,764 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-17 14:35:02,359 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3850, loss[loss=0.07017, beats_loss=0.01403, ecapa_loss=0.0001572, whisper_loss=0.05457, over 18480.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01058, ecapa_loss=0.0001466, whisper_loss=0.08998, over 3913451.30 frames. ], batch size: 79, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:35:10,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3358500.0, ans=0.125 2024-08-17 14:35:17,662 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-17 14:35:18,752 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-17 14:35:22,377 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-17 14:35:29,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3358700.0, ans=0.07 2024-08-17 14:35:33,441 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
40 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-17 14:35:33,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3358700.0, ans=0.125 2024-08-17 14:35:39,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3358800.0, ans=0.125 2024-08-17 14:35:49,067 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.270e+01 2.483e+01 2.770e+01 3.754e+01, threshold=4.966e+01, percent-clipped=0.0 2024-08-17 14:35:49,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3358800.0, ans=0.0 2024-08-17 14:35:50,457 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-17 14:35:55,436 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-17 14:36:01,625 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-17 14:36:03,805 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3900, loss[loss=0.1263, beats_loss=0.009773, ecapa_loss=0.0001276, whisper_loss=0.1153, over 15239.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01049, ecapa_loss=0.0001474, whisper_loss=0.09069, over 3907971.68 frames. ], batch size: 57, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:36:07,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3359000.0, ans=0.2 2024-08-17 14:36:26,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3359100.0, ans=0.0 2024-08-17 14:36:32,138 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
23 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-17 14:36:33,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3359200.0, ans=0.035 2024-08-17 14:36:34,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3359200.0, ans=0.2 2024-08-17 14:36:46,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3359300.0, ans=0.125 2024-08-17 14:36:50,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3359300.0, ans=0.125 2024-08-17 14:36:54,429 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-08-17 14:36:57,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3359400.0, ans=0.1 2024-08-17 14:37:06,360 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 3950, loss[loss=0.1029, beats_loss=0.009477, ecapa_loss=0.0001808, whisper_loss=0.0916, over 14737.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01043, ecapa_loss=0.0001473, whisper_loss=0.0913, over 3912463.96 frames. 
], batch size: 59, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:37:13,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3359500.0, ans=0.125 2024-08-17 14:37:33,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3359700.0, ans=0.0 2024-08-17 14:37:41,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3359700.0, ans=0.125 2024-08-17 14:37:45,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3359800.0, ans=0.0 2024-08-17 14:37:51,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3359800.0, ans=0.125 2024-08-17 14:37:53,316 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-17 14:37:58,587 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.357e+01 2.553e+01 2.882e+01 5.685e+01, threshold=5.106e+01, percent-clipped=1.0 2024-08-17 14:38:03,560 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 11 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-17 14:38:09,518 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-17 14:38:15,319 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-336000.pt 2024-08-17 14:38:18,716 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4000, loss[loss=0.1151, beats_loss=0.007507, ecapa_loss=0.0001274, whisper_loss=0.1063, over 19199.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01045, ecapa_loss=0.0001474, whisper_loss=0.09117, over 3905614.71 frames. ], batch size: 70, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:38:18,879 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-17 14:38:21,322 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-17 14:38:44,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3360100.0, ans=0.125 2024-08-17 14:38:45,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3360100.0, ans=0.125 2024-08-17 14:38:57,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3360200.0, ans=0.125 2024-08-17 14:38:58,787 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-17 14:39:16,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3360300.0, ans=0.125 2024-08-17 14:39:34,777 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4050, loss[loss=0.1094, beats_loss=0.007623, ecapa_loss=0.0001395, whisper_loss=0.1004, over 17112.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01045, ecapa_loss=0.0001476, whisper_loss=0.09139, over 3901495.00 frames. ], batch size: 63, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:39:40,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3360500.0, ans=0.1 2024-08-17 14:40:02,000 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
13 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-17 14:40:05,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3360700.0, ans=0.1 2024-08-17 14:40:08,130 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-17 14:40:12,399 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-17 14:40:15,646 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-17 14:40:29,952 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+01 2.266e+01 2.528e+01 2.779e+01 4.224e+01, threshold=5.056e+01, percent-clipped=0.0 2024-08-17 14:40:40,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3360900.0, ans=0.125 2024-08-17 14:40:41,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3360900.0, ans=0.07 2024-08-17 14:40:47,271 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4100, loss[loss=0.1101, beats_loss=0.007725, ecapa_loss=0.0001361, whisper_loss=0.101, over 14428.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01041, ecapa_loss=0.0001484, whisper_loss=0.0913, over 3876682.49 frames. ], batch size: 55, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:40:49,629 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.47 vs. limit=12.0 2024-08-17 14:41:17,523 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.25 vs. 
limit=22.5 2024-08-17 14:41:24,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3361200.0, ans=0.0 2024-08-17 14:41:47,972 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=22.5 2024-08-17 14:42:02,454 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4150, loss[loss=0.1066, beats_loss=0.009022, ecapa_loss=0.0001264, whisper_loss=0.09633, over 22108.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01042, ecapa_loss=0.0001473, whisper_loss=0.09167, over 3898556.33 frames. ], batch size: 83, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:42:56,842 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.341e+01 2.562e+01 2.816e+01 4.019e+01, threshold=5.124e+01, percent-clipped=0.0 2024-08-17 14:42:59,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3361900.0, ans=0.125 2024-08-17 14:43:13,751 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4200, loss[loss=0.1341, beats_loss=0.007286, ecapa_loss=0.0001546, whisper_loss=0.1252, over 23176.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01049, ecapa_loss=0.0001482, whisper_loss=0.09114, over 3880095.56 frames. ], batch size: 90, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:43:16,133 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
26 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-17 14:43:18,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3362000.0, ans=0.125 2024-08-17 14:43:33,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3362100.0, ans=0.0 2024-08-17 14:43:45,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.93 vs. limit=15.0 2024-08-17 14:44:02,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3362300.0, ans=0.1 2024-08-17 14:44:03,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3362300.0, ans=0.0 2024-08-17 14:44:21,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3362400.0, ans=0.1 2024-08-17 14:44:22,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3362400.0, ans=0.0 2024-08-17 14:44:26,493 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4250, loss[loss=0.1071, beats_loss=0.01103, ecapa_loss=0.0001386, whisper_loss=0.09467, over 22112.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01052, ecapa_loss=0.0001487, whisper_loss=0.09039, over 3861263.07 frames. ], batch size: 88, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:44:26,776 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
34 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-17 14:44:41,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3362600.0, ans=0.1 2024-08-17 14:44:48,581 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.94 vs. limit=10.0 2024-08-17 14:44:49,186 WARNING [optim.py:496] (0/4) Scaling gradients by 0.06460781395435333, model_norm_threshold=51.23520278930664 2024-08-17 14:44:49,352 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.0.norm.log_scale with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.238e+04, grad_sumsq=7.238e+04, orig_rms_sq=1.000e+00 2024-08-17 14:44:49,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3362600.0, ans=0.0 2024-08-17 14:44:56,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3362700.0, ans=0.1 2024-08-17 14:45:06,001 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.29 vs. limit=22.5 2024-08-17 14:45:11,525 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-17 14:45:13,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3362800.0, ans=10.0 2024-08-17 14:45:13,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3362800.0, ans=0.125 2024-08-17 14:45:21,238 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.09 vs. 
limit=12.0 2024-08-17 14:45:23,606 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.395e+01 2.661e+01 3.176e+01 7.930e+02, threshold=5.321e+01, percent-clipped=4.0 2024-08-17 14:45:26,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3362900.0, ans=0.2 2024-08-17 14:45:40,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3363000.0, ans=0.0 2024-08-17 14:45:41,398 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4300, loss[loss=0.09232, beats_loss=0.01122, ecapa_loss=0.000126, whisper_loss=0.07985, over 19880.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0105, ecapa_loss=0.0001483, whisper_loss=0.09109, over 3848882.01 frames. ], batch size: 78, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:45:50,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3363000.0, ans=0.0 2024-08-17 14:45:52,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3363000.0, ans=0.05 2024-08-17 14:45:59,919 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-17 14:46:01,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3363100.0, ans=0.125 2024-08-17 14:46:26,947 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.11 vs. limit=22.5 2024-08-17 14:46:34,429 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 14:46:44,030 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
19 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-17 14:46:46,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3363400.0, ans=0.1 2024-08-17 14:46:47,066 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-17 14:46:47,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3363400.0, ans=0.0 2024-08-17 14:46:51,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3363400.0, ans=0.125 2024-08-17 14:46:56,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3363500.0, ans=0.1 2024-08-17 14:46:57,291 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4350, loss[loss=0.0875, beats_loss=0.01064, ecapa_loss=0.000151, whisper_loss=0.07534, over 14152.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01045, ecapa_loss=0.0001493, whisper_loss=0.09142, over 3862588.96 frames. ], batch size: 57, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:47:05,323 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 19 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-17 14:47:07,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3363500.0, ans=0.0 2024-08-17 14:47:24,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3363600.0, ans=0.1 2024-08-17 14:47:35,764 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.21 vs. 
limit=15.0 2024-08-17 14:47:48,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3363800.0, ans=0.125 2024-08-17 14:47:51,519 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.70 vs. limit=15.0 2024-08-17 14:47:53,260 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.243e+01 2.531e+01 2.839e+01 4.488e+01, threshold=5.063e+01, percent-clipped=0.0 2024-08-17 14:48:10,575 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4400, loss[loss=0.08976, beats_loss=0.0119, ecapa_loss=0.0001206, whisper_loss=0.07666, over 22668.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01057, ecapa_loss=0.0001482, whisper_loss=0.09002, over 3890904.84 frames. ], batch size: 91, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:48:10,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3364000.0, ans=0.0 2024-08-17 14:48:22,273 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.63 vs. limit=22.5 2024-08-17 14:48:54,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3364300.0, ans=0.125 2024-08-17 14:49:06,945 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 31 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-17 14:49:08,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3364400.0, ans=0.125 2024-08-17 14:49:21,710 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4450, loss[loss=0.07266, beats_loss=0.01063, ecapa_loss=0.0001478, whisper_loss=0.06056, over 13963.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.0105, ecapa_loss=0.0001486, whisper_loss=0.09053, over 3885747.94 frames. ], batch size: 57, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:49:37,657 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-17 14:49:58,675 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=15.0 2024-08-17 14:50:14,568 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.276e+01 2024-08-17 14:50:16,737 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.265e+01 2.477e+01 2.746e+01 3.870e+01, threshold=4.955e+01, percent-clipped=0.0 2024-08-17 14:50:20,100 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.89 vs. limit=22.5 2024-08-17 14:50:23,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.93 vs. limit=15.0 2024-08-17 14:50:33,271 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4500, loss[loss=0.09116, beats_loss=0.01279, ecapa_loss=0.0001351, whisper_loss=0.07701, over 16648.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01056, ecapa_loss=0.0001477, whisper_loss=0.09038, over 3882815.84 frames. ], batch size: 66, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:50:40,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3365000.0, ans=0.025 2024-08-17 14:50:42,222 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.23 vs. limit=15.0 2024-08-17 14:50:48,150 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
23 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-17 14:50:50,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3365100.0, ans=0.125 2024-08-17 14:50:52,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3365100.0, ans=0.125 2024-08-17 14:50:56,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=3365100.0, ans=15.0 2024-08-17 14:50:57,848 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-17 14:50:59,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3365200.0, ans=0.125 2024-08-17 14:51:10,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3365200.0, ans=0.125 2024-08-17 14:51:21,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3365300.0, ans=0.1 2024-08-17 14:51:24,157 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-17 14:51:31,147 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-17 14:51:38,123 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 32 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-17 14:51:42,255 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4550, loss[loss=0.09983, beats_loss=0.01091, ecapa_loss=0.0001355, whisper_loss=0.08757, over 22569.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0106, ecapa_loss=0.0001485, whisper_loss=0.09037, over 3900982.46 frames. 
], batch size: 86, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:51:48,021 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 18 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-17 14:51:50,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3365500.0, ans=0.2 2024-08-17 14:52:00,464 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 14 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-17 14:52:11,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3365700.0, ans=0.0 2024-08-17 14:52:14,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3365700.0, ans=0.0 2024-08-17 14:52:31,900 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.242e+01 2.494e+01 2.758e+01 4.249e+01, threshold=4.988e+01, percent-clipped=0.0 2024-08-17 14:52:35,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3365900.0, ans=0.125 2024-08-17 14:52:43,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3365900.0, ans=0.125 2024-08-17 14:52:47,112 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4600, loss[loss=0.09315, beats_loss=0.009299, ecapa_loss=0.0001667, whisper_loss=0.08219, over 22459.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01061, ecapa_loss=0.000149, whisper_loss=0.09034, over 3910688.34 frames. ], batch size: 93, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:52:49,525 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
30 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-17 14:52:54,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3366000.0, ans=0.0 2024-08-17 14:53:19,002 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-17 14:53:19,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3366200.0, ans=0.125 2024-08-17 14:53:26,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3366300.0, ans=0.125 2024-08-17 14:53:33,676 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 22 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-17 14:53:36,450 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.20 vs. limit=15.0 2024-08-17 14:53:40,283 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.94 vs. limit=15.0 2024-08-17 14:53:44,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3366400.0, ans=0.125 2024-08-17 14:53:48,227 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4650, loss[loss=0.09128, beats_loss=0.01367, ecapa_loss=0.0001099, whisper_loss=0.07651, over 23253.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01063, ecapa_loss=0.0001482, whisper_loss=0.08999, over 3882481.38 frames. ], batch size: 95, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:53:50,343 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.06 vs. 
limit=15.0 2024-08-17 14:53:51,743 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.55 vs. limit=22.5 2024-08-17 14:53:58,938 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.89 vs. limit=22.5 2024-08-17 14:54:02,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3366600.0, ans=0.125 2024-08-17 14:54:06,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3366600.0, ans=0.125 2024-08-17 14:54:07,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3366600.0, ans=0.0 2024-08-17 14:54:18,600 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-17 14:54:30,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3366800.0, ans=0.2 2024-08-17 14:54:31,063 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
23 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-17 14:54:31,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3366800.0, ans=0.1 2024-08-17 14:54:34,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3366800.0, ans=0.1 2024-08-17 14:54:35,759 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.359e+01 2.591e+01 2.983e+01 4.791e+01, threshold=5.183e+01, percent-clipped=0.0 2024-08-17 14:54:44,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3366900.0, ans=0.125 2024-08-17 14:54:44,456 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=15.0 2024-08-17 14:54:51,077 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4700, loss[loss=0.1196, beats_loss=0.01023, ecapa_loss=0.0001232, whisper_loss=0.1081, over 21399.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01064, ecapa_loss=0.0001478, whisper_loss=0.08985, over 3893610.33 frames. ], batch size: 81, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:54:57,501 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 13 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-17 14:55:01,484 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 30 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-17 14:55:16,199 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
19 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-17 14:55:34,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3367300.0, ans=0.125 2024-08-17 14:55:53,158 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4750, loss[loss=0.1246, beats_loss=0.008801, ecapa_loss=0.0001379, whisper_loss=0.1144, over 14347.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0106, ecapa_loss=0.0001481, whisper_loss=0.09018, over 3858772.13 frames. ], batch size: 54, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:55:56,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3367500.0, ans=0.125 2024-08-17 14:56:09,300 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-17 14:56:10,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3367600.0, ans=0.125 2024-08-17 14:56:21,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3367700.0, ans=0.0 2024-08-17 14:56:22,857 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-17 14:56:40,094 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.331e+01 2.583e+01 3.020e+01 4.522e+01, threshold=5.166e+01, percent-clipped=0.0 2024-08-17 14:56:42,687 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 23 from Vox, 17 fro AS 2024-08-17 14:56:48,017 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.614e-02 2024-08-17 14:56:55,675 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4800, loss[loss=0.09804, beats_loss=0.01114, ecapa_loss=0.0001374, whisper_loss=0.08552, over 14328.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.01068, ecapa_loss=0.0001478, whisper_loss=0.08908, over 3854066.36 frames. ], batch size: 57, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:56:57,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3368000.0, ans=0.025 2024-08-17 14:56:58,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3368000.0, ans=0.1 2024-08-17 14:57:01,403 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.27 vs. limit=10.0 2024-08-17 14:57:14,607 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09035546332597733, model_norm_threshold=51.66227722167969 2024-08-17 14:57:14,768 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.0.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.290e+04, grad_sumsq=4.290e+04, orig_rms_sq=1.000e+00 2024-08-17 14:57:15,475 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.17 vs. limit=15.0 2024-08-17 14:57:31,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3368200.0, ans=0.2 2024-08-17 14:57:40,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3368300.0, ans=0.125 2024-08-17 14:57:57,567 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4850, loss[loss=0.08482, beats_loss=0.01249, ecapa_loss=0.0001439, whisper_loss=0.07089, over 19597.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01071, ecapa_loss=0.0001479, whisper_loss=0.08901, over 3885327.04 frames. 
], batch size: 79, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:58:17,439 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 26 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-17 14:58:22,191 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-17 14:58:23,843 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-17 14:58:27,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3368700.0, ans=0.2 2024-08-17 14:58:42,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3368800.0, ans=0.125 2024-08-17 14:58:44,398 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.617e+01 2.372e+01 2.629e+01 2.886e+01 5.718e+02, threshold=5.259e+01, percent-clipped=2.0 2024-08-17 14:58:50,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3368900.0, ans=0.04949747468305833 2024-08-17 14:58:51,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3368900.0, ans=0.125 2024-08-17 14:58:54,068 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 19 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-17 14:58:57,525 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 26 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-17 14:58:58,568 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4900, loss[loss=0.0972, beats_loss=0.01093, ecapa_loss=0.0001987, whisper_loss=0.08428, over 18886.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01071, ecapa_loss=0.000148, whisper_loss=0.08942, over 3884847.59 frames. 
], batch size: 82, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:59:06,646 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.54 vs. limit=15.0 2024-08-17 14:59:19,812 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-17 14:59:42,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3369300.0, ans=0.0 2024-08-17 14:59:48,932 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2024-08-17 15:00:00,769 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 4950, loss[loss=0.1037, beats_loss=0.01271, ecapa_loss=0.000127, whisper_loss=0.08973, over 21922.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01065, ecapa_loss=0.0001468, whisper_loss=0.08945, over 3868445.14 frames. ], batch size: 89, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:00:10,919 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 13 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-17 15:00:12,015 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-17 15:00:32,735 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 15 from LS+wenet, 8 from Vox, 30 fro AS 2024-08-17 15:00:34,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.58 vs. limit=22.5 2024-08-17 15:00:42,023 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.14 vs. limit=12.0 2024-08-17 15:00:42,680 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
25 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-17 15:00:44,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3369800.0, ans=0.0 2024-08-17 15:00:47,634 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.355e+01 2.551e+01 2.775e+01 1.010e+02, threshold=5.101e+01, percent-clipped=1.0 2024-08-17 15:00:50,141 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-17 15:00:52,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3369900.0, ans=0.125 2024-08-17 15:00:54,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3369900.0, ans=0.0 2024-08-17 15:00:57,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3369900.0, ans=0.125 2024-08-17 15:01:02,477 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5000, loss[loss=0.1104, beats_loss=0.01096, ecapa_loss=0.0001625, whisper_loss=0.09786, over 21038.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01062, ecapa_loss=0.0001479, whisper_loss=0.08913, over 3854052.98 frames. ], batch size: 85, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:01:04,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3370000.0, ans=0.125 2024-08-17 15:01:26,507 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.03 vs. limit=15.0 2024-08-17 15:01:44,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3370300.0, ans=0.1 2024-08-17 15:01:57,371 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
18 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-17 15:02:04,846 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5050, loss[loss=0.1126, beats_loss=0.0103, ecapa_loss=0.0001409, whisper_loss=0.1009, over 17388.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0105, ecapa_loss=0.0001474, whisper_loss=0.09082, over 3871099.71 frames. ], batch size: 67, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:02:05,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3370500.0, ans=0.125 2024-08-17 15:02:13,573 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-17 15:02:29,948 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-17 15:02:37,903 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.14 vs. limit=15.0 2024-08-17 15:02:52,220 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.426e+01 2.645e+01 3.149e+01 5.792e+01, threshold=5.289e+01, percent-clipped=1.0 2024-08-17 15:03:00,651 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-17 15:03:03,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3370900.0, ans=0.0 2024-08-17 15:03:06,655 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5100, loss[loss=0.1164, beats_loss=0.01044, ecapa_loss=0.0001079, whisper_loss=0.1049, over 24443.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01053, ecapa_loss=0.0001478, whisper_loss=0.09068, over 3876072.36 frames. 
], batch size: 92, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:03:31,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3371200.0, ans=0.125 2024-08-17 15:03:43,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3371300.0, ans=0.0 2024-08-17 15:03:54,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3371400.0, ans=0.0 2024-08-17 15:04:01,748 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-17 15:04:01,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3371400.0, ans=0.125 2024-08-17 15:04:08,147 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5150, loss[loss=0.1113, beats_loss=0.0112, ecapa_loss=0.0001329, whisper_loss=0.09881, over 20585.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01052, ecapa_loss=0.0001476, whisper_loss=0.09096, over 3900575.04 frames. ], batch size: 79, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:04:14,402 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-17 15:04:24,285 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-17 15:04:29,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3371600.0, ans=0.125 2024-08-17 15:04:39,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3371700.0, ans=0.1 2024-08-17 15:04:40,383 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
28 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-17 15:04:51,820 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-17 15:04:55,473 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.283e+01 2.471e+01 2.711e+01 4.570e+01, threshold=4.941e+01, percent-clipped=0.0 2024-08-17 15:05:10,549 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5200, loss[loss=0.1021, beats_loss=0.01174, ecapa_loss=0.0001593, whisper_loss=0.08873, over 21973.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01045, ecapa_loss=0.0001487, whisper_loss=0.09115, over 3895131.55 frames. ], batch size: 89, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:05:19,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3372000.0, ans=0.125 2024-08-17 15:05:26,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3372100.0, ans=0.0 2024-08-17 15:05:29,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3372100.0, ans=0.1 2024-08-17 15:05:32,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3372100.0, ans=0.125 2024-08-17 15:05:41,012 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.64 vs. limit=15.0 2024-08-17 15:05:43,105 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-17 15:05:53,166 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
23 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-17 15:05:53,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3372300.0, ans=0.125 2024-08-17 15:05:58,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3372300.0, ans=0.04949747468305833 2024-08-17 15:06:08,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3372400.0, ans=0.05 2024-08-17 15:06:09,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3372400.0, ans=0.1 2024-08-17 15:06:12,941 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5250, loss[loss=0.1144, beats_loss=0.009299, ecapa_loss=0.0001665, whisper_loss=0.1035, over 18498.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01041, ecapa_loss=0.0001498, whisper_loss=0.09152, over 3880915.62 frames. ], batch size: 73, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:06:38,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3372700.0, ans=0.125 2024-08-17 15:06:53,468 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 30 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-17 15:07:01,132 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.404e+01 2.586e+01 2.813e+01 5.298e+01, threshold=5.172e+01, percent-clipped=1.0 2024-08-17 15:07:04,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3372900.0, ans=15.0 2024-08-17 15:07:15,868 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5300, loss[loss=0.1019, beats_loss=0.01008, ecapa_loss=0.0002074, whisper_loss=0.08972, over 20659.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01045, ecapa_loss=0.0001508, whisper_loss=0.09084, over 3854986.86 frames. ], batch size: 92, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:07:17,180 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-17 15:07:17,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3373000.0, ans=0.125 2024-08-17 15:07:17,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3373000.0, ans=0.1 2024-08-17 15:07:20,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3373000.0, ans=0.07 2024-08-17 15:07:51,923 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-17 15:07:55,496 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-17 15:07:56,024 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0 2024-08-17 15:08:04,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3373400.0, ans=0.125 2024-08-17 15:08:04,739 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.46 vs. limit=15.0 2024-08-17 15:08:17,725 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5350, loss[loss=0.08537, beats_loss=0.01226, ecapa_loss=0.0001688, whisper_loss=0.07143, over 21806.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01053, ecapa_loss=0.0001506, whisper_loss=0.09043, over 3840799.53 frames. 
], batch size: 93, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:08:20,303 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-17 15:08:20,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3373500.0, ans=0.0 2024-08-17 15:08:24,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3373500.0, ans=0.125 2024-08-17 15:08:27,723 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-17 15:09:02,612 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-17 15:09:05,088 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.366e+01 2.581e+01 2.898e+01 3.375e+02, threshold=5.162e+01, percent-clipped=2.0 2024-08-17 15:09:09,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3373900.0, ans=0.0 2024-08-17 15:09:16,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3373900.0, ans=0.1 2024-08-17 15:09:20,102 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5400, loss[loss=0.1256, beats_loss=0.0104, ecapa_loss=0.0001671, whisper_loss=0.1135, over 23167.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01047, ecapa_loss=0.0001506, whisper_loss=0.09105, over 3863811.89 frames. ], batch size: 94, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:09:36,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3374100.0, ans=0.0 2024-08-17 15:09:39,640 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
20 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-17 15:09:59,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3374300.0, ans=0.125 2024-08-17 15:10:14,301 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-17 15:10:21,378 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5450, loss[loss=0.1156, beats_loss=0.009718, ecapa_loss=0.000197, whisper_loss=0.1039, over 22080.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01053, ecapa_loss=0.0001506, whisper_loss=0.09075, over 3841382.10 frames. ], batch size: 90, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:10:26,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3374500.0, ans=0.95 2024-08-17 15:10:29,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3374500.0, ans=0.0 2024-08-17 15:10:33,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3374600.0, ans=0.0 2024-08-17 15:10:36,589 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-17 15:10:49,791 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.02 vs. limit=15.0 2024-08-17 15:10:55,130 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
25 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-17 15:11:08,519 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.416e+01 2.763e+01 3.086e+01 2.790e+02, threshold=5.526e+01, percent-clipped=2.0 2024-08-17 15:11:11,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3374900.0, ans=0.0 2024-08-17 15:11:13,640 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-17 15:11:15,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3374900.0, ans=0.0 2024-08-17 15:11:18,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3374900.0, ans=0.0 2024-08-17 15:11:23,281 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5500, loss[loss=0.1069, beats_loss=0.01287, ecapa_loss=0.0001202, whisper_loss=0.09282, over 22054.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0107, ecapa_loss=0.0001493, whisper_loss=0.09011, over 3884207.17 frames. ], batch size: 88, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:11:28,919 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-17 15:11:33,137 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.83 vs. limit=22.5 2024-08-17 15:11:38,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3375100.0, ans=0.0 2024-08-17 15:11:51,300 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. 
limit=6.0 2024-08-17 15:11:54,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3375200.0, ans=0.0 2024-08-17 15:12:01,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3375300.0, ans=0.0 2024-08-17 15:12:07,869 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.17 vs. limit=15.0 2024-08-17 15:12:11,726 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.26 vs. limit=15.0 2024-08-17 15:12:14,775 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-17 15:12:17,123 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-17 15:12:26,008 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5550, loss[loss=0.09952, beats_loss=0.01115, ecapa_loss=0.0001415, whisper_loss=0.08696, over 20771.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01061, ecapa_loss=0.0001488, whisper_loss=0.09097, over 3883005.20 frames. ], batch size: 85, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:12:32,330 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
15 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-17 15:12:45,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3375600.0, ans=0.2 2024-08-17 15:12:51,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3375700.0, ans=0.0 2024-08-17 15:13:01,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3375700.0, ans=0.05 2024-08-17 15:13:06,911 WARNING [optim.py:496] (0/4) Scaling gradients by 0.024026568979024887, model_norm_threshold=55.25676727294922 2024-08-17 15:13:07,074 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.047e+05, grad_sumsq=1.492e+05, orig_rms_sq=3.383e+00 2024-08-17 15:13:13,361 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.279e+01 2.545e+01 2.862e+01 2.300e+03, threshold=5.090e+01, percent-clipped=1.0 2024-08-17 15:13:15,073 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0 2024-08-17 15:13:28,517 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5600, loss[loss=0.1073, beats_loss=0.009454, ecapa_loss=0.0001599, whisper_loss=0.09628, over 18879.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01074, ecapa_loss=0.0001486, whisper_loss=0.09016, over 3870029.13 frames. ], batch size: 74, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:13:44,107 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.88 vs. 
limit=22.5 2024-08-17 15:14:10,067 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.63 vs. limit=15.0 2024-08-17 15:14:13,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=3376300.0, ans=0.1 2024-08-17 15:14:13,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3376300.0, ans=0.07 2024-08-17 15:14:30,525 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5650, loss[loss=0.09654, beats_loss=0.01412, ecapa_loss=9.543e-05, whisper_loss=0.08147, over 21502.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01074, ecapa_loss=0.0001494, whisper_loss=0.09004, over 3874186.76 frames. ], batch size: 85, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:14:40,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3376500.0, ans=0.0 2024-08-17 15:14:59,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3376700.0, ans=0.0 2024-08-17 15:15:17,994 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.711e+01 2.354e+01 2.556e+01 3.090e+01 4.828e+01, threshold=5.111e+01, percent-clipped=0.0 2024-08-17 15:15:19,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3376900.0, ans=0.1 2024-08-17 15:15:27,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3376900.0, ans=0.125 2024-08-17 15:15:32,707 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5700, loss[loss=0.09358, beats_loss=0.01102, ecapa_loss=0.0001055, whisper_loss=0.0815, over 21248.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01064, ecapa_loss=0.0001489, whisper_loss=0.0901, over 3886384.80 frames. ], batch size: 81, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:15:38,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3377000.0, ans=0.05 2024-08-17 15:15:40,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3377000.0, ans=0.2 2024-08-17 15:15:53,795 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 12 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-17 15:15:57,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3377200.0, ans=0.0 2024-08-17 15:15:58,965 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-17 15:16:01,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3377200.0, ans=0.2 2024-08-17 15:16:11,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3377300.0, ans=0.0 2024-08-17 15:16:33,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3377400.0, ans=0.0 2024-08-17 15:16:35,462 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5750, loss[loss=0.1037, beats_loss=0.009354, ecapa_loss=0.0001463, whisper_loss=0.09291, over 19437.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01061, ecapa_loss=0.000149, whisper_loss=0.0903, over 3880838.56 frames. 
], batch size: 79, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:16:45,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3377500.0, ans=0.125 2024-08-17 15:16:50,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3377600.0, ans=0.125 2024-08-17 15:16:51,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3377600.0, ans=0.125 2024-08-17 15:17:00,415 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 22 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-17 15:17:06,881 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 24 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-17 15:17:08,503 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=12.0 2024-08-17 15:17:09,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3377700.0, ans=0.0 2024-08-17 15:17:22,650 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.322e+01 2.527e+01 2.770e+01 4.049e+01, threshold=5.054e+01, percent-clipped=0.0 2024-08-17 15:17:24,080 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-17 15:17:24,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3377900.0, ans=0.125 2024-08-17 15:17:26,845 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
16 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-17 15:17:36,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3378000.0, ans=0.04949747468305833 2024-08-17 15:17:37,655 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5800, loss[loss=0.111, beats_loss=0.01137, ecapa_loss=0.0001724, whisper_loss=0.09794, over 21039.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01062, ecapa_loss=0.0001488, whisper_loss=0.09016, over 3887205.39 frames. ], batch size: 86, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:17:50,527 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-17 15:17:54,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3378100.0, ans=0.1 2024-08-17 15:17:54,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3378100.0, ans=0.125 2024-08-17 15:17:57,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3378100.0, ans=0.125 2024-08-17 15:18:06,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3378200.0, ans=0.125 2024-08-17 15:18:10,628 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-17 15:18:28,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3378400.0, ans=0.1 2024-08-17 15:18:30,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3378400.0, ans=0.125 2024-08-17 15:18:34,322 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
19 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-17 15:18:40,456 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5850, loss[loss=0.1035, beats_loss=0.01043, ecapa_loss=0.0001814, whisper_loss=0.0913, over 13604.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01068, ecapa_loss=0.0001486, whisper_loss=0.08946, over 3859274.13 frames. ], batch size: 54, lr: 2.66e-03, grad_scale: 1.152921504606847e+18 2024-08-17 15:18:52,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3378600.0, ans=10.0 2024-08-17 15:19:00,719 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-17 15:19:08,826 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.755e-01 2024-08-17 15:19:28,424 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.390e+01 2.604e+01 2.871e+01 4.556e+01, threshold=5.208e+01, percent-clipped=0.0 2024-08-17 15:19:43,542 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5900, loss[loss=0.09832, beats_loss=0.01039, ecapa_loss=0.0001638, whisper_loss=0.08629, over 21701.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01065, ecapa_loss=0.0001495, whisper_loss=0.08922, over 3897504.36 frames. ], batch size: 89, lr: 2.66e-03, grad_scale: 1.152921504606847e+18 2024-08-17 15:19:44,832 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 19 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-17 15:19:45,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3379000.0, ans=0.125 2024-08-17 15:19:46,289 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
30 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-17 15:19:57,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3379100.0, ans=0.2 2024-08-17 15:20:04,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3379100.0, ans=0.0 2024-08-17 15:20:05,060 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 40 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-17 15:20:14,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3379200.0, ans=0.125 2024-08-17 15:20:26,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3379300.0, ans=0.0 2024-08-17 15:20:27,279 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-17 15:20:29,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3379300.0, ans=0.125 2024-08-17 15:20:42,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3379400.0, ans=0.0 2024-08-17 15:20:45,988 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 5950, loss[loss=0.09474, beats_loss=0.009811, ecapa_loss=0.0001496, whisper_loss=0.08344, over 17264.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01059, ecapa_loss=0.0001496, whisper_loss=0.08965, over 3893467.68 frames. ], batch size: 66, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:20:46,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3379500.0, ans=0.2 2024-08-17 15:20:46,819 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.78 vs. 
limit=15.0 2024-08-17 15:20:52,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3379500.0, ans=0.0 2024-08-17 15:21:14,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3379700.0, ans=0.125 2024-08-17 15:21:15,067 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-17 15:21:23,040 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 36 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-17 15:21:35,231 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.244e+01 2.483e+01 2.977e+01 4.324e+01, threshold=4.966e+01, percent-clipped=0.0 2024-08-17 15:21:35,425 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-17 15:21:49,208 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6000, loss[loss=0.1013, beats_loss=0.0112, ecapa_loss=0.0001398, whisper_loss=0.08866, over 21971.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01053, ecapa_loss=0.0001489, whisper_loss=0.09038, over 3901738.97 frames. ], batch size: 87, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:21:49,209 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-17 15:22:23,019 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on ASR_libri: loss=0.2521, beats_loss=0, ecapa_loss=0.0005335, whisper_loss=0.2467, over 922467.00 frames. 2024-08-17 15:22:37,685 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on SV_voxceleb1: loss=0.004145, beats_loss=0, ecapa_loss=0.0004145, whisper_loss=0, over 939242.00 frames. 2024-08-17 15:24:12,910 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on AT_audioset: loss=0.02339, beats_loss=0.02339, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-17 15:24:12,914 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-17 15:24:14,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3380000.0, ans=0.125 2024-08-17 15:24:16,196 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2024-08-17 15:24:24,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3380100.0, ans=0.125 2024-08-17 15:24:32,792 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.35 vs. limit=12.0 2024-08-17 15:24:33,327 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 24 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-17 15:24:34,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3380100.0, ans=0.05 2024-08-17 15:24:39,923 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-17 15:24:41,151 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-17 15:24:42,970 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.06 vs. 
limit=22.5 2024-08-17 15:24:48,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3380200.0, ans=0.07 2024-08-17 15:24:48,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3380200.0, ans=0.2 2024-08-17 15:24:58,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3380300.0, ans=0.125 2024-08-17 15:25:05,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3380400.0, ans=0.2 2024-08-17 15:25:19,728 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6050, loss[loss=0.09722, beats_loss=0.01065, ecapa_loss=0.0001489, whisper_loss=0.08508, over 17133.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01054, ecapa_loss=0.0001485, whisper_loss=0.0902, over 3886564.13 frames. ], batch size: 69, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:25:21,587 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.05 vs. limit=15.0 2024-08-17 15:25:29,821 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.52 vs. limit=15.0 2024-08-17 15:25:37,796 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-17 15:25:44,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3380600.0, ans=0.0 2024-08-17 15:25:48,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3380700.0, ans=0.125 2024-08-17 15:26:13,051 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.302e+01 2.473e+01 2.788e+01 3.926e+01, threshold=4.946e+01, percent-clipped=0.0 2024-08-17 15:26:17,700 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 15:26:28,372 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6100, loss[loss=0.08676, beats_loss=0.01428, ecapa_loss=0.0001605, whisper_loss=0.07088, over 20650.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01056, ecapa_loss=0.000149, whisper_loss=0.08956, over 3888733.50 frames. ], batch size: 85, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:26:29,820 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-17 15:26:37,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3381000.0, ans=0.0 2024-08-17 15:26:46,778 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-17 15:26:50,548 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 15 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-17 15:27:13,676 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 12 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-17 15:27:36,743 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6150, loss[loss=0.1031, beats_loss=0.01052, ecapa_loss=0.0001208, whisper_loss=0.09134, over 23334.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01057, ecapa_loss=0.0001492, whisper_loss=0.08953, over 3878009.62 frames. 
], batch size: 89, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:27:56,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3381600.0, ans=0.2 2024-08-17 15:28:01,459 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 27 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-17 15:28:15,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3381700.0, ans=10.0 2024-08-17 15:28:21,382 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 27 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-17 15:28:31,944 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-17 15:28:39,023 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-17 15:28:45,119 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.233e+01 2.467e+01 2.726e+01 4.750e+01, threshold=4.934e+01, percent-clipped=0.0 2024-08-17 15:28:45,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3381900.0, ans=0.2 2024-08-17 15:29:04,937 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6200, loss[loss=0.1099, beats_loss=0.01147, ecapa_loss=0.0001336, whisper_loss=0.09707, over 20213.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01057, ecapa_loss=0.0001497, whisper_loss=0.08928, over 3877149.33 frames. ], batch size: 78, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:29:05,737 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 21 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-17 15:29:31,936 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.96 vs. limit=22.5 2024-08-17 15:29:34,403 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 
16 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-17 15:29:54,578 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2024-08-17 15:30:37,487 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6250, loss[loss=0.1113, beats_loss=0.009346, ecapa_loss=0.000207, whisper_loss=0.09984, over 20397.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001485, whisper_loss=0.09002, over 3851005.79 frames. ], batch size: 88, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:30:43,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3382500.0, ans=0.0 2024-08-17 15:30:52,724 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-17 15:31:24,138 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.50 vs. limit=5.0 2024-08-17 15:31:31,989 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.10 vs. limit=15.0 2024-08-17 15:31:53,970 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.680e+01 2.369e+01 2.606e+01 2.979e+01 4.813e+02, threshold=5.211e+01, percent-clipped=2.0 2024-08-17 15:31:57,061 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.65 vs. limit=15.0 2024-08-17 15:32:03,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3382900.0, ans=0.0 2024-08-17 15:32:16,303 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6300, loss[loss=0.09325, beats_loss=0.009905, ecapa_loss=0.000154, whisper_loss=0.08181, over 20477.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01044, ecapa_loss=0.0001499, whisper_loss=0.09068, over 3840554.18 frames. ], batch size: 81, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:32:35,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3383100.0, ans=0.0 2024-08-17 15:32:46,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3383100.0, ans=0.5 2024-08-17 15:33:04,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3383200.0, ans=0.0 2024-08-17 15:33:15,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3383300.0, ans=0.0 2024-08-17 15:33:17,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3383300.0, ans=0.09899494936611666 2024-08-17 15:33:30,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3383400.0, ans=0.2 2024-08-17 15:33:32,074 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-17 15:33:36,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3383400.0, ans=0.1 2024-08-17 15:33:45,945 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-17 15:33:47,575 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-17 15:33:51,355 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6350, loss[loss=0.08465, beats_loss=0.009204, ecapa_loss=0.0001457, whisper_loss=0.07399, over 22143.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001494, whisper_loss=0.09031, over 3847546.56 frames. 
], batch size: 90, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:33:53,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3383500.0, ans=0.1 2024-08-17 15:34:15,312 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 28 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-17 15:34:16,843 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-17 15:34:29,092 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 26 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-17 15:34:51,026 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.290e+01 2.555e+01 2.760e+01 4.440e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-17 15:34:55,319 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 20 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-17 15:35:07,460 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6400, loss[loss=0.1269, beats_loss=0.01014, ecapa_loss=0.0001075, whisper_loss=0.1157, over 22624.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01059, ecapa_loss=0.0001482, whisper_loss=0.0899, over 3846070.66 frames. ], batch size: 81, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:35:09,244 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-17 15:35:24,121 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 29 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-17 15:35:37,115 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
27 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-17 15:36:18,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3384400.0, ans=0.1 2024-08-17 15:36:21,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3384400.0, ans=0.0 2024-08-17 15:36:30,996 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6450, loss[loss=0.08446, beats_loss=0.01272, ecapa_loss=0.0001329, whisper_loss=0.07041, over 19674.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01061, ecapa_loss=0.0001474, whisper_loss=0.09033, over 3841275.20 frames. ], batch size: 81, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:36:58,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3384600.0, ans=0.125 2024-08-17 15:37:02,666 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-17 15:37:26,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3384800.0, ans=0.2 2024-08-17 15:37:29,000 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-17 15:37:36,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3384800.0, ans=0.125 2024-08-17 15:37:37,319 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.419e+01 2.570e+01 2.770e+01 6.732e+01, threshold=5.141e+01, percent-clipped=1.0 2024-08-17 15:37:41,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2024-08-17 15:37:54,091 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
22 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-17 15:37:54,970 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.47 vs. limit=22.5 2024-08-17 15:37:56,845 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6500, loss[loss=0.1022, beats_loss=0.01099, ecapa_loss=0.0001596, whisper_loss=0.08961, over 17896.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01066, ecapa_loss=0.0001468, whisper_loss=0.0904, over 3840744.42 frames. ], batch size: 72, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:38:11,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3385000.0, ans=0.0 2024-08-17 15:38:12,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3385100.0, ans=0.0 2024-08-17 15:38:19,579 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.48 vs. limit=22.5 2024-08-17 15:38:21,844 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-17 15:38:26,901 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 12 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-17 15:38:33,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3385200.0, ans=0.125 2024-08-17 15:38:37,508 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 20 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-17 15:38:49,792 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.52 vs. 
limit=22.5 2024-08-17 15:38:52,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3385300.0, ans=0.05 2024-08-17 15:39:06,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3385300.0, ans=0.2 2024-08-17 15:39:20,946 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 22 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-17 15:39:23,353 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6550, loss[loss=0.1363, beats_loss=0.007014, ecapa_loss=0.0001646, whisper_loss=0.1276, over 22766.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01062, ecapa_loss=0.0001464, whisper_loss=0.09095, over 3847849.35 frames. ], batch size: 89, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:39:33,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3385500.0, ans=0.125 2024-08-17 15:39:50,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3385600.0, ans=0.0 2024-08-17 15:39:51,159 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-17 15:39:56,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3385700.0, ans=0.2 2024-08-17 15:39:57,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3385700.0, ans=0.125 2024-08-17 15:40:15,543 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.52 vs. 
limit=15.0 2024-08-17 15:40:18,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3385800.0, ans=0.125 2024-08-17 15:40:27,166 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.341e+01 2.572e+01 2.842e+01 4.504e+01, threshold=5.144e+01, percent-clipped=0.0 2024-08-17 15:40:47,888 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6600, loss[loss=0.08655, beats_loss=0.01274, ecapa_loss=0.0001434, whisper_loss=0.07237, over 20572.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01068, ecapa_loss=0.0001465, whisper_loss=0.09055, over 3876709.29 frames. ], batch size: 87, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:40:54,783 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-17 15:41:00,480 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-17 15:41:14,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3386100.0, ans=0.2 2024-08-17 15:41:34,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3386200.0, ans=0.125 2024-08-17 15:41:48,682 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=12.0 2024-08-17 15:41:50,200 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
21 from LS+wenet, 10 from Vox, 34 fro AS 2024-08-17 15:41:50,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3386300.0, ans=0.0 2024-08-17 15:42:11,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3386400.0, ans=0.1 2024-08-17 15:42:17,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3386400.0, ans=0.125 2024-08-17 15:42:26,516 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6650, loss[loss=0.1123, beats_loss=0.009961, ecapa_loss=0.0001432, whisper_loss=0.1009, over 15610.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001481, whisper_loss=0.09086, over 3877697.10 frames. ], batch size: 60, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:43:16,350 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-17 15:43:47,257 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.597e+01 2.309e+01 2.525e+01 2.864e+01 3.753e+01, threshold=5.050e+01, percent-clipped=0.0 2024-08-17 15:43:49,944 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=15.0 2024-08-17 15:44:04,248 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6700, loss[loss=0.106, beats_loss=0.01128, ecapa_loss=0.0001408, whisper_loss=0.09334, over 21839.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0105, ecapa_loss=0.0001491, whisper_loss=0.09169, over 3882168.63 frames. 
], batch size: 89, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:44:14,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3387000.0, ans=0.125 2024-08-17 15:44:21,039 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.141e-01 2024-08-17 15:44:21,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3387100.0, ans=0.125 2024-08-17 15:44:34,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3387100.0, ans=0.125 2024-08-17 15:44:45,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3387200.0, ans=0.1 2024-08-17 15:45:11,774 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-17 15:45:12,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3387400.0, ans=0.125 2024-08-17 15:45:24,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3387400.0, ans=0.0 2024-08-17 15:45:28,892 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-17 15:45:34,952 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6750, loss[loss=0.1033, beats_loss=0.01041, ecapa_loss=0.0001072, whisper_loss=0.09184, over 20898.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01053, ecapa_loss=0.0001496, whisper_loss=0.09117, over 3909544.16 frames. ], batch size: 79, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:45:57,610 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
21 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-17 15:46:42,507 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=15.0 2024-08-17 15:46:54,503 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.393e+01 2.641e+01 2.928e+01 4.492e+01, threshold=5.282e+01, percent-clipped=0.0 2024-08-17 15:47:14,567 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6800, loss[loss=0.0902, beats_loss=0.01146, ecapa_loss=0.0001233, whisper_loss=0.0775, over 14013.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01043, ecapa_loss=0.0001504, whisper_loss=0.09165, over 3887923.83 frames. ], batch size: 55, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:47:24,917 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-17 15:47:36,533 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.28 vs. limit=15.0 2024-08-17 15:47:39,265 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 28 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-17 15:47:41,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3388100.0, ans=0.125 2024-08-17 15:47:55,525 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.32 vs. limit=15.0 2024-08-17 15:48:03,634 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-17 15:48:26,040 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-17 15:48:35,363 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.16 vs. 
limit=15.0 2024-08-17 15:48:35,507 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.21 vs. limit=15.0 2024-08-17 15:48:47,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3388400.0, ans=0.95 2024-08-17 15:48:49,101 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-17 15:48:53,433 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6850, loss[loss=0.09721, beats_loss=0.01005, ecapa_loss=0.0001521, whisper_loss=0.08563, over 22747.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01038, ecapa_loss=0.0001505, whisper_loss=0.09167, over 3886610.58 frames. ], batch size: 91, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:49:02,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3388500.0, ans=0.1 2024-08-17 15:49:18,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3388600.0, ans=0.0 2024-08-17 15:49:22,006 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.48 vs. limit=22.5 2024-08-17 15:49:23,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3388600.0, ans=0.125 2024-08-17 15:49:45,707 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-17 15:49:55,513 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-17 15:50:04,323 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-17 15:50:06,447 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.332e+01 2.534e+01 2.809e+01 1.632e+02, threshold=5.068e+01, percent-clipped=1.0 2024-08-17 15:50:15,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3388900.0, ans=0.125 2024-08-17 15:50:20,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3388900.0, ans=0.0 2024-08-17 15:50:27,109 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6900, loss[loss=0.09027, beats_loss=0.009455, ecapa_loss=0.0001887, whisper_loss=0.07893, over 18580.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01037, ecapa_loss=0.0001494, whisper_loss=0.09189, over 3871434.87 frames. ], batch size: 81, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:50:45,673 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-17 15:50:49,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3389100.0, ans=0.2 2024-08-17 15:50:55,390 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-17 15:50:55,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3389100.0, ans=0.125 2024-08-17 15:51:02,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3389100.0, ans=0.5 2024-08-17 15:51:06,429 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
24 from LS+wenet, 23 from Vox, 42 from AS 2024-08-17 15:51:15,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3389200.0, ans=0.09899494936611666 2024-08-17 15:51:40,747 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 22 from Vox, 30 from AS 2024-08-17 15:51:50,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3389400.0, ans=0.125 2024-08-17 15:51:56,061 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 25 from Vox, 32 from AS 2024-08-17 15:52:04,615 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 32 from LS+wenet, 23 from Vox, 23 from AS 2024-08-17 15:52:08,073 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 6950, loss[loss=0.108, beats_loss=0.00993, ecapa_loss=0.000141, whisper_loss=0.09668, over 21255.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01046, ecapa_loss=0.0001489, whisper_loss=0.09136, over 3888492.26 frames. ], batch size: 86, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:52:17,116 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 18 from LS+wenet, 27 from Vox, 34 from AS 2024-08-17 15:52:18,866 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 15:52:31,710 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.26 vs. limit=10.0 2024-08-17 15:53:15,171 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
25 from LS+wenet, 26 from Vox, 38 from AS 2024-08-17 15:53:24,852 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.373e+01 2.584e+01 2.825e+01 4.780e+01, threshold=5.168e+01, percent-clipped=0.0 2024-08-17 15:53:41,223 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7000, loss[loss=0.0929, beats_loss=0.009889, ecapa_loss=0.0001587, whisper_loss=0.08143, over 18517.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01054, ecapa_loss=0.00015, whisper_loss=0.09015, over 3911374.56 frames. ], batch size: 74, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:53:46,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3390000.0, ans=0.0 2024-08-17 15:53:52,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3390000.0, ans=0.125 2024-08-17 15:53:56,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3390000.0, ans=0.1 2024-08-17 15:54:04,154 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 18 from Vox, 35 from AS 2024-08-17 15:54:49,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3390400.0, ans=0.2 2024-08-17 15:55:03,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3390400.0, ans=0.07 2024-08-17 15:55:06,693 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7050, loss[loss=0.09992, beats_loss=0.01036, ecapa_loss=0.0001688, whisper_loss=0.08787, over 14859.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01052, ecapa_loss=0.0001496, whisper_loss=0.09049, over 3897110.82 frames. 
], batch size: 58, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:55:16,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3390500.0, ans=0.0 2024-08-17 15:55:27,756 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 17 from Vox, 50 from AS 2024-08-17 15:55:36,912 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 18 from Vox, 36 from AS 2024-08-17 15:55:42,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3390700.0, ans=0.1 2024-08-17 15:55:49,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3390700.0, ans=0.09899494936611666 2024-08-17 15:55:54,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3390700.0, ans=0.125 2024-08-17 15:56:08,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3390800.0, ans=0.05 2024-08-17 15:56:16,185 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.336e+01 2.552e+01 2.787e+01 4.135e+01, threshold=5.105e+01, percent-clipped=0.0 2024-08-17 15:56:25,764 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.52 vs. limit=10.0 2024-08-17 15:56:28,956 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.75 vs. limit=15.0 2024-08-17 15:56:32,501 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7100, loss[loss=0.09255, beats_loss=0.01163, ecapa_loss=0.0001257, whisper_loss=0.07966, over 22503.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01048, ecapa_loss=0.0001507, whisper_loss=0.09039, over 3893137.29 frames. ], batch size: 90, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:56:32,621 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 from AS 2024-08-17 15:56:45,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3391000.0, ans=0.125 2024-08-17 15:57:00,430 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2024-08-17 15:57:11,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3391200.0, ans=0.125 2024-08-17 15:57:36,857 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 27 from Vox, 20 from AS 2024-08-17 15:57:56,076 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 26 from LS+wenet, 13 from Vox, 43 from AS 2024-08-17 15:57:59,304 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7150, loss[loss=0.09444, beats_loss=0.009883, ecapa_loss=0.0001478, whisper_loss=0.08308, over 14232.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01042, ecapa_loss=0.0001509, whisper_loss=0.09057, over 3888397.20 frames. ], batch size: 57, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:57:59,493 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 20 from Vox, 36 from AS 2024-08-17 15:58:00,013 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.21 vs. 
limit=15.0 2024-08-17 15:58:03,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3391500.0, ans=0.0 2024-08-17 15:58:04,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3391500.0, ans=0.125 2024-08-17 15:58:25,574 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 from AS 2024-08-17 15:58:55,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3391800.0, ans=0.0 2024-08-17 15:58:56,743 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.34 vs. limit=10.0 2024-08-17 15:59:04,147 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.298e+01 2.545e+01 2.788e+01 4.387e+02, threshold=5.090e+01, percent-clipped=2.0 2024-08-17 15:59:14,885 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 24 from LS+wenet, 19 from Vox, 22 from AS 2024-08-17 15:59:20,139 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7200, loss[loss=0.1003, beats_loss=0.01056, ecapa_loss=0.0001441, whisper_loss=0.0883, over 17377.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01049, ecapa_loss=0.0001501, whisper_loss=0.09012, over 3851228.18 frames. ], batch size: 69, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:59:20,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3392000.0, ans=10.0 2024-08-17 15:59:26,073 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
33 from LS+wenet, 21 from Vox, 36 from AS 2024-08-17 15:59:40,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3392100.0, ans=0.07 2024-08-17 15:59:45,406 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 24 from LS+wenet, 18 from Vox, 21 from AS 2024-08-17 15:59:47,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.41 vs. limit=15.0 2024-08-17 15:59:51,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3392200.0, ans=0.0 2024-08-17 15:59:56,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3392200.0, ans=0.0 2024-08-17 16:00:24,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3392400.0, ans=10.0 2024-08-17 16:00:35,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3392400.0, ans=0.0 2024-08-17 16:00:36,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3392400.0, ans=0.125 2024-08-17 16:00:40,432 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 20 from Vox, 32 from AS 2024-08-17 16:00:41,151 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.53 vs. limit=12.0 2024-08-17 16:00:41,775 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7250, loss[loss=0.08369, beats_loss=0.01182, ecapa_loss=0.0001414, whisper_loss=0.07046, over 17886.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001492, whisper_loss=0.08991, over 3844457.77 frames. 
], batch size: 70, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:00:45,487 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 13 from Vox, 25 from AS 2024-08-17 16:00:56,726 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 22 from Vox, 45 from AS 2024-08-17 16:01:06,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3392600.0, ans=0.125 2024-08-17 16:01:09,459 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 12 from LS+wenet, 12 from Vox, 34 from AS 2024-08-17 16:01:15,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3392700.0, ans=0.125 2024-08-17 16:01:29,989 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.79 vs. limit=22.5 2024-08-17 16:01:31,021 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 14 from Vox, 23 from AS 2024-08-17 16:01:39,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3392800.0, ans=0.125 2024-08-17 16:01:47,474 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.715e+01 2.301e+01 2.611e+01 2.944e+01 3.835e+01, threshold=5.222e+01, percent-clipped=0.0 2024-08-17 16:01:53,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3392900.0, ans=0.2 2024-08-17 16:02:05,222 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7300, loss[loss=0.09978, beats_loss=0.01145, ecapa_loss=0.0001462, whisper_loss=0.08687, over 20423.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01054, ecapa_loss=0.0001473, whisper_loss=0.08969, over 3856138.19 frames. 
], batch size: 83, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:02:36,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3393100.0, ans=0.025 2024-08-17 16:02:37,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3393200.0, ans=10.0 2024-08-17 16:02:38,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3393200.0, ans=0.0 2024-08-17 16:02:52,459 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 from AS 2024-08-17 16:02:57,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3393300.0, ans=0.0 2024-08-17 16:03:11,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3393400.0, ans=0.0 2024-08-17 16:03:14,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3393400.0, ans=0.2 2024-08-17 16:03:16,115 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 from AS 2024-08-17 16:03:16,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3393400.0, ans=10.0 2024-08-17 16:03:19,755 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 24 from Vox, 26 from AS 2024-08-17 16:03:21,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3393400.0, ans=0.0 2024-08-17 16:03:27,538 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7350, loss[loss=0.1151, beats_loss=0.01105, ecapa_loss=0.0001214, whisper_loss=0.1028, over 16247.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01047, ecapa_loss=0.0001473, whisper_loss=0.09063, over 3867712.29 frames. ], batch size: 63, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:03:28,539 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.74 vs. limit=15.0 2024-08-17 16:03:42,718 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 21 from Vox, 30 from AS 2024-08-17 16:03:46,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3393600.0, ans=0.125 2024-08-17 16:03:48,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3393600.0, ans=0.2 2024-08-17 16:03:59,971 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 26 from LS+wenet, 14 from Vox, 27 from AS 2024-08-17 16:04:14,099 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 23 from Vox, 27 from AS 2024-08-17 16:04:31,284 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.343e+01 2.608e+01 3.088e+01 3.317e+02, threshold=5.216e+01, percent-clipped=4.0 2024-08-17 16:04:37,749 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0 2024-08-17 16:04:46,830 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7400, loss[loss=0.114, beats_loss=0.00751, ecapa_loss=0.0001511, whisper_loss=0.1049, over 15070.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01045, ecapa_loss=0.0001473, whisper_loss=0.09137, over 3872088.14 frames. 
], batch size: 58, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:05:02,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3394100.0, ans=0.2 2024-08-17 16:05:17,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3394100.0, ans=0.0 2024-08-17 16:05:19,818 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 16:05:21,373 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.069e-02 2024-08-17 16:05:26,494 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 from AS 2024-08-17 16:05:37,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3394300.0, ans=0.1 2024-08-17 16:05:37,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3394300.0, ans=10.0 2024-08-17 16:05:39,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3394300.0, ans=0.125 2024-08-17 16:05:39,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3394300.0, ans=0.0 2024-08-17 16:05:41,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3394300.0, ans=0.2 2024-08-17 16:05:49,655 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 16 from Vox, 23 from AS 2024-08-17 16:06:12,548 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7450, loss[loss=0.1215, beats_loss=0.008906, ecapa_loss=0.0001432, whisper_loss=0.1112, over 22175.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.01043, ecapa_loss=0.0001473, whisper_loss=0.0923, over 3888302.56 frames. ], batch size: 85, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:06:17,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3394500.0, ans=0.125 2024-08-17 16:06:21,864 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 22 from LS+wenet, 9 from Vox, 27 from AS 2024-08-17 16:06:23,618 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 19 from Vox, 32 from AS 2024-08-17 16:06:30,738 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 from AS 2024-08-17 16:06:32,591 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 29 from LS+wenet, 12 from Vox, 41 from AS 2024-08-17 16:06:45,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3394700.0, ans=0.125 2024-08-17 16:06:54,203 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.227e+00 2024-08-17 16:07:11,413 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
16 from LS+wenet, 11 from Vox, 34 from AS 2024-08-17 16:07:13,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3394800.0, ans=0.125 2024-08-17 16:07:21,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3394900.0, ans=0.125 2024-08-17 16:07:22,039 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.262e+01 2.481e+01 2.703e+01 3.740e+01, threshold=4.961e+01, percent-clipped=0.0 2024-08-17 16:07:22,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3394900.0, ans=0.125 2024-08-17 16:07:25,769 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 29 from Vox, 24 from AS 2024-08-17 16:07:32,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3394900.0, ans=0.125 2024-08-17 16:07:35,945 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.28 vs. limit=22.5 2024-08-17 16:07:38,556 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7500, loss[loss=0.07903, beats_loss=0.01025, ecapa_loss=0.0002102, whisper_loss=0.06668, over 16922.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01051, ecapa_loss=0.0001467, whisper_loss=0.09152, over 3883577.28 frames. ], batch size: 73, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:07:40,620 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.51 vs. 
limit=12.0 2024-08-17 16:07:44,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3395000.0, ans=0.125 2024-08-17 16:07:55,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3395100.0, ans=0.0 2024-08-17 16:08:00,705 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 11 from Vox, 35 from AS 2024-08-17 16:08:00,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3395100.0, ans=0.125 2024-08-17 16:08:06,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3395100.0, ans=0.125 2024-08-17 16:08:48,277 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.942e+00 2024-08-17 16:09:04,619 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7550, loss[loss=0.1146, beats_loss=0.005969, ecapa_loss=0.000186, whisper_loss=0.1068, over 17477.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01052, ecapa_loss=0.0001468, whisper_loss=0.09199, over 3907148.44 frames. ], batch size: 70, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:09:05,373 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.99 vs. limit=22.5 2024-08-17 16:09:07,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.63 vs. limit=15.0 2024-08-17 16:09:19,482 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 16:09:20,852 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
17 from LS+wenet, 15 from Vox, 39 from AS 2024-08-17 16:09:22,424 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 21 from Vox, 42 from AS 2024-08-17 16:09:23,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3395600.0, ans=0.125 2024-08-17 16:09:29,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3395600.0, ans=0.125 2024-08-17 16:09:34,307 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 from AS 2024-08-17 16:09:41,402 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.09 vs. limit=15.0 2024-08-17 16:09:48,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3395700.0, ans=0.125 2024-08-17 16:09:51,097 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.42 vs. limit=10.0 2024-08-17 16:09:54,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.35 vs. limit=15.0 2024-08-17 16:10:09,642 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.390e+01 2.660e+01 3.033e+01 1.573e+02, threshold=5.320e+01, percent-clipped=3.0 2024-08-17 16:10:12,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3395900.0, ans=0.0 2024-08-17 16:10:21,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3395900.0, ans=0.125 2024-08-17 16:10:24,663 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7600, loss[loss=0.1048, beats_loss=0.009439, ecapa_loss=0.0001418, whisper_loss=0.09389, over 22263.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01056, ecapa_loss=0.0001456, whisper_loss=0.09152, over 3901614.27 frames. ], batch size: 87, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:10:25,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3396000.0, ans=0.125 2024-08-17 16:10:32,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3396000.0, ans=0.1 2024-08-17 16:10:52,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3396100.0, ans=0.07 2024-08-17 16:10:56,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3396200.0, ans=0.125 2024-08-17 16:11:08,277 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=15.0 2024-08-17 16:11:29,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3396400.0, ans=0.1 2024-08-17 16:11:40,101 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 from AS 2024-08-17 16:11:41,191 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7650, loss[loss=0.1079, beats_loss=0.009523, ecapa_loss=0.0001586, whisper_loss=0.09679, over 18162.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01054, ecapa_loss=0.0001468, whisper_loss=0.0918, over 3939060.40 frames. 
], batch size: 67, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:11:58,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3396600.0, ans=0.1 2024-08-17 16:12:02,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3396600.0, ans=0.0 2024-08-17 16:12:11,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3396700.0, ans=0.07 2024-08-17 16:12:27,868 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.64 vs. limit=12.0 2024-08-17 16:12:37,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.87 vs. limit=22.5 2024-08-17 16:12:38,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3396800.0, ans=0.125 2024-08-17 16:12:41,291 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.270e+01 2.469e+01 2.738e+01 5.063e+01, threshold=4.938e+01, percent-clipped=0.0 2024-08-17 16:12:47,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3396900.0, ans=0.125 2024-08-17 16:12:48,708 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 from AS 2024-08-17 16:12:56,597 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7700, loss[loss=0.09149, beats_loss=0.01082, ecapa_loss=0.0001367, whisper_loss=0.07931, over 15090.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01045, ecapa_loss=0.0001481, whisper_loss=0.09201, over 3933971.87 frames. 
], batch size: 58, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:12:59,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3397000.0, ans=0.125 2024-08-17 16:13:04,362 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 14 from Vox, 35 from AS 2024-08-17 16:13:18,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3397100.0, ans=0.0 2024-08-17 16:13:21,360 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.42 vs. limit=12.0 2024-08-17 16:13:23,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3397100.0, ans=0.125 2024-08-17 16:13:28,608 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.55 vs. limit=15.0 2024-08-17 16:13:41,358 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 30 from Vox, 31 from AS 2024-08-17 16:13:43,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3397300.0, ans=0.125 2024-08-17 16:14:09,396 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7750, loss[loss=0.1032, beats_loss=0.01151, ecapa_loss=0.0001255, whisper_loss=0.09042, over 20540.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01049, ecapa_loss=0.0001472, whisper_loss=0.09195, over 3919472.14 frames. ], batch size: 79, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:14:14,836 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
16 from LS+wenet, 23 from Vox, 31 from AS 2024-08-17 16:14:40,710 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.568e-01 2024-08-17 16:14:42,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3397700.0, ans=0.0 2024-08-17 16:15:06,089 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.283e+01 2.591e+01 2.932e+01 1.157e+02, threshold=5.183e+01, percent-clipped=2.0 2024-08-17 16:15:06,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3397900.0, ans=0.0 2024-08-17 16:15:20,525 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7800, loss[loss=0.09783, beats_loss=0.01159, ecapa_loss=0.0001655, whisper_loss=0.08459, over 22506.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01044, ecapa_loss=0.0001476, whisper_loss=0.09222, over 3900837.00 frames. ], batch size: 95, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:15:26,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3398000.0, ans=0.125 2024-08-17 16:15:34,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3398100.0, ans=0.125 2024-08-17 16:15:38,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3398100.0, ans=0.125 2024-08-17 16:15:47,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3398200.0, ans=0.0 2024-08-17 16:15:50,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3398200.0, ans=0.0 2024-08-17 16:15:58,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, 
batch_count=3398200.0, ans=0.1 2024-08-17 16:16:02,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3398300.0, ans=0.0 2024-08-17 16:16:13,103 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 23 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-17 16:16:23,106 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-17 16:16:33,287 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7850, loss[loss=0.07754, beats_loss=0.01154, ecapa_loss=0.0001419, whisper_loss=0.06458, over 14421.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01043, ecapa_loss=0.0001464, whisper_loss=0.09239, over 3919087.11 frames. ], batch size: 58, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:16:44,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3398500.0, ans=0.125 2024-08-17 16:16:57,276 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-17 16:17:14,648 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 37 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-17 16:17:19,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3398700.0, ans=0.0 2024-08-17 16:17:49,830 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.263e+01 2.535e+01 2.863e+01 4.043e+01, threshold=5.070e+01, percent-clipped=0.0 2024-08-17 16:18:01,844 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
32 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-17 16:18:04,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3398900.0, ans=0.1 2024-08-17 16:18:13,631 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7900, loss[loss=0.07831, beats_loss=0.01158, ecapa_loss=0.0001265, whisper_loss=0.06546, over 21400.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01044, ecapa_loss=0.0001474, whisper_loss=0.09188, over 3901251.75 frames. ], batch size: 85, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:18:20,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3399000.0, ans=0.125 2024-08-17 16:18:42,657 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-17 16:18:42,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3399100.0, ans=0.125 2024-08-17 16:19:00,423 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.18 vs. limit=15.0 2024-08-17 16:19:02,107 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.505e-02 2024-08-17 16:19:26,217 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-17 16:19:34,162 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
23 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-17 16:19:46,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3399400.0, ans=0.1 2024-08-17 16:19:54,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3399500.0, ans=0.0 2024-08-17 16:19:55,885 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 7950, loss[loss=0.09189, beats_loss=0.01068, ecapa_loss=0.0001269, whisper_loss=0.07994, over 16750.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01042, ecapa_loss=0.0001464, whisper_loss=0.09193, over 3888564.42 frames. ], batch size: 64, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:20:01,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3399500.0, ans=0.0 2024-08-17 16:20:03,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3399500.0, ans=10.0 2024-08-17 16:20:44,702 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-17 16:20:47,727 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 20 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-17 16:20:53,203 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-17 16:21:18,519 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-17 16:21:20,339 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.432e+01 2.654e+01 2.954e+01 3.124e+02, threshold=5.307e+01, percent-clipped=2.0 2024-08-17 16:21:20,542 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
23 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-17 16:21:23,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3399900.0, ans=0.1 2024-08-17 16:21:40,338 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-340000.pt 2024-08-17 16:21:44,314 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8000, loss[loss=0.09466, beats_loss=0.01032, ecapa_loss=0.0001431, whisper_loss=0.0829, over 17860.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01042, ecapa_loss=0.000147, whisper_loss=0.09074, over 3893464.92 frames. ], batch size: 69, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:21:51,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3400000.0, ans=0.0 2024-08-17 16:22:50,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3400300.0, ans=0.1 2024-08-17 16:23:10,227 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 16:23:12,850 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-17 16:23:16,860 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 12 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-17 16:23:30,264 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8050, loss[loss=0.09418, beats_loss=0.01136, ecapa_loss=0.0001204, whisper_loss=0.08162, over 19392.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01042, ecapa_loss=0.0001466, whisper_loss=0.09136, over 3919255.57 frames. 
], batch size: 77, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:23:30,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3400500.0, ans=0.125 2024-08-17 16:23:35,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3400500.0, ans=0.1 2024-08-17 16:23:49,462 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0 2024-08-17 16:23:58,003 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-17 16:24:37,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3400800.0, ans=6.0 2024-08-17 16:24:52,758 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.357e+01 2.561e+01 2.862e+01 4.337e+01, threshold=5.122e+01, percent-clipped=0.0 2024-08-17 16:25:08,457 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8100, loss[loss=0.1129, beats_loss=0.01036, ecapa_loss=0.0001567, whisper_loss=0.101, over 22388.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01048, ecapa_loss=0.0001475, whisper_loss=0.09121, over 3911907.57 frames. ], batch size: 87, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:25:12,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3401000.0, ans=0.09899494936611666 2024-08-17 16:25:20,054 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.30 vs. limit=12.0 2024-08-17 16:25:36,870 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.83 vs. 
limit=15.0 2024-08-17 16:25:43,563 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 12 from Vox, 48 fro AS 2024-08-17 16:25:44,012 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.27 vs. limit=12.0 2024-08-17 16:25:46,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3401300.0, ans=0.0 2024-08-17 16:25:48,554 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-17 16:25:50,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3401300.0, ans=0.125 2024-08-17 16:25:55,962 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.51 vs. limit=22.5 2024-08-17 16:25:59,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.60 vs. limit=10.0 2024-08-17 16:26:07,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3401400.0, ans=0.125 2024-08-17 16:26:13,424 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8150, loss[loss=0.09882, beats_loss=0.01186, ecapa_loss=0.0001328, whisper_loss=0.08563, over 17927.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01049, ecapa_loss=0.0001482, whisper_loss=0.09095, over 3907753.30 frames. 
], batch size: 70, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:26:15,221 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 16:26:21,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3401500.0, ans=0.1 2024-08-17 16:26:32,903 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-17 16:27:00,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3401800.0, ans=0.125 2024-08-17 16:27:05,024 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.85 vs. limit=15.0 2024-08-17 16:27:05,511 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.351e+01 2.682e+01 3.225e+01 8.305e+01, threshold=5.364e+01, percent-clipped=1.0 2024-08-17 16:27:07,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3401900.0, ans=0.125 2024-08-17 16:27:14,284 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.27 vs. limit=15.0 2024-08-17 16:27:18,566 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8200, loss[loss=0.1006, beats_loss=0.01115, ecapa_loss=0.000149, whisper_loss=0.08796, over 15678.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01048, ecapa_loss=0.0001477, whisper_loss=0.09094, over 3896055.97 frames. ], batch size: 64, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:27:25,551 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
33 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-17 16:27:30,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3402100.0, ans=0.0 2024-08-17 16:27:37,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3402100.0, ans=0.09899494936611666 2024-08-17 16:27:42,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3402100.0, ans=0.125 2024-08-17 16:27:47,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3402200.0, ans=0.0 2024-08-17 16:27:50,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3402200.0, ans=0.125 2024-08-17 16:27:53,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3402200.0, ans=0.0 2024-08-17 16:27:54,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3402200.0, ans=0.125 2024-08-17 16:28:08,838 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.44 vs. limit=15.0 2024-08-17 16:28:10,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.59 vs. 
limit=15.0 2024-08-17 16:28:12,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3402400.0, ans=0.125 2024-08-17 16:28:12,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3402400.0, ans=0.0 2024-08-17 16:28:23,688 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8250, loss[loss=0.082, beats_loss=0.01405, ecapa_loss=0.0001328, whisper_loss=0.06662, over 15478.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.000147, whisper_loss=0.09068, over 3931195.82 frames. ], batch size: 64, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:28:23,872 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-17 16:28:29,323 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.44 vs. limit=15.0 2024-08-17 16:28:32,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3402500.0, ans=0.2 2024-08-17 16:28:38,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3402600.0, ans=0.125 2024-08-17 16:28:44,668 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
21 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-17 16:28:55,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3402700.0, ans=0.2 2024-08-17 16:28:56,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3402700.0, ans=0.125 2024-08-17 16:28:57,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3402700.0, ans=0.0 2024-08-17 16:28:58,203 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.98 vs. limit=22.5 2024-08-17 16:29:02,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3402800.0, ans=0.125 2024-08-17 16:29:13,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3402800.0, ans=0.125 2024-08-17 16:29:16,323 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.415e+01 2.647e+01 2.985e+01 4.296e+01, threshold=5.294e+01, percent-clipped=0.0 2024-08-17 16:29:17,009 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.20 vs. limit=12.0 2024-08-17 16:29:25,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3402900.0, ans=0.0 2024-08-17 16:29:28,282 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.43 vs. 
limit=15.0 2024-08-17 16:29:29,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3403000.0, ans=0.0 2024-08-17 16:29:29,922 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8300, loss[loss=0.1053, beats_loss=0.009806, ecapa_loss=0.0001289, whisper_loss=0.09421, over 15867.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01053, ecapa_loss=0.0001478, whisper_loss=0.09047, over 3924776.47 frames. ], batch size: 61, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:29:31,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3403000.0, ans=0.125 2024-08-17 16:29:48,223 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-17 16:30:19,811 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-17 16:30:25,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.27 vs. limit=15.0 2024-08-17 16:30:28,661 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-17 16:30:36,000 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8350, loss[loss=0.0864, beats_loss=0.0108, ecapa_loss=0.000159, whisper_loss=0.07401, over 21423.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01066, ecapa_loss=0.0001463, whisper_loss=0.08998, over 3923841.01 frames. 
], batch size: 88, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:30:37,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3403500.0, ans=0.125 2024-08-17 16:30:37,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3403500.0, ans=0.0 2024-08-17 16:30:47,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3403600.0, ans=0.125 2024-08-17 16:30:52,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3403600.0, ans=0.125 2024-08-17 16:30:53,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3403600.0, ans=0.0 2024-08-17 16:31:02,349 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-08-17 16:31:09,953 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.38 vs. limit=6.0 2024-08-17 16:31:13,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3403800.0, ans=0.125 2024-08-17 16:31:16,918 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-17 16:31:27,434 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.334e+01 2.617e+01 2.973e+01 3.819e+01, threshold=5.233e+01, percent-clipped=0.0 2024-08-17 16:31:34,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3403900.0, ans=0.0 2024-08-17 16:31:35,550 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-17 16:31:40,546 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8400, loss[loss=0.101, beats_loss=0.01294, ecapa_loss=0.0001423, whisper_loss=0.08667, over 22201.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01074, ecapa_loss=0.0001461, whisper_loss=0.09023, over 3942623.27 frames. ], batch size: 94, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:31:41,256 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.03 vs. limit=12.0 2024-08-17 16:31:41,898 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 20 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-17 16:31:42,345 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.35 vs. limit=15.0 2024-08-17 16:32:13,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3404200.0, ans=0.2 2024-08-17 16:32:21,797 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.52 vs. limit=15.0 2024-08-17 16:32:22,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3404300.0, ans=0.125 2024-08-17 16:32:30,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3404300.0, ans=0.0 2024-08-17 16:32:44,732 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-17 16:32:45,855 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8450, loss[loss=0.09243, beats_loss=0.01309, ecapa_loss=0.0001494, whisper_loss=0.07785, over 21681.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01067, ecapa_loss=0.0001473, whisper_loss=0.0913, over 3943089.14 frames. 
], batch size: 93, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:32:59,110 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-17 16:32:59,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3404600.0, ans=0.125 2024-08-17 16:33:00,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3404600.0, ans=0.125 2024-08-17 16:33:02,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3404600.0, ans=0.1 2024-08-17 16:33:02,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3404600.0, ans=0.1 2024-08-17 16:33:14,012 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 22 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-17 16:33:15,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3404700.0, ans=0.07 2024-08-17 16:33:24,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0 2024-08-17 16:33:25,750 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-17 16:33:36,151 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 32 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-17 16:33:38,310 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.287e+01 2.644e+01 3.078e+01 2.118e+02, threshold=5.288e+01, percent-clipped=3.0 2024-08-17 16:33:50,694 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.03 vs. 
limit=15.0 2024-08-17 16:33:51,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3405000.0, ans=0.0 2024-08-17 16:33:52,466 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8500, loss[loss=0.08685, beats_loss=0.009265, ecapa_loss=0.0001895, whisper_loss=0.07569, over 18204.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01062, ecapa_loss=0.0001474, whisper_loss=0.09125, over 3945416.87 frames. ], batch size: 79, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:33:59,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3405000.0, ans=0.0 2024-08-17 16:34:18,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3405200.0, ans=0.0 2024-08-17 16:34:25,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3405200.0, ans=0.0 2024-08-17 16:34:32,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3405300.0, ans=0.125 2024-08-17 16:34:58,478 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8550, loss[loss=0.09685, beats_loss=0.01123, ecapa_loss=0.0001141, whisper_loss=0.08449, over 15100.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01067, ecapa_loss=0.0001475, whisper_loss=0.09132, over 3964249.72 frames. ], batch size: 57, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:35:13,591 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.89 vs. limit=10.0 2024-08-17 16:35:14,177 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-17 16:35:22,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3405600.0, ans=0.1 2024-08-17 16:35:28,665 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 21 from LS+wenet, 19 from Vox, 51 fro AS 2024-08-17 16:35:36,516 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 21 from LS+wenet, 18 from Vox, 16 fro AS 2024-08-17 16:35:51,225 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.281e+01 2.591e+01 2.805e+01 5.234e+01, threshold=5.182e+01, percent-clipped=0.0 2024-08-17 16:35:59,714 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.88 vs. limit=10.0 2024-08-17 16:36:02,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.72 vs. limit=15.0 2024-08-17 16:36:04,648 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8600, loss[loss=0.1292, beats_loss=0.00858, ecapa_loss=0.0001531, whisper_loss=0.1191, over 22985.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01067, ecapa_loss=0.0001481, whisper_loss=0.09102, over 3931102.83 frames. ], batch size: 88, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:36:11,027 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.17 vs. 
limit=6.0 2024-08-17 16:36:41,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3406200.0, ans=0.1 2024-08-17 16:36:49,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3406300.0, ans=0.1 2024-08-17 16:36:57,976 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 21 from LS+wenet, 19 from Vox, 53 fro AS 2024-08-17 16:37:00,100 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.81 vs. limit=6.0 2024-08-17 16:37:04,717 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-17 16:37:12,893 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8650, loss[loss=0.08853, beats_loss=0.008849, ecapa_loss=0.0001994, whisper_loss=0.07768, over 16851.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01069, ecapa_loss=0.0001486, whisper_loss=0.09044, over 3930843.46 frames. ], batch size: 72, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:37:27,599 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 25 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-17 16:37:34,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3406600.0, ans=0.09899494936611666 2024-08-17 16:37:37,110 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.27 vs. limit=15.0 2024-08-17 16:37:37,869 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-17 16:37:52,586 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.37 vs. 
limit=15.0 2024-08-17 16:37:57,305 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 32 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-17 16:38:04,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3406800.0, ans=0.0 2024-08-17 16:38:06,984 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.354e+01 2.688e+01 3.025e+01 2.265e+02, threshold=5.375e+01, percent-clipped=1.0 2024-08-17 16:38:17,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.58 vs. limit=15.0 2024-08-17 16:38:21,515 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8700, loss[loss=0.117, beats_loss=0.01101, ecapa_loss=0.0001261, whisper_loss=0.1048, over 19346.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01067, ecapa_loss=0.0001484, whisper_loss=0.0905, over 3940010.39 frames. ], batch size: 75, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:38:30,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3407000.0, ans=0.025 2024-08-17 16:38:43,411 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. 
limit=6.0 2024-08-17 16:38:56,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3407200.0, ans=0.0 2024-08-17 16:39:14,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3407300.0, ans=0.125 2024-08-17 16:39:24,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3407400.0, ans=0.125 2024-08-17 16:39:32,020 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8750, loss[loss=0.1024, beats_loss=0.009563, ecapa_loss=0.0001444, whisper_loss=0.09137, over 23348.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01061, ecapa_loss=0.0001485, whisper_loss=0.09053, over 3909869.68 frames. ], batch size: 93, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:39:33,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3407500.0, ans=10.0 2024-08-17 16:39:52,370 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-17 16:39:55,800 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.51 vs. limit=12.0 2024-08-17 16:40:09,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3407700.0, ans=0.0 2024-08-17 16:40:10,415 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
21 from LS+wenet, 20 from Vox, 24 from AS 2024-08-17 16:40:20,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3407800.0, ans=0.1 2024-08-17 16:40:27,087 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.350e+01 2.581e+01 3.005e+01 1.666e+02, threshold=5.162e+01, percent-clipped=1.0 2024-08-17 16:40:40,500 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8800, loss[loss=0.1219, beats_loss=0.006488, ecapa_loss=0.0001605, whisper_loss=0.1138, over 22412.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01063, ecapa_loss=0.0001492, whisper_loss=0.0899, over 3909165.37 frames. ], batch size: 89, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:40:47,042 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.55 vs. limit=15.0 2024-08-17 16:40:58,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3408100.0, ans=0.0 2024-08-17 16:41:21,331 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 21 from Vox, 20 from AS 2024-08-17 16:41:28,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.19 vs. limit=15.0 2024-08-17 16:41:37,399 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 37 from LS+wenet, 20 from Vox, 35 from AS 2024-08-17 16:41:38,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3408400.0, ans=0.125 2024-08-17 16:41:41,816 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
22 from LS+wenet, 14 from Vox, 39 from AS 2024-08-17 16:41:48,354 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8850, loss[loss=0.09213, beats_loss=0.01267, ecapa_loss=0.0001068, whisper_loss=0.07839, over 19175.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.0001491, whisper_loss=0.09023, over 3902056.02 frames. ], batch size: 74, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:41:50,183 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2024-08-17 16:41:56,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3408500.0, ans=0.2 2024-08-17 16:42:44,293 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.312e+01 2.500e+01 2.835e+01 4.395e+02, threshold=5.001e+01, percent-clipped=3.0 2024-08-17 16:42:59,094 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8900, loss[loss=0.1123, beats_loss=0.0102, ecapa_loss=0.000134, whisper_loss=0.1008, over 19348.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01047, ecapa_loss=0.0001497, whisper_loss=0.09083, over 3883997.19 frames. ], batch size: 73, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:43:47,580 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.75 vs. limit=15.0 2024-08-17 16:43:48,124 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
24 from LS+wenet, 27 from Vox, 31 from AS 2024-08-17 16:43:48,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3409300.0, ans=0.0 2024-08-17 16:44:01,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3409400.0, ans=0.125 2024-08-17 16:44:04,740 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.84 vs. limit=15.0 2024-08-17 16:44:07,936 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 8950, loss[loss=0.09284, beats_loss=0.01318, ecapa_loss=0.0001432, whisper_loss=0.07823, over 21823.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01053, ecapa_loss=0.000149, whisper_loss=0.0909, over 3888264.95 frames. ], batch size: 92, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:44:09,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3409500.0, ans=0.125 2024-08-17 16:44:17,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3409500.0, ans=0.0 2024-08-17 16:44:25,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3409600.0, ans=0.1 2024-08-17 16:44:49,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3409800.0, ans=0.1 2024-08-17 16:44:56,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3409800.0, ans=0.09899494936611666 2024-08-17 16:45:00,439 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.019e+01 2.330e+01 2.600e+01 2.951e+01 4.837e+01, threshold=5.199e+01, percent-clipped=0.0 2024-08-17 16:45:02,017 INFO 
[scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3409900.0, ans=0.125 2024-08-17 16:45:02,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3409900.0, ans=0.125 2024-08-17 16:45:02,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3409900.0, ans=0.0 2024-08-17 16:45:13,733 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9000, loss[loss=0.1108, beats_loss=0.009674, ecapa_loss=0.0001351, whisper_loss=0.09981, over 18561.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001485, whisper_loss=0.09048, over 3847340.68 frames. ], batch size: 72, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:45:13,734 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-17 16:45:48,998 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on ASR_libri: loss=0.2519, beats_loss=0, ecapa_loss=0.0005245, whisper_loss=0.2466, over 922467.00 frames. 2024-08-17 16:46:06,634 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on SV_voxceleb1: loss=0.004189, beats_loss=0, ecapa_loss=0.0004189, whisper_loss=0, over 939242.00 frames. 2024-08-17 16:47:40,927 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.5328, 1.9023, 2.2561, 1.1490], device='cuda:0') 2024-08-17 16:47:47,255 INFO [train_multi_KD3.py:1149] (0/4) Epoch 23, validation on AT_audioset: loss=0.02324, beats_loss=0.02324, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-17 16:47:47,260 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-17 16:47:54,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3410000.0, ans=0.0 2024-08-17 16:47:56,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3410000.0, ans=0.1 2024-08-17 16:48:05,057 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 27 from LS+wenet, 20 from Vox, 33 from AS 2024-08-17 16:48:20,723 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.02 vs. limit=12.0 2024-08-17 16:48:41,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3410400.0, ans=0.04949747468305833 2024-08-17 16:48:43,907 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 from AS 2024-08-17 16:48:50,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3410400.0, ans=0.0 2024-08-17 16:48:54,331 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9050, loss[loss=0.116, beats_loss=0.0101, ecapa_loss=0.0001297, whisper_loss=0.1046, over 22721.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.0001473, whisper_loss=0.09082, over 3871221.99 frames. ], batch size: 90, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:49:05,854 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.62 vs. limit=15.0 2024-08-17 16:49:10,764 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 15 from LS+wenet, 27 from Vox, 46 from AS 2024-08-17 16:49:13,399 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
21 from LS+wenet, 28 from Vox, 36 from AS 2024-08-17 16:49:16,335 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 35 from LS+wenet, 19 from Vox, 29 from AS 2024-08-17 16:49:19,260 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.62 vs. limit=15.0 2024-08-17 16:49:24,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3410700.0, ans=0.0 2024-08-17 16:49:34,442 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 from AS 2024-08-17 16:49:37,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3410800.0, ans=0.125 2024-08-17 16:49:41,910 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 from AS 2024-08-17 16:49:47,778 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.651e+01 2.352e+01 2.621e+01 2.985e+01 9.819e+01, threshold=5.241e+01, percent-clipped=2.0 2024-08-17 16:49:49,118 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 28 from Vox, 40 from AS 2024-08-17 16:49:50,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3410900.0, ans=0.125 2024-08-17 16:50:01,024 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9100, loss[loss=0.122, beats_loss=0.008108, ecapa_loss=0.0002225, whisper_loss=0.1116, over 19087.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01053, ecapa_loss=0.0001493, whisper_loss=0.09052, over 3904979.20 frames. ], batch size: 82, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:50:06,184 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
29 from LS+wenet, 19 from Vox, 44 from AS 2024-08-17 16:50:09,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3411000.0, ans=0.125 2024-08-17 16:50:23,261 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.18 vs. limit=15.0 2024-08-17 16:50:47,580 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 19 from Vox, 28 from AS 2024-08-17 16:51:02,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3411400.0, ans=0.0 2024-08-17 16:51:05,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3411400.0, ans=0.0 2024-08-17 16:51:08,618 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9150, loss[loss=0.1095, beats_loss=0.0119, ecapa_loss=0.0001118, whisper_loss=0.09646, over 16606.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01049, ecapa_loss=0.0001484, whisper_loss=0.09092, over 3903117.67 frames. ], batch size: 63, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:51:24,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3411600.0, ans=0.125 2024-08-17 16:51:28,304 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.765e+00 2024-08-17 16:51:37,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3411700.0, ans=0.0 2024-08-17 16:51:40,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3411700.0, ans=0.125 2024-08-17 16:51:42,378 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
33 from LS+wenet, 18 from Vox, 42 from AS 2024-08-17 16:51:46,921 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 26 from Vox, 30 from AS 2024-08-17 16:51:47,328 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.74 vs. limit=15.0 2024-08-17 16:52:04,827 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.286e+01 2.611e+01 2.888e+01 4.834e+01, threshold=5.222e+01, percent-clipped=0.0 2024-08-17 16:52:18,859 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9200, loss[loss=0.0796, beats_loss=0.01137, ecapa_loss=0.000125, whisper_loss=0.06698, over 17541.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01042, ecapa_loss=0.0001479, whisper_loss=0.09151, over 3938586.83 frames. ], batch size: 66, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:52:19,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3412000.0, ans=0.125 2024-08-17 16:52:21,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3412000.0, ans=0.0 2024-08-17 16:52:24,402 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.98 vs. limit=15.0 2024-08-17 16:52:40,837 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 9 from LS+wenet, 18 from Vox, 29 from AS 2024-08-17 16:52:43,774 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 23 from LS+wenet, 21 from Vox, 41 from AS 2024-08-17 16:52:47,539 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 27 from LS+wenet, 21 from Vox, 23 from AS 2024-08-17 16:52:57,464 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
28 from LS+wenet, 19 from Vox, 25 from AS 2024-08-17 16:53:03,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3412300.0, ans=0.125 2024-08-17 16:53:16,526 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 21 from Vox, 33 from AS 2024-08-17 16:53:16,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3412400.0, ans=0.2 2024-08-17 16:53:25,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3412400.0, ans=0.125 2024-08-17 16:53:28,876 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9250, loss[loss=0.1126, beats_loss=0.01046, ecapa_loss=0.000139, whisper_loss=0.1008, over 22705.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01044, ecapa_loss=0.0001467, whisper_loss=0.0919, over 3894086.20 frames. ], batch size: 91, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:53:28,991 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 35 from LS+wenet, 20 from Vox, 31 from AS 2024-08-17 16:53:40,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3412500.0, ans=0.1 2024-08-17 16:54:05,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2024-08-17 16:54:09,878 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.53 vs. 
limit=15.0 2024-08-17 16:54:10,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3412800.0, ans=0.0 2024-08-17 16:54:11,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3412800.0, ans=0.2 2024-08-17 16:54:11,237 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.98 vs. limit=15.0 2024-08-17 16:54:17,314 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.31 vs. limit=15.0 2024-08-17 16:54:18,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3412800.0, ans=0.0 2024-08-17 16:54:22,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3412800.0, ans=0.1 2024-08-17 16:54:23,292 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 20 from Vox, 32 from AS 2024-08-17 16:54:24,313 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.401e+01 2.711e+01 2.916e+01 4.438e+01, threshold=5.422e+01, percent-clipped=0.0 2024-08-17 16:54:27,401 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 19 from Vox, 33 from AS 2024-08-17 16:54:27,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3412900.0, ans=0.1 2024-08-17 16:54:39,002 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9300, loss[loss=0.1179, beats_loss=0.01044, ecapa_loss=0.0001459, whisper_loss=0.106, over 23127.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01057, ecapa_loss=0.0001468, whisper_loss=0.09112, over 3901048.24 frames. 
], batch size: 89, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:54:42,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3413000.0, ans=0.0 2024-08-17 16:54:51,569 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 20 from Vox, 24 from AS 2024-08-17 16:54:53,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3413100.0, ans=0.125 2024-08-17 16:54:57,152 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 16 from Vox, 41 from AS 2024-08-17 16:55:09,885 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 13 from LS+wenet, 19 from Vox, 30 from AS 2024-08-17 16:55:12,385 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 from AS 2024-08-17 16:55:23,569 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.82 vs. limit=15.0 2024-08-17 16:55:32,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3413400.0, ans=0.0 2024-08-17 16:55:45,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3413400.0, ans=0.125 2024-08-17 16:55:48,067 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9350, loss[loss=0.0974, beats_loss=0.01305, ecapa_loss=0.0001369, whisper_loss=0.08299, over 23323.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001472, whisper_loss=0.09096, over 3937915.18 frames. 
], batch size: 96, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:55:48,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3413500.0, ans=0.125 2024-08-17 16:56:02,358 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=15.0 2024-08-17 16:56:08,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3413600.0, ans=0.1 2024-08-17 16:56:13,385 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.62 vs. limit=6.0 2024-08-17 16:56:32,649 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.44 vs. limit=22.5 2024-08-17 16:56:42,157 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.341e+01 2.647e+01 3.045e+01 1.659e+02, threshold=5.294e+01, percent-clipped=2.0 2024-08-17 16:56:42,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3413900.0, ans=0.125 2024-08-17 16:56:48,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3413900.0, ans=0.125 2024-08-17 16:56:52,117 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.07 vs. limit=15.0 2024-08-17 16:56:55,715 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9400, loss[loss=0.1002, beats_loss=0.01146, ecapa_loss=0.0001633, whisper_loss=0.08706, over 22697.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01061, ecapa_loss=0.0001485, whisper_loss=0.09111, over 3920012.78 frames. 
], batch size: 91, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:56:58,513 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 18 from LS+wenet, 24 from Vox, 39 from AS 2024-08-17 16:57:19,001 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 14 from Vox, 21 from AS 2024-08-17 16:57:25,472 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 from AS 2024-08-17 16:57:30,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3414200.0, ans=0.125 2024-08-17 16:58:00,221 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9450, loss[loss=0.09663, beats_loss=0.01177, ecapa_loss=0.0001371, whisper_loss=0.08349, over 22647.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.0001478, whisper_loss=0.09083, over 3930452.05 frames. ], batch size: 91, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:58:00,344 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 from AS 2024-08-17 16:58:01,507 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 19 from Vox, 38 from AS 2024-08-17 16:58:08,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3414500.0, ans=0.125 2024-08-17 16:58:13,855 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.45 vs. limit=10.0 2024-08-17 16:58:25,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3414700.0, ans=0.1 2024-08-17 16:58:28,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3414700.0, ans=0.0 2024-08-17 16:58:29,981 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
22 from LS+wenet, 12 from Vox, 20 from AS 2024-08-17 16:58:30,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3414700.0, ans=0.1 2024-08-17 16:58:51,539 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.271e+01 2.511e+01 2.774e+01 4.657e+01, threshold=5.021e+01, percent-clipped=0.0 2024-08-17 16:59:04,740 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9500, loss[loss=0.1107, beats_loss=0.008647, ecapa_loss=0.0001618, whisper_loss=0.1004, over 18256.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01066, ecapa_loss=0.0001475, whisper_loss=0.09043, over 3911793.04 frames. ], batch size: 70, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:59:44,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3415300.0, ans=0.125 2024-08-17 16:59:50,579 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 21 from Vox, 46 from AS 2024-08-17 16:59:54,148 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 31 from LS+wenet, 21 from Vox, 28 from AS 2024-08-17 16:59:55,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3415400.0, ans=0.125 2024-08-17 17:00:07,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3415500.0, ans=0.025 2024-08-17 17:00:08,342 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9550, loss[loss=0.1076, beats_loss=0.01168, ecapa_loss=0.0001299, whisper_loss=0.09467, over 24011.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01068, ecapa_loss=0.0001473, whisper_loss=0.08974, over 3874178.82 frames. 
], batch size: 94, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:00:08,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3415500.0, ans=0.125 2024-08-17 17:00:09,572 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 14 from Vox, 28 from AS 2024-08-17 17:00:11,967 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 15 from LS+wenet, 26 from Vox, 29 from AS 2024-08-17 17:00:12,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3415500.0, ans=0.0 2024-08-17 17:00:20,584 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 11 from LS+wenet, 22 from Vox, 24 from AS 2024-08-17 17:00:21,847 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 13 from LS+wenet, 19 from Vox, 32 from AS 2024-08-17 17:00:24,622 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 22 from Vox, 25 from AS 2024-08-17 17:00:43,665 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.17 vs. limit=22.5 2024-08-17 17:00:44,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3415800.0, ans=0.0 2024-08-17 17:00:44,784 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 17:00:47,374 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0 2024-08-17 17:00:55,643 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
30 from LS+wenet, 27 from Vox, 36 from AS 2024-08-17 17:00:57,908 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.226e+01 2.523e+01 2.913e+01 4.217e+01, threshold=5.047e+01, percent-clipped=0.0 2024-08-17 17:01:10,916 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9600, loss[loss=0.0761, beats_loss=0.01039, ecapa_loss=0.0001568, whisper_loss=0.06414, over 20727.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01055, ecapa_loss=0.0001485, whisper_loss=0.08975, over 3853602.86 frames. ], batch size: 84, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:01:17,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3416000.0, ans=0.125 2024-08-17 17:01:18,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3416000.0, ans=0.125 2024-08-17 17:01:21,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3416000.0, ans=0.125 2024-08-17 17:01:32,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3416100.0, ans=0.0 2024-08-17 17:01:40,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3416200.0, ans=0.125 2024-08-17 17:01:41,923 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-08-17 17:02:11,025 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 from AS 2024-08-17 17:02:12,441 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
29 from LS+wenet, 23 from Vox, 34 from AS 2024-08-17 17:02:13,483 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9650, loss[loss=0.1094, beats_loss=0.009163, ecapa_loss=0.0001638, whisper_loss=0.09862, over 21849.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.0001485, whisper_loss=0.09045, over 3881760.05 frames. ], batch size: 86, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:02:28,442 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 18 from Vox, 50 from AS 2024-08-17 17:02:35,047 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 30 from Vox, 32 from AS 2024-08-17 17:02:41,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3416700.0, ans=0.2 2024-08-17 17:02:50,605 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 26 from LS+wenet, 13 from Vox, 43 from AS 2024-08-17 17:03:03,962 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.357e+01 2.621e+01 2.968e+01 4.527e+01, threshold=5.241e+01, percent-clipped=0.0 2024-08-17 17:03:06,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3416900.0, ans=0.125 2024-08-17 17:03:08,054 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.604e+01 2024-08-17 17:03:13,091 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 20 from Vox, 31 from AS 2024-08-17 17:03:16,844 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9700, loss[loss=0.1228, beats_loss=0.008444, ecapa_loss=0.0001645, whisper_loss=0.1127, over 16133.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01052, ecapa_loss=0.0001485, whisper_loss=0.09101, over 3912094.62 frames. ], batch size: 64, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:03:30,509 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
21 from LS+wenet, 22 from Vox, 45 from AS 2024-08-17 17:03:35,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3417100.0, ans=0.0 2024-08-17 17:03:37,965 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 29 from Vox, 39 from AS 2024-08-17 17:03:48,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3417200.0, ans=0.0 2024-08-17 17:03:55,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3417300.0, ans=0.2 2024-08-17 17:04:17,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3417400.0, ans=0.0 2024-08-17 17:04:19,305 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9750, loss[loss=0.1177, beats_loss=0.009687, ecapa_loss=0.0001525, whisper_loss=0.1065, over 16360.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01049, ecapa_loss=0.0001481, whisper_loss=0.09096, over 3926233.22 frames. ], batch size: 67, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:04:19,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3417500.0, ans=0.5 2024-08-17 17:04:24,559 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 36 from LS+wenet, 20 from Vox, 34 from AS 2024-08-17 17:04:44,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3417700.0, ans=0.1 2024-08-17 17:04:50,120 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 11 from Vox, 33 from AS 2024-08-17 17:04:54,002 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
27 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-17 17:04:54,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3417700.0, ans=0.0 2024-08-17 17:04:55,212 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 9 from Vox, 34 fro AS 2024-08-17 17:05:11,015 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.328e+01 2.610e+01 2.968e+01 3.624e+01, threshold=5.221e+01, percent-clipped=0.0 2024-08-17 17:05:23,973 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9800, loss[loss=0.1163, beats_loss=0.01119, ecapa_loss=0.0001548, whisper_loss=0.1036, over 21517.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01051, ecapa_loss=0.0001479, whisper_loss=0.09103, over 3876983.27 frames. ], batch size: 89, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:05:25,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3418000.0, ans=0.125 2024-08-17 17:05:32,447 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0 2024-08-17 17:05:36,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3418100.0, ans=0.05 2024-08-17 17:05:49,664 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
18 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-17 17:05:54,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3418200.0, ans=0.1 2024-08-17 17:06:02,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3418300.0, ans=0.05 2024-08-17 17:06:05,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3418300.0, ans=0.125 2024-08-17 17:06:06,949 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.10 vs. limit=6.0 2024-08-17 17:06:10,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3418300.0, ans=0.125 2024-08-17 17:06:14,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3418400.0, ans=0.125 2024-08-17 17:06:18,057 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-17 17:06:21,968 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 17:06:27,859 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9850, loss[loss=0.0961, beats_loss=0.01127, ecapa_loss=0.0001499, whisper_loss=0.08333, over 14150.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01046, ecapa_loss=0.0001473, whisper_loss=0.09083, over 3857855.04 frames. ], batch size: 57, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:06:29,319 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-17 17:06:39,391 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
21 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-17 17:06:43,605 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.14 vs. limit=15.0 2024-08-17 17:06:45,649 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-17 17:06:48,313 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-17 17:07:03,698 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.68 vs. limit=22.5 2024-08-17 17:07:15,346 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-17 17:07:16,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3418900.0, ans=0.125 2024-08-17 17:07:17,677 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.311e+01 2.569e+01 2.955e+01 4.968e+01, threshold=5.138e+01, percent-clipped=0.0 2024-08-17 17:07:20,320 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-17 17:07:21,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3418900.0, ans=0.0 2024-08-17 17:07:30,182 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9900, loss[loss=0.09758, beats_loss=0.01272, ecapa_loss=0.0001228, whisper_loss=0.08363, over 22801.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01049, ecapa_loss=0.0001475, whisper_loss=0.09098, over 3878807.77 frames. ], batch size: 88, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:07:36,875 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.80 vs. 
limit=15.0 2024-08-17 17:07:39,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3419000.0, ans=0.1 2024-08-17 17:07:56,569 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-17 17:08:09,079 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 26 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-17 17:08:18,950 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-17 17:08:25,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3419400.0, ans=0.125 2024-08-17 17:08:32,719 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 9950, loss[loss=0.09837, beats_loss=0.01196, ecapa_loss=0.0001695, whisper_loss=0.08471, over 19388.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01053, ecapa_loss=0.0001467, whisper_loss=0.09042, over 3882497.17 frames. ], batch size: 82, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:08:39,062 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-17 17:08:46,177 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.64 vs. limit=15.0 2024-08-17 17:08:48,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3419600.0, ans=0.5 2024-08-17 17:08:52,736 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-17 17:09:13,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3419800.0, ans=0.125 2024-08-17 17:09:16,730 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
28 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-17 17:09:22,869 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.277e+01 2.547e+01 2.876e+01 4.081e+01, threshold=5.094e+01, percent-clipped=0.0 2024-08-17 17:09:28,767 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.70 vs. limit=22.5 2024-08-17 17:09:35,742 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10000, loss[loss=0.1004, beats_loss=0.009763, ecapa_loss=0.0001706, whisper_loss=0.0889, over 22354.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01046, ecapa_loss=0.0001469, whisper_loss=0.09094, over 3862775.06 frames. ], batch size: 91, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:09:39,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3420000.0, ans=0.125 2024-08-17 17:09:51,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3420100.0, ans=0.0 2024-08-17 17:10:09,352 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-17 17:10:15,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3420300.0, ans=0.125 2024-08-17 17:10:17,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3420300.0, ans=0.0 2024-08-17 17:10:22,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3420300.0, ans=0.125 2024-08-17 17:10:22,704 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. 
limit=6.0 2024-08-17 17:10:35,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3420400.0, ans=0.1 2024-08-17 17:10:38,221 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10050, loss[loss=0.1159, beats_loss=0.009336, ecapa_loss=0.0001299, whisper_loss=0.1052, over 22748.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.0001478, whisper_loss=0.0905, over 3844253.61 frames. ], batch size: 88, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:10:59,182 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.32 vs. limit=8.0 2024-08-17 17:11:08,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3420700.0, ans=0.04949747468305833 2024-08-17 17:11:09,524 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-17 17:11:10,110 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.46 vs. limit=10.0 2024-08-17 17:11:12,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3420700.0, ans=0.1 2024-08-17 17:11:22,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3420800.0, ans=0.0 2024-08-17 17:11:28,096 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.359e+01 2.579e+01 2.991e+01 2.335e+02, threshold=5.159e+01, percent-clipped=2.0 2024-08-17 17:11:33,106 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
29 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-17 17:11:34,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3420900.0, ans=0.1 2024-08-17 17:11:40,713 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10100, loss[loss=0.1119, beats_loss=0.01033, ecapa_loss=0.0001145, whisper_loss=0.1004, over 23977.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01046, ecapa_loss=0.0001474, whisper_loss=0.09083, over 3867154.37 frames. ], batch size: 90, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:11:59,750 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-17 17:12:02,283 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-17 17:12:02,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3421100.0, ans=0.0 2024-08-17 17:12:03,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3421100.0, ans=0.0 2024-08-17 17:12:09,061 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.87 vs. limit=15.0 2024-08-17 17:12:15,428 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.36 vs. limit=22.5 2024-08-17 17:12:23,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3421300.0, ans=0.2 2024-08-17 17:12:41,942 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.40 vs. 
limit=15.0 2024-08-17 17:12:43,572 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10150, loss[loss=0.08218, beats_loss=0.01208, ecapa_loss=0.0001197, whisper_loss=0.06891, over 16159.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01048, ecapa_loss=0.0001471, whisper_loss=0.09098, over 3850068.57 frames. ], batch size: 62, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:12:43,748 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-17 17:13:33,379 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.337e+01 2.592e+01 2.912e+01 4.503e+01, threshold=5.183e+01, percent-clipped=0.0 2024-08-17 17:13:34,984 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 17:13:36,006 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-17 17:13:38,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3421900.0, ans=0.0 2024-08-17 17:13:42,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3421900.0, ans=0.125 2024-08-17 17:13:45,725 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10200, loss[loss=0.1056, beats_loss=0.01018, ecapa_loss=0.0001504, whisper_loss=0.09396, over 21865.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01048, ecapa_loss=0.0001484, whisper_loss=0.09084, over 3832591.95 frames. ], batch size: 88, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:14:00,176 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.69 vs. limit=12.0 2024-08-17 17:14:10,712 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
17 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-17 17:14:11,242 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.86 vs. limit=15.0 2024-08-17 17:14:22,599 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.75 vs. limit=10.0 2024-08-17 17:14:31,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3422300.0, ans=0.125 2024-08-17 17:14:47,665 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10250, loss[loss=0.112, beats_loss=0.01027, ecapa_loss=0.0001425, whisper_loss=0.1003, over 24100.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001479, whisper_loss=0.09077, over 3846328.55 frames. ], batch size: 95, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:15:20,150 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-17 17:15:24,760 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.83 vs. limit=22.5 2024-08-17 17:15:26,720 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 26 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-17 17:15:29,189 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-17 17:15:31,952 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-17 17:15:37,966 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.362e+01 2.622e+01 2.945e+01 4.411e+01, threshold=5.245e+01, percent-clipped=0.0 2024-08-17 17:15:48,090 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
29 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-17 17:15:50,576 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10300, loss[loss=0.08112, beats_loss=0.01389, ecapa_loss=0.0001233, whisper_loss=0.066, over 22880.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01063, ecapa_loss=0.0001487, whisper_loss=0.0904, over 3846517.80 frames. ], batch size: 92, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:15:52,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3423000.0, ans=0.125 2024-08-17 17:15:53,087 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-17 17:16:06,635 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-17 17:16:08,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3423100.0, ans=0.0 2024-08-17 17:16:08,267 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0 2024-08-17 17:16:29,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3423300.0, ans=0.125 2024-08-17 17:16:29,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.28 vs. 
limit=22.5 2024-08-17 17:16:42,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3423400.0, ans=0.035 2024-08-17 17:16:48,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3423400.0, ans=0.2 2024-08-17 17:16:52,946 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10350, loss[loss=0.1047, beats_loss=0.0128, ecapa_loss=0.0001326, whisper_loss=0.09059, over 21946.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01064, ecapa_loss=0.0001484, whisper_loss=0.08999, over 3840248.15 frames. ], batch size: 88, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:16:54,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3423500.0, ans=0.125 2024-08-17 17:16:56,400 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.54 vs. limit=15.0 2024-08-17 17:17:02,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3423500.0, ans=0.125 2024-08-17 17:17:18,407 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
19 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-17 17:17:18,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3423700.0, ans=0.0 2024-08-17 17:17:39,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3423800.0, ans=0.1 2024-08-17 17:17:43,585 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.300e+01 2.585e+01 2.974e+01 4.112e+01, threshold=5.170e+01, percent-clipped=0.0 2024-08-17 17:17:56,139 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10400, loss[loss=0.09236, beats_loss=0.01428, ecapa_loss=0.0001323, whisper_loss=0.07676, over 19664.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01071, ecapa_loss=0.0001486, whisper_loss=0.08952, over 3879742.10 frames. ], batch size: 81, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:18:05,109 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-17 17:18:28,432 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 30 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-17 17:18:29,668 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-17 17:18:32,721 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.06 vs. limit=15.0 2024-08-17 17:18:36,391 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.43 vs. limit=22.5 2024-08-17 17:18:48,411 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
25 from LS+wenet, 15 from Vox, 48 fro AS 2024-08-17 17:18:56,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3424400.0, ans=0.125 2024-08-17 17:18:58,099 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10450, loss[loss=0.107, beats_loss=0.009464, ecapa_loss=0.000137, whisper_loss=0.09613, over 23468.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01066, ecapa_loss=0.000148, whisper_loss=0.08954, over 3847040.23 frames. ], batch size: 93, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:19:03,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3424500.0, ans=0.0 2024-08-17 17:19:06,091 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.18 vs. limit=10.0 2024-08-17 17:19:15,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3424600.0, ans=0.1 2024-08-17 17:19:17,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3424600.0, ans=0.0 2024-08-17 17:19:23,784 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.63 vs. limit=6.0 2024-08-17 17:19:29,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3424700.0, ans=0.0 2024-08-17 17:19:47,896 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.411e+01 2.718e+01 3.284e+01 2.620e+02, threshold=5.436e+01, percent-clipped=5.0 2024-08-17 17:19:48,070 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
21 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-17 17:19:49,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3424900.0, ans=0.0 2024-08-17 17:20:00,391 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10500, loss[loss=0.112, beats_loss=0.009079, ecapa_loss=0.0001909, whisper_loss=0.101, over 21328.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01065, ecapa_loss=0.0001491, whisper_loss=0.08966, over 3855222.65 frames. ], batch size: 91, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:20:00,503 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-17 17:20:01,051 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2024-08-17 17:20:09,666 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-17 17:20:15,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3425100.0, ans=0.015 2024-08-17 17:20:16,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3425100.0, ans=0.125 2024-08-17 17:20:28,230 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-17 17:20:40,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3425300.0, ans=0.125 2024-08-17 17:20:54,556 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.49 vs. limit=15.0 2024-08-17 17:21:02,655 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10550, loss[loss=0.1254, beats_loss=0.009653, ecapa_loss=0.0001366, whisper_loss=0.1144, over 22933.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01061, ecapa_loss=0.0001489, whisper_loss=0.09062, over 3876079.65 frames. ], batch size: 89, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:21:10,815 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-08-17 17:21:20,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3425600.0, ans=0.0 2024-08-17 17:21:21,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3425600.0, ans=0.125 2024-08-17 17:21:38,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3425800.0, ans=0.125 2024-08-17 17:21:38,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3425800.0, ans=0.0 2024-08-17 17:21:43,607 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-17 17:21:48,107 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-17 17:21:51,591 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.414e+01 2.704e+01 3.069e+01 2.193e+02, threshold=5.408e+01, percent-clipped=2.0 2024-08-17 17:21:57,872 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 39 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-17 17:22:03,961 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10600, loss[loss=0.1006, beats_loss=0.01158, ecapa_loss=0.0001622, whisper_loss=0.08742, over 22584.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01065, ecapa_loss=0.0001482, whisper_loss=0.09028, over 3894490.15 frames. 
], batch size: 93, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:22:06,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3426000.0, ans=0.0 2024-08-17 17:22:09,123 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 33 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-17 17:22:09,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3426000.0, ans=0.125 2024-08-17 17:22:16,550 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 26 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-17 17:22:27,087 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.73 vs. limit=22.5 2024-08-17 17:22:41,651 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 35 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-17 17:22:45,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3426300.0, ans=0.2 2024-08-17 17:22:56,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3426400.0, ans=0.015 2024-08-17 17:22:58,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3426400.0, ans=10.0 2024-08-17 17:22:59,260 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-17 17:23:01,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3426400.0, ans=0.2 2024-08-17 17:23:06,519 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10650, loss[loss=0.1031, beats_loss=0.0115, ecapa_loss=0.0001827, whisper_loss=0.08978, over 19303.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0106, ecapa_loss=0.0001479, whisper_loss=0.0907, over 3879520.08 frames. 
], batch size: 80, lr: 2.64e-03, grad_scale: 1.152921504606847e+18 2024-08-17 17:23:09,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3426500.0, ans=0.125 2024-08-17 17:23:22,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.24 vs. limit=6.0 2024-08-17 17:23:35,609 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.19 vs. limit=22.5 2024-08-17 17:23:36,256 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-17 17:23:39,489 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.68 vs. limit=22.5 2024-08-17 17:23:41,550 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-17 17:23:47,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3426800.0, ans=0.0 2024-08-17 17:23:56,146 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.375e+01 2.587e+01 2.940e+01 1.166e+02, threshold=5.174e+01, percent-clipped=1.0 2024-08-17 17:23:56,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.92 vs. limit=22.5 2024-08-17 17:24:04,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3426900.0, ans=0.1 2024-08-17 17:24:08,950 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10700, loss[loss=0.1006, beats_loss=0.01243, ecapa_loss=0.0001094, whisper_loss=0.08709, over 19522.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001475, whisper_loss=0.09089, over 3888175.48 frames. ], batch size: 76, lr: 2.64e-03, grad_scale: 1.152921504606847e+18 2024-08-17 17:24:10,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3427000.0, ans=0.0 2024-08-17 17:24:10,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3427000.0, ans=0.125 2024-08-17 17:24:20,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3427100.0, ans=0.2 2024-08-17 17:24:30,034 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 29 from Vox, 26 fro AS 2024-08-17 17:24:45,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3427300.0, ans=0.125 2024-08-17 17:24:46,272 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-17 17:24:57,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.03 vs. limit=10.0 2024-08-17 17:25:02,703 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 20 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-17 17:25:09,989 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-17 17:25:10,929 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10750, loss[loss=0.1223, beats_loss=0.009542, ecapa_loss=0.0001445, whisper_loss=0.1114, over 22915.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01062, ecapa_loss=0.0001471, whisper_loss=0.09152, over 3902277.36 frames. ], batch size: 90, lr: 2.64e-03, grad_scale: 1.152921504606847e+18 2024-08-17 17:25:14,661 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
17 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-17 17:25:24,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3427600.0, ans=0.125 2024-08-17 17:25:42,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3427700.0, ans=0.0 2024-08-17 17:25:52,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3427800.0, ans=0.125 2024-08-17 17:25:58,118 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 34 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-17 17:25:59,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3427900.0, ans=0.2 2024-08-17 17:26:00,265 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.471e+01 2.708e+01 3.054e+01 4.365e+01, threshold=5.417e+01, percent-clipped=0.0 2024-08-17 17:26:03,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3427900.0, ans=0.125 2024-08-17 17:26:12,891 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10800, loss[loss=0.1079, beats_loss=0.01233, ecapa_loss=0.0001159, whisper_loss=0.09437, over 20139.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0106, ecapa_loss=0.0001476, whisper_loss=0.09096, over 3882759.32 frames. 
], batch size: 78, lr: 2.64e-03, grad_scale: 1.152921504606847e+18 2024-08-17 17:26:14,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3428000.0, ans=0.1 2024-08-17 17:26:34,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3428100.0, ans=0.125 2024-08-17 17:26:44,707 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-17 17:26:47,588 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.90 vs. limit=15.0 2024-08-17 17:27:02,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3428400.0, ans=0.1 2024-08-17 17:27:03,347 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 12 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-17 17:27:10,193 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.01 vs. limit=15.0 2024-08-17 17:27:15,586 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10850, loss[loss=0.08447, beats_loss=0.01265, ecapa_loss=0.0001381, whisper_loss=0.07044, over 22167.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.000147, whisper_loss=0.09075, over 3896757.94 frames. ], batch size: 90, lr: 2.64e-03, grad_scale: 1.152921504606847e+18 2024-08-17 17:27:30,750 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 
19 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-17 17:27:45,555 WARNING [optim.py:496] (0/4) Scaling gradients by 0.07737734913825989, model_norm_threshold=54.16817855834961 2024-08-17 17:27:45,711 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.25, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.205e+05, grad_sumsq=1.205e+05, orig_rms_sq=1.000e+00 2024-08-17 17:27:55,823 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-17 17:27:56,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3428800.0, ans=0.1 2024-08-17 17:27:58,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3428800.0, ans=0.0 2024-08-17 17:27:59,694 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-17 17:28:05,717 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.361e+01 2.674e+01 3.069e+01 7.001e+02, threshold=5.348e+01, percent-clipped=1.0 2024-08-17 17:28:18,417 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10900, loss[loss=0.09939, beats_loss=0.01246, ecapa_loss=0.0001273, whisper_loss=0.08566, over 21148.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01066, ecapa_loss=0.0001472, whisper_loss=0.09046, over 3876414.45 frames. ], batch size: 84, lr: 2.64e-03, grad_scale: 1.152921504606847e+18 2024-08-17 17:28:21,144 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.376e-02 2024-08-17 17:28:28,549 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.35 vs. 
limit=22.5 2024-08-17 17:28:30,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3429100.0, ans=0.025 2024-08-17 17:28:32,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3429100.0, ans=0.1 2024-08-17 17:28:34,307 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-17 17:28:39,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3429100.0, ans=0.0 2024-08-17 17:28:41,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3429200.0, ans=0.125 2024-08-17 17:28:44,432 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-17 17:28:48,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3429200.0, ans=0.125 2024-08-17 17:28:49,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3429200.0, ans=0.0 2024-08-17 17:28:50,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3429200.0, ans=0.025 2024-08-17 17:28:54,322 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 17 from LS+wenet, 24 from Vox, 48 fro AS 2024-08-17 17:29:16,061 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.22 vs. limit=15.0 2024-08-17 17:29:18,902 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
20 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-17 17:29:20,178 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 10950, loss[loss=0.1072, beats_loss=0.009826, ecapa_loss=0.0001343, whisper_loss=0.09604, over 16402.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0107, ecapa_loss=0.0001463, whisper_loss=0.08991, over 3880396.97 frames. ], batch size: 60, lr: 2.64e-03, grad_scale: 1.152921504606847e+18 2024-08-17 17:29:23,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3429500.0, ans=0.0 2024-08-17 17:29:36,437 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 26 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-17 17:29:38,973 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-17 17:29:43,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3429600.0, ans=0.0 2024-08-17 17:29:45,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3429700.0, ans=0.2 2024-08-17 17:29:46,592 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
18 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-17 17:29:50,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3429700.0, ans=0.07 2024-08-17 17:30:01,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3429800.0, ans=0.125 2024-08-17 17:30:10,087 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.341e+01 2.537e+01 2.893e+01 3.514e+01, threshold=5.074e+01, percent-clipped=0.0 2024-08-17 17:30:11,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3429900.0, ans=0.0 2024-08-17 17:30:14,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3429900.0, ans=0.125 2024-08-17 17:30:15,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3429900.0, ans=0.1 2024-08-17 17:30:15,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3429900.0, ans=0.125 2024-08-17 17:30:22,499 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 11000, loss[loss=0.07642, beats_loss=0.01262, ecapa_loss=0.0001675, whisper_loss=0.06213, over 16794.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01073, ecapa_loss=0.0001461, whisper_loss=0.08965, over 3882493.23 frames. ], batch size: 73, lr: 2.64e-03, grad_scale: 1.152921504606847e+18 2024-08-17 17:30:32,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3430000.0, ans=0.0 2024-08-17 17:30:38,481 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-17 17:30:57,174 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
18 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-17 17:30:59,564 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 32 from Vox, 31 fro AS 2024-08-17 17:31:06,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3430300.0, ans=0.0 2024-08-17 17:31:07,790 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.32 vs. limit=6.0 2024-08-17 17:31:08,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3430300.0, ans=0.0 2024-08-17 17:31:12,171 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-17 17:31:14,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3430400.0, ans=10.0 2024-08-17 17:31:23,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3430500.0, ans=0.0 2024-08-17 17:31:24,573 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 11050, loss[loss=0.1363, beats_loss=0.00727, ecapa_loss=0.0001767, whisper_loss=0.1273, over 22079.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01066, ecapa_loss=0.000147, whisper_loss=0.08985, over 3857386.84 frames. ], batch size: 87, lr: 2.64e-03, grad_scale: 1.152921504606847e+18 2024-08-17 17:31:33,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3430500.0, ans=0.125 2024-08-17 17:31:52,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3430700.0, ans=0.125 2024-08-17 17:32:00,276 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
23 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-17 17:32:04,716 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.09 vs. limit=22.5 2024-08-17 17:32:09,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3430800.0, ans=0.125 2024-08-17 17:32:14,749 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.68 vs. limit=12.0 2024-08-17 17:32:15,234 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.316e+01 2.591e+01 2.930e+01 4.532e+01, threshold=5.182e+01, percent-clipped=0.0 2024-08-17 17:32:16,453 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-17 17:32:19,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3430900.0, ans=0.125 2024-08-17 17:32:25,921 INFO [train_multi_KD3.py:1116] (0/4) Epoch 23, batch 11100, loss[loss=0.1165, beats_loss=0.01121, ecapa_loss=0.0001457, whisper_loss=0.1038, over 22853.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01064, ecapa_loss=0.0001479, whisper_loss=0.09003, over 3856237.94 frames. ], batch size: 93, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:32:27,268 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-17 17:32:41,404 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 17 from LS+wenet, 28 from Vox, 45 fro AS 2024-08-17 17:33:03,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3431300.0, ans=0.0 2024-08-17 17:33:04,419 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 14 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-17 17:33:07,685 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
25 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-17 17:33:13,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3431400.0, ans=0.0 2024-08-17 17:33:16,950 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-23.pt 2024-08-17 17:33:46,642 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 0, loss[loss=0.1121, beats_loss=0.008792, ecapa_loss=0.0001642, whisper_loss=0.1016, over 19693.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.008792, ecapa_loss=0.0001642, whisper_loss=0.1016, over 19693.00 frames. ], batch size: 78, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:33:46,644 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-17 17:34:22,024 INFO [train_multi_KD3.py:1149] (0/4) Epoch 24, validation on ASR_libri: loss=0.2501, beats_loss=0, ecapa_loss=0.0005267, whisper_loss=0.2449, over 922467.00 frames. 2024-08-17 17:34:36,635 INFO [train_multi_KD3.py:1149] (0/4) Epoch 24, validation on SV_voxceleb1: loss=0.004161, beats_loss=0, ecapa_loss=0.0004161, whisper_loss=0, over 939242.00 frames. 2024-08-17 17:36:06,712 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3832, 3.6334, 4.1338, 4.2074], device='cuda:0') 2024-08-17 17:36:20,257 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.9695, 1.7764, 2.1147, 1.3035], device='cuda:0') 2024-08-17 17:36:22,828 INFO [train_multi_KD3.py:1149] (0/4) Epoch 24, validation on AT_audioset: loss=0.02331, beats_loss=0.02331, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
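The recurring `optim.py` records above report grad-norm quartiles against a clipping threshold, and the earlier WARNING ("Scaling gradients by 0.07737734913825989, model_norm_threshold=54.16817855834961") is consistent with gradients being rescaled by threshold/norm whenever the global gradient norm exceeds the threshold. The following is a minimal hypothetical sketch of that rule for illustration only, not the actual icefall `optim.py` implementation:

```python
def clipping_scale(grad_norm: float, threshold: float) -> float:
    """Hypothetical reconstruction of the rule implied by the log's
    'Scaling gradients by S, model_norm_threshold=T' warning:
    when the global gradient norm exceeds the threshold T, all
    gradients are multiplied by T / grad_norm; otherwise they are
    left untouched (scale 1.0).
    """
    if grad_norm <= threshold:
        return 1.0  # norm within threshold: no clipping applied
    return threshold / grad_norm
```

Note that the reported scale 0.0774 with threshold 54.17 implies a gradient-norm spike of roughly 7.0e+02, the same magnitude as the max grad-norm quartile (7.001e+02) logged by the next `Clipping_scale` record.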
2024-08-17 17:36:22,831 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-17 17:36:24,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3431420.0, ans=0.0 2024-08-17 17:36:24,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3431420.0, ans=0.1 2024-08-17 17:36:26,587 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.89 vs. limit=22.5 2024-08-17 17:37:19,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3431720.0, ans=0.0 2024-08-17 17:37:33,366 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 17:37:36,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3431820.0, ans=0.0 2024-08-17 17:37:43,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3431820.0, ans=0.0 2024-08-17 17:37:49,134 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.526e+01 2.875e+01 3.255e+01 4.776e+01, threshold=5.751e+01, percent-clipped=0.0 2024-08-17 17:37:49,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3431920.0, ans=0.0 2024-08-17 17:37:51,224 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 50, loss[loss=0.08833, beats_loss=0.009574, ecapa_loss=0.0001273, whisper_loss=0.07748, over 22051.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.009132, ecapa_loss=0.0001571, whisper_loss=0.09038, over 862058.40 frames. 
], batch size: 85, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:38:06,717 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-17 17:38:19,960 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 15 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-17 17:38:23,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3432120.0, ans=0.2 2024-08-17 17:38:25,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3432120.0, ans=0.125 2024-08-17 17:38:28,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3432120.0, ans=0.2 2024-08-17 17:38:47,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3432220.0, ans=0.0 2024-08-17 17:38:59,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3432320.0, ans=0.0 2024-08-17 17:39:14,871 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-17 17:39:18,441 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 100, loss[loss=0.112, beats_loss=0.00821, ecapa_loss=0.0001603, whisper_loss=0.1021, over 21524.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.009416, ecapa_loss=0.0001524, whisper_loss=0.09168, over 1529722.05 frames. ], batch size: 86, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:39:38,227 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-17 17:39:38,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3432520.0, ans=0.0 2024-08-17 17:39:47,279 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
23 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-17 17:39:52,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3432620.0, ans=0.125 2024-08-17 17:39:54,731 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-17 17:40:14,901 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.10 vs. limit=15.0 2024-08-17 17:40:17,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3432720.0, ans=0.125 2024-08-17 17:40:29,896 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-17 17:40:33,059 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 25 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-17 17:40:37,415 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 18 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-17 17:40:41,426 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.203e+01 2.631e+01 2.865e+01 3.182e+01 4.534e+01, threshold=5.730e+01, percent-clipped=0.0 2024-08-17 17:40:42,700 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 150, loss[loss=0.1035, beats_loss=0.009529, ecapa_loss=0.000163, whisper_loss=0.09237, over 22104.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.009591, ecapa_loss=0.0001499, whisper_loss=0.09006, over 2028818.71 frames. ], batch size: 87, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:40:58,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3433020.0, ans=0.0 2024-08-17 17:41:08,058 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-17 17:41:13,761 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-17 17:41:15,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3433120.0, ans=0.125 2024-08-17 17:41:25,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3433220.0, ans=0.2 2024-08-17 17:41:26,359 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.09 vs. limit=12.0 2024-08-17 17:41:37,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3433320.0, ans=0.0 2024-08-17 17:41:38,727 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 28 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-17 17:41:45,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3433320.0, ans=0.2 2024-08-17 17:41:50,197 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 200, loss[loss=0.1185, beats_loss=0.01049, ecapa_loss=0.00011, whisper_loss=0.1069, over 15358.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.009802, ecapa_loss=0.0001502, whisper_loss=0.08993, over 2446652.21 frames. ], batch size: 55, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:41:54,179 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-17 17:42:27,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3433620.0, ans=0.125 2024-08-17 17:42:30,766 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
30 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-17 17:42:51,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3433820.0, ans=0.125 2024-08-17 17:42:53,758 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.383e+01 2.582e+01 2.960e+01 4.276e+01, threshold=5.164e+01, percent-clipped=0.0 2024-08-17 17:42:54,970 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 250, loss[loss=0.09516, beats_loss=0.01075, ecapa_loss=0.0001377, whisper_loss=0.08303, over 15279.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01003, ecapa_loss=0.0001489, whisper_loss=0.09023, over 2764256.47 frames. ], batch size: 59, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:43:49,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3434320.0, ans=0.125 2024-08-17 17:43:50,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3434320.0, ans=0.0 2024-08-17 17:43:53,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3434320.0, ans=0.2 2024-08-17 17:43:59,617 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 300, loss[loss=0.08407, beats_loss=0.01241, ecapa_loss=0.0001564, whisper_loss=0.0701, over 20004.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01019, ecapa_loss=0.0001502, whisper_loss=0.08962, over 2988481.62 frames. 
], batch size: 81, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:44:02,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3434420.0, ans=0.04949747468305833 2024-08-17 17:44:05,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3434420.0, ans=0.1 2024-08-17 17:44:06,351 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-17 17:44:08,421 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2024-08-17 17:44:10,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3434420.0, ans=0.125 2024-08-17 17:44:14,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3434520.0, ans=0.0 2024-08-17 17:44:17,470 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.898e+01 2024-08-17 17:44:21,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3434520.0, ans=0.125 2024-08-17 17:44:44,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3434720.0, ans=0.0 2024-08-17 17:44:54,263 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.35 vs. 
limit=15.0 2024-08-17 17:45:05,620 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.312e+01 2.476e+01 2.764e+01 4.106e+02, threshold=4.951e+01, percent-clipped=1.0 2024-08-17 17:45:06,987 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 350, loss[loss=0.08762, beats_loss=0.01259, ecapa_loss=0.0001423, whisper_loss=0.07361, over 20232.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01023, ecapa_loss=0.0001494, whisper_loss=0.09003, over 3158706.16 frames. ], batch size: 82, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:45:15,312 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-17 17:45:17,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3434920.0, ans=0.1 2024-08-17 17:45:19,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3435020.0, ans=0.1 2024-08-17 17:45:24,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3435020.0, ans=0.1 2024-08-17 17:45:35,821 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 17 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-17 17:45:39,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3435120.0, ans=0.0 2024-08-17 17:45:40,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3435120.0, ans=0.125 2024-08-17 17:45:57,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3435220.0, ans=0.125 2024-08-17 17:46:11,032 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
13 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-17 17:46:15,324 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 400, loss[loss=0.07989, beats_loss=0.01258, ecapa_loss=0.0001477, whisper_loss=0.06583, over 20349.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01028, ecapa_loss=0.000148, whisper_loss=0.09024, over 3332964.10 frames. ], batch size: 85, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:46:40,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3435520.0, ans=0.1 2024-08-17 17:46:51,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3435620.0, ans=0.0 2024-08-17 17:46:59,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3435720.0, ans=0.07 2024-08-17 17:47:09,066 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-17 17:47:16,500 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.21 vs. limit=15.0 2024-08-17 17:47:22,635 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.287e+01 2.548e+01 2.895e+01 1.655e+02, threshold=5.097e+01, percent-clipped=3.0 2024-08-17 17:47:23,943 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 450, loss[loss=0.1011, beats_loss=0.009512, ecapa_loss=0.0001514, whisper_loss=0.09009, over 15073.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01024, ecapa_loss=0.0001488, whisper_loss=0.09028, over 3441457.32 frames. ], batch size: 59, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:47:44,044 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.48 vs. 
limit=15.0 2024-08-17 17:47:46,050 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-17 17:48:06,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3436220.0, ans=0.2 2024-08-17 17:48:18,568 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.00 vs. limit=12.0 2024-08-17 17:48:19,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3436320.0, ans=0.1 2024-08-17 17:48:31,553 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 500, loss[loss=0.1039, beats_loss=0.009032, ecapa_loss=0.0001258, whisper_loss=0.09362, over 14527.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01025, ecapa_loss=0.0001493, whisper_loss=0.08996, over 3519682.76 frames. ], batch size: 56, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:48:44,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3436520.0, ans=0.125 2024-08-17 17:49:10,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3436620.0, ans=0.1 2024-08-17 17:49:14,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3436720.0, ans=0.125 2024-08-17 17:49:17,276 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.74 vs. 
limit=12.0 2024-08-17 17:49:21,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3436720.0, ans=0.2 2024-08-17 17:49:34,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3436820.0, ans=0.05 2024-08-17 17:49:39,165 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.385e+01 2.606e+01 2.957e+01 2.283e+02, threshold=5.212e+01, percent-clipped=2.0 2024-08-17 17:49:40,481 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 550, loss[loss=0.1008, beats_loss=0.0118, ecapa_loss=0.0001546, whisper_loss=0.08744, over 19725.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01032, ecapa_loss=0.0001488, whisper_loss=0.08992, over 3602486.43 frames. ], batch size: 77, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:49:43,587 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 28 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-17 17:49:49,550 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-17 17:50:24,282 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.16 vs. limit=15.0 2024-08-17 17:50:30,230 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.15 vs. limit=15.0 2024-08-17 17:50:32,827 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.31 vs. limit=15.0 2024-08-17 17:50:46,367 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
34 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-17 17:50:47,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3437320.0, ans=0.2 2024-08-17 17:50:50,283 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 600, loss[loss=0.1238, beats_loss=0.00814, ecapa_loss=0.0001538, whisper_loss=0.1142, over 20284.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01035, ecapa_loss=0.0001481, whisper_loss=0.0904, over 3676765.74 frames. ], batch size: 78, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:50:50,512 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-17 17:51:03,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3437520.0, ans=0.2 2024-08-17 17:51:09,752 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-17 17:51:22,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3437620.0, ans=0.125 2024-08-17 17:51:27,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3437620.0, ans=0.0 2024-08-17 17:51:33,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3437720.0, ans=0.1 2024-08-17 17:51:49,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3437820.0, ans=0.125 2024-08-17 17:51:51,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3437820.0, ans=0.0 2024-08-17 17:51:57,448 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.305e+01 2.570e+01 2.922e+01 6.139e+01, threshold=5.141e+01, 
percent-clipped=1.0 2024-08-17 17:51:58,855 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 650, loss[loss=0.09887, beats_loss=0.01194, ecapa_loss=0.0001146, whisper_loss=0.08578, over 20749.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01031, ecapa_loss=0.0001491, whisper_loss=0.09021, over 3679844.54 frames. ], batch size: 84, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:52:28,948 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 14 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-17 17:52:37,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3438120.0, ans=0.125 2024-08-17 17:52:37,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3438120.0, ans=0.125 2024-08-17 17:52:49,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3438220.0, ans=0.125 2024-08-17 17:53:00,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3438320.0, ans=0.0 2024-08-17 17:53:09,961 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 700, loss[loss=0.09859, beats_loss=0.01212, ecapa_loss=0.0001178, whisper_loss=0.0853, over 18587.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.0001483, whisper_loss=0.09039, over 3693633.11 frames. ], batch size: 70, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:53:26,858 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-17 17:53:34,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3438520.0, ans=0.125 2024-08-17 17:53:42,735 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
24 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-17 17:53:46,250 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=15.0 2024-08-17 17:53:48,443 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.54 vs. limit=15.0 2024-08-17 17:54:01,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3438720.0, ans=0.125 2024-08-17 17:54:06,570 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.73 vs. limit=15.0 2024-08-17 17:54:26,143 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.293e+01 2.490e+01 2.735e+01 3.624e+01, threshold=4.979e+01, percent-clipped=0.0 2024-08-17 17:54:27,679 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 750, loss[loss=0.08335, beats_loss=0.01229, ecapa_loss=0.0001481, whisper_loss=0.06957, over 16039.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01045, ecapa_loss=0.0001462, whisper_loss=0.08973, over 3707777.70 frames. ], batch size: 68, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:54:29,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3438920.0, ans=0.0 2024-08-17 17:54:44,437 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 32 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-17 17:55:17,785 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-17 17:55:24,247 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-17 17:55:35,857 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
35 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-17 17:55:45,654 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.36 vs. limit=15.0 2024-08-17 17:55:48,484 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 800, loss[loss=0.1168, beats_loss=0.01006, ecapa_loss=0.0001185, whisper_loss=0.1056, over 16628.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0105, ecapa_loss=0.0001461, whisper_loss=0.08966, over 3747726.55 frames. ], batch size: 63, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:56:06,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3439520.0, ans=0.0 2024-08-17 17:56:08,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3439520.0, ans=0.125 2024-08-17 17:56:10,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3439520.0, ans=0.125 2024-08-17 17:56:23,971 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-17 17:56:32,098 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 20 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-17 17:56:32,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3439620.0, ans=0.1 2024-08-17 17:56:36,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3439720.0, ans=0.125 2024-08-17 17:56:43,221 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.98 vs. 
limit=15.0 2024-08-17 17:56:45,055 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.23 vs. limit=22.5 2024-08-17 17:56:48,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3439720.0, ans=0.125 2024-08-17 17:56:51,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3439820.0, ans=0.0 2024-08-17 17:57:03,546 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.287e+01 2.519e+01 2.785e+01 3.931e+01, threshold=5.037e+01, percent-clipped=0.0 2024-08-17 17:57:05,176 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 850, loss[loss=0.08994, beats_loss=0.01238, ecapa_loss=0.0001476, whisper_loss=0.07607, over 21840.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01053, ecapa_loss=0.0001457, whisper_loss=0.08902, over 3732441.56 frames. ], batch size: 92, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:57:18,020 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-344000.pt 2024-08-17 17:57:53,091 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 11 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-17 17:58:09,894 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.43 vs. limit=10.0 2024-08-17 17:58:13,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3440220.0, ans=0.2 2024-08-17 17:58:41,822 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
37 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-17 17:58:51,123 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 900, loss[loss=0.1246, beats_loss=0.008928, ecapa_loss=0.0001657, whisper_loss=0.114, over 19821.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01053, ecapa_loss=0.0001456, whisper_loss=0.08924, over 3736756.10 frames. ], batch size: 77, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 17:58:58,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3440420.0, ans=0.0 2024-08-17 17:59:07,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3440420.0, ans=0.125 2024-08-17 17:59:12,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3440520.0, ans=0.0 2024-08-17 17:59:26,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3440620.0, ans=0.125 2024-08-17 17:59:38,099 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 32 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-17 17:59:50,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3440720.0, ans=0.125 2024-08-17 18:00:24,445 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 15 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-17 18:00:35,874 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.343e+01 2.504e+01 2.806e+01 4.204e+01, threshold=5.007e+01, percent-clipped=0.0 2024-08-17 18:00:35,896 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 950, loss[loss=0.06823, beats_loss=0.01218, ecapa_loss=0.0001391, whisper_loss=0.05466, over 14395.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0106, ecapa_loss=0.0001446, whisper_loss=0.08895, over 3767485.47 frames. 
], batch size: 57, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:00:40,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3440920.0, ans=0.125 2024-08-17 18:00:49,713 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-17 18:00:49,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3440920.0, ans=0.125 2024-08-17 18:01:24,476 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 22 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-17 18:01:58,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3441220.0, ans=0.125 2024-08-17 18:02:10,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3441320.0, ans=0.2 2024-08-17 18:02:30,951 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 1000, loss[loss=0.08738, beats_loss=0.01213, ecapa_loss=0.0001785, whisper_loss=0.07346, over 22612.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01056, ecapa_loss=0.0001454, whisper_loss=0.08897, over 3787771.39 frames. ], batch size: 92, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:02:55,045 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.98 vs. limit=15.0 2024-08-17 18:03:04,704 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-17 18:03:36,671 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 14 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-17 18:03:45,643 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-17 18:03:50,768 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
11 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-17 18:04:03,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3441820.0, ans=0.125 2024-08-17 18:04:07,058 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.48 vs. limit=15.0 2024-08-17 18:04:21,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3441820.0, ans=0.125 2024-08-17 18:04:26,149 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.287e+01 2.499e+01 2.719e+01 4.151e+01, threshold=4.997e+01, percent-clipped=0.0 2024-08-17 18:04:26,171 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 1050, loss[loss=0.1072, beats_loss=0.007595, ecapa_loss=0.0002089, whisper_loss=0.09752, over 20285.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01049, ecapa_loss=0.000146, whisper_loss=0.08901, over 3778478.38 frames. ], batch size: 84, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:04:26,373 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-17 18:04:28,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3441920.0, ans=0.1 2024-08-17 18:04:30,013 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-17 18:04:42,088 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.579e-03 2024-08-17 18:04:43,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3441920.0, ans=0.125 2024-08-17 18:05:29,206 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
19 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-17 18:05:32,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3442220.0, ans=0.0 2024-08-17 18:05:53,144 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 1100, loss[loss=0.09348, beats_loss=0.01009, ecapa_loss=0.0001664, whisper_loss=0.08172, over 21633.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01053, ecapa_loss=0.0001451, whisper_loss=0.08949, over 3799429.24 frames. ], batch size: 91, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:05:53,285 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 22 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-17 18:05:55,369 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 18:06:24,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3442620.0, ans=0.035 2024-08-17 18:06:26,858 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 38 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-17 18:06:30,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3442620.0, ans=0.125 2024-08-17 18:06:44,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3442720.0, ans=0.2 2024-08-17 18:06:50,754 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-17 18:06:59,405 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
25 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-17 18:07:06,065 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.407e+01 2.705e+01 2.966e+01 4.079e+01, threshold=5.411e+01, percent-clipped=0.0 2024-08-17 18:07:06,088 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 1150, loss[loss=0.1145, beats_loss=0.01031, ecapa_loss=0.000176, whisper_loss=0.1024, over 22213.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01054, ecapa_loss=0.0001449, whisper_loss=0.08971, over 3831965.75 frames. ], batch size: 89, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:07:12,615 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.31 vs. limit=22.5 2024-08-17 18:07:25,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=3443020.0, ans=0.1 2024-08-17 18:07:28,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3443020.0, ans=0.0 2024-08-17 18:07:37,398 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 29 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-17 18:07:40,472 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 16 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-17 18:07:45,432 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 31 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-17 18:08:07,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3443320.0, ans=0.0 2024-08-17 18:08:07,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3443320.0, ans=0.1 2024-08-17 18:08:16,472 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
24 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-17 18:08:19,090 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 1200, loss[loss=0.07921, beats_loss=0.01141, ecapa_loss=0.0001626, whisper_loss=0.06617, over 15002.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01058, ecapa_loss=0.0001441, whisper_loss=0.0895, over 3824186.34 frames. ], batch size: 62, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:08:38,053 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.48 vs. limit=15.0 2024-08-17 18:08:39,798 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 11 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-17 18:08:43,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3443520.0, ans=0.125 2024-08-17 18:08:50,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3443620.0, ans=0.125 2024-08-17 18:09:17,624 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 12 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-17 18:09:33,374 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.226e+01 2.659e+01 3.142e+01 2.875e+02, threshold=5.318e+01, percent-clipped=2.0 2024-08-17 18:09:33,397 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 1250, loss[loss=0.08699, beats_loss=0.01171, ecapa_loss=0.000127, whisper_loss=0.07401, over 19693.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01066, ecapa_loss=0.0001439, whisper_loss=0.08903, over 3822835.80 frames. 
], batch size: 76, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:09:55,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=3444020.0, ans=6.0 2024-08-17 18:10:04,134 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-17 18:10:16,027 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-17 18:10:32,640 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 18 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-17 18:10:35,400 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 15 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-17 18:10:46,326 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-17 18:10:48,854 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 1300, loss[loss=0.1044, beats_loss=0.009065, ecapa_loss=0.0001285, whisper_loss=0.09404, over 16987.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01066, ecapa_loss=0.000145, whisper_loss=0.0888, over 3821595.19 frames. ], batch size: 64, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:11:07,798 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-17 18:11:28,073 WARNING [optim.py:496] (0/4) Scaling gradients by 0.06072128564119339, model_norm_threshold=53.17913055419922 2024-08-17 18:11:28,249 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.out_combiner.bypass_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.036e+05, grad_sumsq=1.803e+05, orig_rms_sq=5.745e-01 2024-08-17 18:11:31,533 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.108e+00 2024-08-17 18:12:01,539 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
33 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-17 18:12:05,244 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.210e+01 2.591e+01 3.011e+01 8.758e+02, threshold=5.182e+01, percent-clipped=3.0 2024-08-17 18:12:05,267 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 1350, loss[loss=0.0967, beats_loss=0.009153, ecapa_loss=0.0001573, whisper_loss=0.08597, over 15008.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.0107, ecapa_loss=0.0001445, whisper_loss=0.08803, over 3851895.47 frames. ], batch size: 57, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:12:14,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3444920.0, ans=0.1 2024-08-17 18:12:19,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3445020.0, ans=0.0 2024-08-17 18:12:19,997 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-17 18:12:34,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3445120.0, ans=0.0 2024-08-17 18:12:54,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3445220.0, ans=15.0 2024-08-17 18:13:20,692 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 1400, loss[loss=0.1096, beats_loss=0.007903, ecapa_loss=0.0001474, whisper_loss=0.1002, over 18347.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01057, ecapa_loss=0.0001455, whisper_loss=0.08825, over 3832291.76 frames. ], batch size: 70, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:13:29,871 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-17 18:13:31,126 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
19 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-17 18:13:50,003 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0 2024-08-17 18:13:56,796 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 20 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-17 18:14:07,771 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs. limit=15.0 2024-08-17 18:14:15,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=3445720.0, ans=0.2 2024-08-17 18:14:25,077 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 22 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-17 18:14:26,883 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.914e-01 2024-08-17 18:15:05,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3445920.0, ans=0.125 2024-08-17 18:15:06,379 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.243e+01 2.508e+01 2.795e+01 3.559e+01, threshold=5.016e+01, percent-clipped=0.0 2024-08-17 18:15:06,401 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 1450, loss[loss=0.09414, beats_loss=0.01079, ecapa_loss=0.000164, whisper_loss=0.08171, over 21107.00 frames. ], tot_loss[loss=0.09997, beats_loss=0.0106, ecapa_loss=0.0001445, whisper_loss=0.08792, over 3806545.84 frames. ], batch size: 89, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:15:15,018 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 22 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-17 18:15:18,187 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
14 from LS+wenet, 24 from Vox, 19 fro AS 2024-08-17 18:15:26,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3446020.0, ans=0.0 2024-08-17 18:15:36,495 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 20 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-17 18:15:46,651 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 36 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-17 18:15:48,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3446120.0, ans=0.07 2024-08-17 18:15:54,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3446220.0, ans=0.125 2024-08-17 18:16:13,232 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-17 18:16:13,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3446320.0, ans=0.1 2024-08-17 18:16:20,927 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 1500, loss[loss=0.1008, beats_loss=0.008686, ecapa_loss=0.0001982, whisper_loss=0.09009, over 19107.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01057, ecapa_loss=0.0001447, whisper_loss=0.08842, over 3788340.10 frames. ], batch size: 77, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:16:37,059 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
29 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-17 18:16:37,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3446520.0, ans=0.1 2024-08-17 18:16:46,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3446520.0, ans=0.125 2024-08-17 18:16:51,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3446620.0, ans=0.125 2024-08-17 18:17:15,943 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-17 18:17:30,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3446820.0, ans=0.0 2024-08-17 18:17:30,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3446820.0, ans=0.125 2024-08-17 18:17:31,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3446820.0, ans=0.0 2024-08-17 18:17:35,771 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 29 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-17 18:17:36,524 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.72 vs. limit=6.0 2024-08-17 18:17:36,832 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.317e+01 2.500e+01 2.871e+01 1.026e+02, threshold=5.000e+01, percent-clipped=3.0 2024-08-17 18:17:36,855 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 1550, loss[loss=0.108, beats_loss=0.01092, ecapa_loss=0.0001203, whisper_loss=0.09584, over 21454.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01056, ecapa_loss=0.0001448, whisper_loss=0.08878, over 3781101.87 frames. 
], batch size: 82, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:18:47,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3447320.0, ans=0.0
2024-08-17 18:18:51,578 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 1600, loss[loss=0.1237, beats_loss=0.01025, ecapa_loss=0.0001366, whisper_loss=0.1121, over 23082.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001448, whisper_loss=0.08964, over 3796988.95 frames. ], batch size: 92, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:18:51,734 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 20 from Vox, 37 from AS
2024-08-17 18:18:53,868 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.54 vs. limit=15.0
2024-08-17 18:19:01,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3447420.0, ans=0.1
2024-08-17 18:19:19,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3447620.0, ans=0.125
2024-08-17 18:19:34,457 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 19 from Vox, 46 from AS
2024-08-17 18:19:57,767 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 from AS
2024-08-17 18:20:05,387 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.245e+01 2.501e+01 2.930e+01 4.153e+01, threshold=5.003e+01, percent-clipped=0.0
2024-08-17 18:20:05,409 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 1650, loss[loss=0.09316, beats_loss=0.009864, ecapa_loss=0.0001723, whisper_loss=0.08157, over 14751.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001444, whisper_loss=0.08969, over 3824842.26 frames. ], batch size: 59, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:20:06,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3447920.0, ans=0.0
2024-08-17 18:20:17,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3447920.0, ans=0.0
2024-08-17 18:20:50,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3448220.0, ans=0.0
2024-08-17 18:20:51,966 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 from AS
2024-08-17 18:21:17,154 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 1700, loss[loss=0.09395, beats_loss=0.01349, ecapa_loss=0.0001242, whisper_loss=0.07921, over 22558.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.0001454, whisper_loss=0.09013, over 3845304.80 frames. ], batch size: 92, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:21:25,706 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 25 from Vox, 26 from AS
2024-08-17 18:21:40,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3448520.0, ans=0.125
2024-08-17 18:21:48,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3448620.0, ans=0.125
2024-08-17 18:21:54,026 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.81 vs. limit=15.0
2024-08-17 18:22:01,752 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.81 vs. limit=22.5
2024-08-17 18:22:04,778 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.99 vs. limit=6.0
2024-08-17 18:22:08,578 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.096e-03
2024-08-17 18:22:09,006 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.76 vs. limit=12.0
2024-08-17 18:22:16,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3448820.0, ans=0.1
2024-08-17 18:22:22,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3448820.0, ans=0.2
2024-08-17 18:22:26,056 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.381e+01 2.629e+01 2.847e+01 4.282e+01, threshold=5.258e+01, percent-clipped=0.0
2024-08-17 18:22:26,079 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 1750, loss[loss=0.1117, beats_loss=0.009185, ecapa_loss=0.0001385, whisper_loss=0.1011, over 19231.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01048, ecapa_loss=0.0001453, whisper_loss=0.08978, over 3821146.06 frames. ], batch size: 74, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:22:30,176 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 from AS
2024-08-17 18:22:34,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3448920.0, ans=0.125
2024-08-17 18:22:54,578 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 25 from Vox, 34 from AS
2024-08-17 18:22:56,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3449120.0, ans=0.125
2024-08-17 18:23:03,414 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.86 vs. limit=10.0
2024-08-17 18:23:10,922 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 27 from Vox, 33 from AS
2024-08-17 18:23:14,204 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.30 vs. limit=22.5
2024-08-17 18:23:33,293 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 1800, loss[loss=0.07852, beats_loss=0.01151, ecapa_loss=0.0001501, whisper_loss=0.06551, over 14868.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01055, ecapa_loss=0.0001449, whisper_loss=0.08874, over 3810569.36 frames. ], batch size: 57, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:23:43,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3449420.0, ans=0.1
2024-08-17 18:23:47,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3449520.0, ans=0.125
2024-08-17 18:23:54,341 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 24 from LS+wenet, 24 from Vox, 17 from AS
2024-08-17 18:24:09,311 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 20 from Vox, 27 from AS
2024-08-17 18:24:10,483 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 23 from Vox, 31 from AS
2024-08-17 18:24:19,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3449720.0, ans=0.0
2024-08-17 18:24:19,694 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.44 vs. limit=12.0
2024-08-17 18:24:34,860 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 from AS
2024-08-17 18:24:40,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3449820.0, ans=0.2
2024-08-17 18:24:42,542 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.197e+01 2.415e+01 2.703e+01 3.683e+01, threshold=4.830e+01, percent-clipped=0.0
2024-08-17 18:24:42,563 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 1850, loss[loss=0.08837, beats_loss=0.01138, ecapa_loss=0.0001773, whisper_loss=0.07521, over 20157.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01042, ecapa_loss=0.0001449, whisper_loss=0.08959, over 3826307.89 frames. ], batch size: 87, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:24:46,837 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.82 vs. limit=15.0
2024-08-17 18:24:51,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.74 vs. limit=15.0
2024-08-17 18:25:03,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3450020.0, ans=0.125
2024-08-17 18:25:08,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3450020.0, ans=0.0
2024-08-17 18:25:17,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3450120.0, ans=0.125
2024-08-17 18:25:32,377 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 17 from Vox, 34 from AS
2024-08-17 18:25:42,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3450320.0, ans=0.0
2024-08-17 18:25:49,884 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 18 from Vox, 21 from AS
2024-08-17 18:25:50,903 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 1900, loss[loss=0.09255, beats_loss=0.009279, ecapa_loss=0.0001314, whisper_loss=0.08195, over 14371.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01049, ecapa_loss=0.0001442, whisper_loss=0.08891, over 3800894.46 frames. ], batch size: 55, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:25:52,349 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 from AS
2024-08-17 18:25:53,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3450420.0, ans=0.125
2024-08-17 18:26:00,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3450420.0, ans=0.025
2024-08-17 18:26:01,980 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 19 from Vox, 24 from AS
2024-08-17 18:26:15,736 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 22 from Vox, 36 from AS
2024-08-17 18:26:21,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3450620.0, ans=0.125
2024-08-17 18:26:28,020 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 15 from Vox, 27 from AS
2024-08-17 18:26:31,692 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.54 vs. limit=15.0
2024-08-17 18:26:32,913 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 19 from Vox, 37 from AS
2024-08-17 18:26:40,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3450720.0, ans=0.025
2024-08-17 18:26:48,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.24 vs. limit=22.5
2024-08-17 18:26:55,309 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 15 from LS+wenet, 15 from Vox, 34 from AS
2024-08-17 18:26:59,092 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.266e+01 2.492e+01 2.719e+01 3.794e+02, threshold=4.984e+01, percent-clipped=0.0
2024-08-17 18:26:59,115 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 1950, loss[loss=0.09825, beats_loss=0.01822, ecapa_loss=0.0001562, whisper_loss=0.07847, over 19116.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01052, ecapa_loss=0.0001454, whisper_loss=0.08918, over 3797528.61 frames. ], batch size: 74, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:27:03,088 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 18 from Vox, 19 from AS
2024-08-17 18:27:04,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3450920.0, ans=0.2
2024-08-17 18:27:17,817 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 19 from Vox, 44 from AS
2024-08-17 18:27:20,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.28 vs. limit=15.0
2024-08-17 18:27:26,727 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.995e+01
2024-08-17 18:27:29,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3451120.0, ans=0.0
2024-08-17 18:27:42,306 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 20 from Vox, 25 from AS
2024-08-17 18:27:46,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3451220.0, ans=0.2
2024-08-17 18:28:02,112 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 31 from LS+wenet, 24 from Vox, 29 from AS
2024-08-17 18:28:04,471 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 2000, loss[loss=0.1008, beats_loss=0.007736, ecapa_loss=0.0001617, whisper_loss=0.0914, over 17262.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01047, ecapa_loss=0.0001463, whisper_loss=0.08963, over 3800258.40 frames. ], batch size: 68, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:28:04,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3451420.0, ans=0.2
2024-08-17 18:28:06,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3451420.0, ans=0.1
2024-08-17 18:28:06,446 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.87 vs. limit=10.0
2024-08-17 18:28:09,980 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 15 from LS+wenet, 15 from Vox, 35 from AS
2024-08-17 18:28:16,165 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.18 vs. limit=6.0
2024-08-17 18:28:29,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3451520.0, ans=0.0
2024-08-17 18:28:30,150 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 23 from Vox, 27 from AS
2024-08-17 18:28:42,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3451620.0, ans=0.125
2024-08-17 18:28:53,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3451720.0, ans=0.2
2024-08-17 18:29:02,315 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 18 from LS+wenet, 20 from Vox, 36 from AS
2024-08-17 18:29:03,524 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 15 from Vox, 28 from AS
2024-08-17 18:29:10,628 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=15.0
2024-08-17 18:29:12,765 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.396e+01 2.689e+01 3.009e+01 4.514e+01, threshold=5.377e+01, percent-clipped=1.0
2024-08-17 18:29:12,783 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 2050, loss[loss=0.09624, beats_loss=0.009111, ecapa_loss=0.0001228, whisper_loss=0.0859, over 16611.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0105, ecapa_loss=0.0001462, whisper_loss=0.08931, over 3778270.81 frames. ], batch size: 63, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:29:20,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3451920.0, ans=6.0
2024-08-17 18:29:21,420 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.18 vs. limit=8.0
2024-08-17 18:29:21,964 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 26 from LS+wenet, 15 from Vox, 28 from AS
2024-08-17 18:29:33,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3452020.0, ans=0.0
2024-08-17 18:29:36,085 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08474668860435486, model_norm_threshold=53.77110290527344
2024-08-17 18:29:36,247 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.0.norm.log_scale with proportion 0.19, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.640e+04, grad_sumsq=7.640e+04, orig_rms_sq=1.000e+00
2024-08-17 18:29:40,643 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0
2024-08-17 18:29:51,512 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.09 vs. limit=10.0
2024-08-17 18:29:55,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3452220.0, ans=0.125
2024-08-17 18:30:06,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3452320.0, ans=0.125
2024-08-17 18:30:18,593 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 2100, loss[loss=0.1104, beats_loss=0.01051, ecapa_loss=0.0001533, whisper_loss=0.09834, over 23241.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01053, ecapa_loss=0.0001447, whisper_loss=0.08953, over 3773713.20 frames. ], batch size: 95, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:30:36,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3452520.0, ans=0.2
2024-08-17 18:30:36,264 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.10 vs. limit=22.5
2024-08-17 18:30:42,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3452520.0, ans=0.0
2024-08-17 18:30:44,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3452620.0, ans=0.125
2024-08-17 18:30:47,185 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 from AS
2024-08-17 18:30:55,382 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.791e+05
2024-08-17 18:31:04,322 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 9 from Vox, 37 from AS
2024-08-17 18:31:14,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3452820.0, ans=0.125
2024-08-17 18:31:23,641 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.347e+01 2.592e+01 2.946e+01 6.345e+02, threshold=5.183e+01, percent-clipped=4.0
2024-08-17 18:31:23,661 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 2150, loss[loss=0.12, beats_loss=0.009107, ecapa_loss=0.0001299, whisper_loss=0.1096, over 16809.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01061, ecapa_loss=0.000145, whisper_loss=0.08927, over 3761371.63 frames. ], batch size: 63, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:31:27,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3452920.0, ans=0.1
2024-08-17 18:31:33,119 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 23 from Vox, 39 from AS
2024-08-17 18:31:43,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3453020.0, ans=0.0
2024-08-17 18:31:53,407 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 25 from Vox, 34 from AS
2024-08-17 18:32:03,329 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 14 from LS+wenet, 19 from Vox, 28 from AS
2024-08-17 18:32:08,426 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 21 from Vox, 18 from AS
2024-08-17 18:32:16,526 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 27 from LS+wenet, 19 from Vox, 34 from AS
2024-08-17 18:32:17,211 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.64 vs. limit=22.5
2024-08-17 18:32:20,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3453320.0, ans=0.2
2024-08-17 18:32:21,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3453320.0, ans=0.1
2024-08-17 18:32:27,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3453320.0, ans=0.0
2024-08-17 18:32:29,490 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 2200, loss[loss=0.1174, beats_loss=0.009511, ecapa_loss=0.0001918, whisper_loss=0.1059, over 22235.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01062, ecapa_loss=0.0001446, whisper_loss=0.08965, over 3764735.92 frames. ], batch size: 91, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:32:38,700 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 14 from Vox, 23 from AS
2024-08-17 18:32:44,129 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 32 from Vox, 33 from AS
2024-08-17 18:32:47,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3453520.0, ans=0.125
2024-08-17 18:32:54,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=6.24 vs. limit=12.0
2024-08-17 18:33:08,215 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 15 from Vox, 32 from AS
2024-08-17 18:33:09,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3453720.0, ans=0.125
2024-08-17 18:33:09,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3453720.0, ans=0.125
2024-08-17 18:33:10,614 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 37 from LS+wenet, 21 from Vox, 32 from AS
2024-08-17 18:33:13,335 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 33 from LS+wenet, 23 from Vox, 27 from AS
2024-08-17 18:33:13,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3453720.0, ans=0.0
2024-08-17 18:33:21,075 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 from AS
2024-08-17 18:33:33,993 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.333e+01 2.532e+01 2.820e+01 1.498e+02, threshold=5.063e+01, percent-clipped=1.0
2024-08-17 18:33:34,014 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 2250, loss[loss=0.05883, beats_loss=0.009921, ecapa_loss=0.0001638, whisper_loss=0.04727, over 13983.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01067, ecapa_loss=0.000144, whisper_loss=0.08967, over 3798653.84 frames. ], batch size: 55, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:33:35,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3453920.0, ans=0.015
2024-08-17 18:34:21,823 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 12 from Vox, 27 from AS
2024-08-17 18:34:25,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3454320.0, ans=0.125
2024-08-17 18:34:33,530 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 17 from Vox, 20 from AS
2024-08-17 18:34:36,439 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 18 from Vox, 30 from AS
2024-08-17 18:34:40,475 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 2300, loss[loss=0.09493, beats_loss=0.01003, ecapa_loss=0.0001657, whisper_loss=0.08325, over 21821.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001457, whisper_loss=0.09046, over 3830203.24 frames. ], batch size: 91, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:34:55,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3454520.0, ans=0.0
2024-08-17 18:35:15,985 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 19 from LS+wenet, 27 from Vox, 32 from AS
2024-08-17 18:35:20,431 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 10 from Vox, 31 from AS
2024-08-17 18:35:22,896 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 26 from Vox, 34 from AS
2024-08-17 18:35:28,983 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.71 vs. limit=15.0
2024-08-17 18:35:44,020 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.39 vs. limit=22.5
2024-08-17 18:35:44,466 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.369e+01 2.599e+01 2.912e+01 4.410e+01, threshold=5.198e+01, percent-clipped=0.0
2024-08-17 18:35:44,487 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 2350, loss[loss=0.09464, beats_loss=0.01427, ecapa_loss=0.0001221, whisper_loss=0.07915, over 19690.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01061, ecapa_loss=0.0001457, whisper_loss=0.09013, over 3829842.29 frames. ], batch size: 81, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:35:44,884 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-17 18:36:09,838 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0
2024-08-17 18:36:19,821 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.35 vs. limit=15.0
2024-08-17 18:36:32,430 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.73 vs. limit=15.0
2024-08-17 18:36:38,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3455320.0, ans=0.1
2024-08-17 18:36:44,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.39 vs. limit=15.0
2024-08-17 18:36:51,546 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 2400, loss[loss=0.1064, beats_loss=0.01135, ecapa_loss=0.0001072, whisper_loss=0.09399, over 24247.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01055, ecapa_loss=0.0001448, whisper_loss=0.09038, over 3838786.48 frames. ], batch size: 92, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:36:55,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3455420.0, ans=0.125
2024-08-17 18:36:57,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3455420.0, ans=0.2
2024-08-17 18:36:58,770 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 17 from Vox, 36 from AS
2024-08-17 18:37:18,511 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=12.0
2024-08-17 18:37:27,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3455620.0, ans=0.125
2024-08-17 18:37:36,568 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 14 from Vox, 24 from AS
2024-08-17 18:37:38,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3455720.0, ans=0.0
2024-08-17 18:37:49,232 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.32 vs. limit=6.0
2024-08-17 18:38:05,947 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.211e+01 2.408e+01 2.768e+01 3.443e+01, threshold=4.816e+01, percent-clipped=0.0
2024-08-17 18:38:05,967 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 2450, loss[loss=0.1188, beats_loss=0.006005, ecapa_loss=0.000168, whisper_loss=0.1112, over 18762.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01063, ecapa_loss=0.0001443, whisper_loss=0.08995, over 3871078.32 frames. ], batch size: 70, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:38:18,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3455920.0, ans=0.125
2024-08-17 18:38:19,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3455920.0, ans=0.2
2024-08-17 18:38:53,590 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-17 18:39:01,921 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.46 vs. limit=15.0
2024-08-17 18:39:12,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3456320.0, ans=0.2
2024-08-17 18:39:21,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3456320.0, ans=0.0
2024-08-17 18:39:24,831 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 19 from Vox, 45 from AS
2024-08-17 18:39:26,363 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 2500, loss[loss=0.1058, beats_loss=0.01186, ecapa_loss=0.000137, whisper_loss=0.09259, over 23459.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01062, ecapa_loss=0.000144, whisper_loss=0.08935, over 3865739.67 frames. ], batch size: 92, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:39:29,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3456420.0, ans=0.1
2024-08-17 18:39:32,544 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 from AS
2024-08-17 18:39:44,579 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 21 from Vox, 24 from AS
2024-08-17 18:39:46,513 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 21 from LS+wenet, 25 from Vox, 33 from AS
2024-08-17 18:40:10,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3456720.0, ans=0.1
2024-08-17 18:40:13,615 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 33 from LS+wenet, 13 from Vox, 36 from AS
2024-08-17 18:40:15,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3456720.0, ans=0.0
2024-08-17 18:40:32,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3456820.0, ans=0.1
2024-08-17 18:40:41,777 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.335e+01 2.460e+01 2.782e+01 3.981e+01, threshold=4.921e+01, percent-clipped=0.0
2024-08-17 18:40:41,796 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 2550, loss[loss=0.1272, beats_loss=0.009175, ecapa_loss=0.0001456, whisper_loss=0.1166, over 22174.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01051, ecapa_loss=0.0001443, whisper_loss=0.09034, over 3887587.31 frames. ], batch size: 84, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:41:12,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3457020.0, ans=0.125
2024-08-17 18:41:15,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3457120.0, ans=0.125
2024-08-17 18:41:26,919 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 from AS
2024-08-17 18:42:01,946 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 2600, loss[loss=0.09451, beats_loss=0.01102, ecapa_loss=0.0001257, whisper_loss=0.08223, over 14645.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01047, ecapa_loss=0.0001446, whisper_loss=0.09072, over 3865193.63 frames. ], batch size: 57, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:42:15,575 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0
2024-08-17 18:42:19,420 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 13 from LS+wenet, 16 from Vox, 27 from AS
2024-08-17 18:42:27,568 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08065144717693329, model_norm_threshold=49.205039978027344
2024-08-17 18:42:27,731 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.182e+04, grad_sumsq=8.182e+04, orig_rms_sq=1.000e+00
2024-08-17 18:42:46,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3457720.0, ans=0.125
2024-08-17 18:42:47,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3457720.0, ans=0.0
2024-08-17 18:42:47,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3457720.0, ans=0.0
2024-08-17 18:42:54,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3457720.0, ans=0.125
2024-08-17 18:43:14,409 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.414e+01 2.571e+01 2.892e+01 6.101e+02, threshold=5.143e+01, percent-clipped=1.0
2024-08-17 18:43:14,431 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 2650, loss[loss=0.07354, beats_loss=0.01323, ecapa_loss=0.0001401, whisper_loss=0.05891, over 15523.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01056, ecapa_loss=0.0001457, whisper_loss=0.09005, over 3865258.14 frames. ], batch size: 65, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:43:17,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3457920.0, ans=0.1
2024-08-17 18:43:21,145 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.97 vs. limit=22.5
2024-08-17 18:43:30,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3458020.0, ans=0.125
2024-08-17 18:43:34,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3458020.0, ans=0.125
2024-08-17 18:43:41,466 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 27 from LS+wenet, 27 from Vox, 41 from AS
2024-08-17 18:43:44,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3458120.0, ans=0.0
2024-08-17 18:43:47,080 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 from AS
2024-08-17 18:43:55,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3458220.0, ans=0.125
2024-08-17 18:44:11,169 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 from AS
2024-08-17 18:44:25,228 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 2700, loss[loss=0.1073, beats_loss=0.01132, ecapa_loss=0.0001297, whisper_loss=0.09473, over 21338.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01059, ecapa_loss=0.0001459, whisper_loss=0.09041, over 3911374.96 frames. ], batch size: 86, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:44:28,341 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 from AS
2024-08-17 18:44:32,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3458420.0, ans=0.0
2024-08-17 18:44:35,911 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.28 vs. limit=15.0
2024-08-17 18:44:39,833 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 21 from Vox, 35 from AS
2024-08-17 18:44:41,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3458520.0, ans=0.07
2024-08-17 18:44:54,109 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 11 from Vox, 27 from AS
2024-08-17 18:44:59,359 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 23 from LS+wenet, 21 from Vox, 41 from AS
2024-08-17 18:45:21,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3458820.0, ans=0.125
2024-08-17 18:45:31,944 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 30 from LS+wenet, 13 from Vox, 39 from AS
2024-08-17 18:45:33,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3458820.0, ans=0.5
2024-08-17 18:45:37,409 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.308e+01 2.578e+01 2.796e+01 3.722e+01, threshold=5.156e+01, percent-clipped=0.0
2024-08-17 18:45:37,428 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 2750, loss[loss=0.09211, beats_loss=0.0118, ecapa_loss=0.0001305, whisper_loss=0.07901, over 22751.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.0001453, whisper_loss=0.09039, over 3910657.45 frames. ], batch size: 90, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 18:45:38,933 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 20 from Vox, 22 from AS
2024-08-17 18:45:40,821 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=15.0
2024-08-17 18:45:42,473 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.52 vs.
limit=22.5 2024-08-17 18:45:56,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3459020.0, ans=0.0 2024-08-17 18:46:20,152 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-17 18:46:21,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3459220.0, ans=0.0 2024-08-17 18:46:42,139 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-17 18:46:48,795 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 2800, loss[loss=0.09801, beats_loss=0.01113, ecapa_loss=0.0001635, whisper_loss=0.08525, over 13330.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01061, ecapa_loss=0.0001459, whisper_loss=0.09087, over 3879327.93 frames. ], batch size: 54, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:47:04,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3459520.0, ans=0.125 2024-08-17 18:47:04,537 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=15.0 2024-08-17 18:47:14,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3459520.0, ans=0.125 2024-08-17 18:47:17,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3459620.0, ans=0.2 2024-08-17 18:47:23,951 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-17 18:47:35,139 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
28 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-17 18:47:50,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3459820.0, ans=0.125 2024-08-17 18:48:02,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3459820.0, ans=0.1 2024-08-17 18:48:04,516 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.294e+01 2.641e+01 2.883e+01 4.874e+01, threshold=5.283e+01, percent-clipped=0.0 2024-08-17 18:48:04,535 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 2850, loss[loss=0.1181, beats_loss=0.00819, ecapa_loss=0.0001758, whisper_loss=0.1082, over 19659.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01057, ecapa_loss=0.0001452, whisper_loss=0.09142, over 3901688.99 frames. ], batch size: 79, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:48:11,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3459920.0, ans=0.125 2024-08-17 18:48:21,006 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-17 18:48:21,648 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.75 vs. limit=15.0 2024-08-17 18:48:24,168 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0 2024-08-17 18:48:31,893 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-17 18:48:32,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3460020.0, ans=10.0 2024-08-17 18:49:08,141 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
14 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-17 18:49:21,708 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 2900, loss[loss=0.127, beats_loss=0.01056, ecapa_loss=0.0001309, whisper_loss=0.1151, over 16697.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01054, ecapa_loss=0.0001461, whisper_loss=0.09146, over 3860874.50 frames. ], batch size: 62, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 18:49:29,950 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.18 vs. limit=12.0 2024-08-17 18:49:35,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3460420.0, ans=0.1 2024-08-17 18:49:45,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3460520.0, ans=0.1 2024-08-17 18:49:54,315 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-17 18:50:04,574 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-17 18:50:35,138 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.363e+01 2.529e+01 2.876e+01 1.646e+02, threshold=5.058e+01, percent-clipped=2.0 2024-08-17 18:50:35,157 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 2950, loss[loss=0.09584, beats_loss=0.01123, ecapa_loss=0.000157, whisper_loss=0.08305, over 22161.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.0001483, whisper_loss=0.09084, over 3837294.40 frames. ], batch size: 91, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 18:50:35,342 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
30 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-17 18:50:42,300 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.518e-03 2024-08-17 18:50:46,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3460920.0, ans=0.125 2024-08-17 18:50:52,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3461020.0, ans=0.125 2024-08-17 18:51:05,307 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=22.5 2024-08-17 18:51:11,222 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-17 18:51:12,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3461120.0, ans=0.05 2024-08-17 18:51:14,631 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.97 vs. limit=22.5 2024-08-17 18:51:24,427 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0 2024-08-17 18:51:34,955 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 16 from Vox, 48 fro AS 2024-08-17 18:51:36,887 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.03 vs. limit=15.0 2024-08-17 18:51:41,678 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
20 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-17 18:51:44,137 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 3000, loss[loss=0.1167, beats_loss=0.009154, ecapa_loss=0.0001553, whisper_loss=0.106, over 23015.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01056, ecapa_loss=0.0001468, whisper_loss=0.09095, over 3874627.43 frames. ], batch size: 93, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 18:51:44,139 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-17 18:52:21,629 INFO [train_multi_KD3.py:1149] (0/4) Epoch 24, validation on ASR_libri: loss=0.2525, beats_loss=0, ecapa_loss=0.0005269, whisper_loss=0.2472, over 922467.00 frames. 2024-08-17 18:52:33,318 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1917, 4.7476, 5.0298, 5.1285], device='cuda:0') 2024-08-17 18:52:37,988 INFO [train_multi_KD3.py:1149] (0/4) Epoch 24, validation on SV_voxceleb1: loss=0.00404, beats_loss=0, ecapa_loss=0.000404, whisper_loss=0, over 939242.00 frames. 2024-08-17 18:53:22,621 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.7151, 2.4816, 2.5707, 2.4154], device='cuda:0') 2024-08-17 18:54:27,941 INFO [train_multi_KD3.py:1149] (0/4) Epoch 24, validation on AT_audioset: loss=0.02333, beats_loss=0.02333, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-17 18:54:27,946 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-17 18:54:59,406 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 13 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-17 18:55:04,236 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-17 18:55:04,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3461620.0, ans=0.2 2024-08-17 18:55:31,360 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-17 18:55:38,518 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.375e+01 2.603e+01 2.812e+01 5.666e+01, threshold=5.206e+01, percent-clipped=2.0 2024-08-17 18:55:38,536 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 3050, loss[loss=0.13, beats_loss=0.008174, ecapa_loss=0.0001369, whisper_loss=0.1205, over 19248.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01061, ecapa_loss=0.0001475, whisper_loss=0.09038, over 3877080.31 frames. ], batch size: 73, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 18:55:44,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3461920.0, ans=0.125 2024-08-17 18:55:47,485 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.77 vs. limit=10.0 2024-08-17 18:55:48,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3461920.0, ans=0.1 2024-08-17 18:56:05,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3462120.0, ans=0.2 2024-08-17 18:56:17,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3462220.0, ans=0.0 2024-08-17 18:56:21,740 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-17 18:56:25,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3462220.0, ans=0.2 2024-08-17 18:56:42,184 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-17 18:56:45,921 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 3100, loss[loss=0.1245, beats_loss=0.007984, ecapa_loss=0.0001317, whisper_loss=0.1152, over 14931.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01061, ecapa_loss=0.0001484, whisper_loss=0.09119, over 3893132.04 frames. ], batch size: 55, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 18:56:51,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3462420.0, ans=0.125 2024-08-17 18:57:04,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3462520.0, ans=0.1 2024-08-17 18:57:17,624 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 21 from LS+wenet, 39 from Vox, 28 fro AS 2024-08-17 18:57:21,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3462620.0, ans=0.1 2024-08-17 18:57:39,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3462820.0, ans=0.125 2024-08-17 18:57:41,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3462820.0, ans=0.0 2024-08-17 18:57:50,156 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-17 18:57:51,524 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
16 from LS+wenet, 25 from Vox, 19 fro AS 2024-08-17 18:57:54,019 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.850e+01 2.306e+01 2.653e+01 2.984e+01 6.724e+01, threshold=5.307e+01, percent-clipped=1.0 2024-08-17 18:57:54,038 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 3150, loss[loss=0.1178, beats_loss=0.009948, ecapa_loss=0.0001327, whisper_loss=0.1066, over 15149.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01065, ecapa_loss=0.0001487, whisper_loss=0.091, over 3884877.11 frames. ], batch size: 58, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 18:58:06,434 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-08-17 18:58:24,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3463120.0, ans=0.0 2024-08-17 18:58:28,174 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-17 18:59:03,812 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 3200, loss[loss=0.1123, beats_loss=0.008555, ecapa_loss=0.0001583, whisper_loss=0.1021, over 16261.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01067, ecapa_loss=0.0001485, whisper_loss=0.09104, over 3881289.08 frames. 
], batch size: 63, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 18:59:13,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3463420.0, ans=0.125 2024-08-17 18:59:20,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3463520.0, ans=0.125 2024-08-17 18:59:22,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3463520.0, ans=0.0 2024-08-17 18:59:36,502 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0 2024-08-17 18:59:38,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3463620.0, ans=0.1 2024-08-17 18:59:44,135 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-17 18:59:51,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3463720.0, ans=0.1 2024-08-17 18:59:51,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3463720.0, ans=0.0 2024-08-17 18:59:54,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3463720.0, ans=0.125 2024-08-17 18:59:59,911 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-17 19:00:09,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3463920.0, ans=0.1 2024-08-17 19:00:10,528 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.338e+01 2.602e+01 2.900e+01 3.800e+01, threshold=5.204e+01, percent-clipped=0.0 2024-08-17 19:00:10,547 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 3250, loss[loss=0.0853, beats_loss=0.01251, ecapa_loss=0.0001226, whisper_loss=0.07157, over 14825.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01059, ecapa_loss=0.0001492, whisper_loss=0.0913, over 3878570.56 frames. ], batch size: 59, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:00:26,767 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-17 19:00:32,472 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.16 vs. limit=12.0 2024-08-17 19:00:34,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3464020.0, ans=0.1 2024-08-17 19:00:45,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3464120.0, ans=0.125 2024-08-17 19:01:02,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3464320.0, ans=0.2 2024-08-17 19:01:07,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3464320.0, ans=0.125 2024-08-17 19:01:16,086 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 3300, loss[loss=0.1301, beats_loss=0.01046, ecapa_loss=0.000146, whisper_loss=0.1182, over 19217.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01066, ecapa_loss=0.000149, whisper_loss=0.09053, over 3869348.60 frames. ], batch size: 75, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:01:19,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3464420.0, ans=0.125 2024-08-17 19:01:27,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3464420.0, ans=0.0 2024-08-17 19:01:32,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3464520.0, ans=0.125 2024-08-17 19:01:33,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.04 vs. limit=10.0 2024-08-17 19:01:37,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3464520.0, ans=0.125 2024-08-17 19:01:59,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3464720.0, ans=0.125 2024-08-17 19:02:06,379 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-17 19:02:12,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3464820.0, ans=0.0 2024-08-17 19:02:14,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3464820.0, ans=0.0 2024-08-17 19:02:15,853 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
28 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-17 19:02:22,172 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.671e+01 2.198e+01 2.485e+01 2.907e+01 4.364e+01, threshold=4.970e+01, percent-clipped=0.0 2024-08-17 19:02:22,191 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 3350, loss[loss=0.1062, beats_loss=0.0101, ecapa_loss=0.0001802, whisper_loss=0.09431, over 21135.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01069, ecapa_loss=0.000147, whisper_loss=0.09045, over 3866984.34 frames. ], batch size: 90, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:02:29,625 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.94 vs. limit=15.0 2024-08-17 19:02:42,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3465020.0, ans=0.125 2024-08-17 19:02:43,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=3465020.0, ans=0.2 2024-08-17 19:03:07,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3465220.0, ans=0.0 2024-08-17 19:03:15,674 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.16 vs. limit=12.0 2024-08-17 19:03:15,860 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.61 vs. limit=15.0 2024-08-17 19:03:19,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3465320.0, ans=0.0 2024-08-17 19:03:28,361 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 3400, loss[loss=0.1047, beats_loss=0.008813, ecapa_loss=0.0001826, whisper_loss=0.09403, over 21424.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01061, ecapa_loss=0.0001478, whisper_loss=0.09056, over 3884323.51 frames. ], batch size: 89, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:03:50,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3465520.0, ans=0.2 2024-08-17 19:04:15,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3465720.0, ans=0.0 2024-08-17 19:04:27,222 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-17 19:04:33,203 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.314e+01 2.533e+01 2.860e+01 4.018e+01, threshold=5.066e+01, percent-clipped=0.0 2024-08-17 19:04:33,222 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 3450, loss[loss=0.1211, beats_loss=0.007011, ecapa_loss=0.0001485, whisper_loss=0.1126, over 17375.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01056, ecapa_loss=0.0001475, whisper_loss=0.09116, over 3881989.79 frames. ], batch size: 65, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:04:54,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3466020.0, ans=0.125 2024-08-17 19:04:55,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.31 vs. limit=12.0 2024-08-17 19:05:21,579 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.12 vs. 
limit=15.0 2024-08-17 19:05:23,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3466220.0, ans=0.0 2024-08-17 19:05:32,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3466320.0, ans=0.125 2024-08-17 19:05:34,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3466320.0, ans=0.0 2024-08-17 19:05:39,549 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 3500, loss[loss=0.07102, beats_loss=0.01361, ecapa_loss=0.0001669, whisper_loss=0.05575, over 20473.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.0001476, whisper_loss=0.09072, over 3878621.10 frames. ], batch size: 88, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:05:43,617 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-17 19:05:48,045 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-17 19:06:16,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3466620.0, ans=0.125 2024-08-17 19:06:21,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3466720.0, ans=0.1 2024-08-17 19:06:24,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3466720.0, ans=0.125 2024-08-17 19:06:24,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3466720.0, ans=0.125 2024-08-17 19:06:27,112 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
21 from LS+wenet, 22 from Vox, 52 fro AS 2024-08-17 19:06:43,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3466820.0, ans=0.2 2024-08-17 19:06:45,341 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.305e+01 2.521e+01 2.842e+01 7.889e+01, threshold=5.042e+01, percent-clipped=2.0 2024-08-17 19:06:45,360 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 3550, loss[loss=0.1024, beats_loss=0.01116, ecapa_loss=0.0001264, whisper_loss=0.09001, over 14063.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01057, ecapa_loss=0.000147, whisper_loss=0.09068, over 3855895.47 frames. ], batch size: 54, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:06:48,714 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.53 vs. limit=22.5 2024-08-17 19:06:51,253 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-17 19:07:04,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3467020.0, ans=0.0 2024-08-17 19:07:08,050 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-17 19:07:11,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3467120.0, ans=0.125 2024-08-17 19:07:18,344 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.08 vs. limit=10.0 2024-08-17 19:07:26,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3467220.0, ans=0.09899494936611666 2024-08-17 19:07:36,408 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
33 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-17 19:07:46,172 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.71 vs. limit=10.0 2024-08-17 19:07:49,662 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-17 19:07:52,175 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 3600, loss[loss=0.09962, beats_loss=0.009885, ecapa_loss=0.0001778, whisper_loss=0.08795, over 14675.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01055, ecapa_loss=0.0001478, whisper_loss=0.09068, over 3860323.09 frames. ], batch size: 59, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:08:00,630 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-17 19:08:40,477 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-17 19:08:40,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3467720.0, ans=0.0 2024-08-17 19:08:51,649 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-17 19:08:53,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3467820.0, ans=0.125 2024-08-17 19:08:55,106 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.286e+01 2.581e+01 2.801e+01 4.273e+01, threshold=5.162e+01, percent-clipped=0.0 2024-08-17 19:08:55,125 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 3650, loss[loss=0.08405, beats_loss=0.01358, ecapa_loss=0.0001038, whisper_loss=0.06944, over 23247.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01047, ecapa_loss=0.0001476, whisper_loss=0.09126, over 3848317.09 frames. 
], batch size: 92, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:09:10,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3468020.0, ans=0.125 2024-08-17 19:09:19,897 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-17 19:09:20,390 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.00 vs. limit=15.0 2024-08-17 19:09:21,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3468120.0, ans=0.05 2024-08-17 19:09:27,324 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-17 19:09:34,894 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-17 19:09:46,382 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-17 19:09:57,688 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 3700, loss[loss=0.1089, beats_loss=0.01022, ecapa_loss=0.0001017, whisper_loss=0.09767, over 19709.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01046, ecapa_loss=0.0001474, whisper_loss=0.09076, over 3845106.42 frames. ], batch size: 71, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:10:00,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3468420.0, ans=0.125 2024-08-17 19:10:05,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3468420.0, ans=0.125 2024-08-17 19:10:11,514 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
21 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-17 19:10:17,782 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-17 19:10:23,445 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-17 19:10:30,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3468620.0, ans=0.07 2024-08-17 19:10:54,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3468820.0, ans=0.125 2024-08-17 19:11:02,327 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.199e+01 2.477e+01 2.865e+01 5.326e+01, threshold=4.955e+01, percent-clipped=1.0 2024-08-17 19:11:02,347 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 3750, loss[loss=0.0967, beats_loss=0.01169, ecapa_loss=0.0001403, whisper_loss=0.0836, over 22227.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.0001468, whisper_loss=0.09045, over 3835172.42 frames. ], batch size: 91, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:11:05,665 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.31 vs. limit=22.5 2024-08-17 19:11:07,530 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-17 19:11:17,040 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.11 vs. limit=6.0 2024-08-17 19:11:18,034 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.66 vs. 
limit=12.0 2024-08-17 19:11:30,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3469120.0, ans=0.125 2024-08-17 19:11:31,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3469120.0, ans=0.2 2024-08-17 19:11:49,371 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09380995482206345, model_norm_threshold=49.54741287231445 2024-08-17 19:11:49,534 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.469e+04, grad_sumsq=6.357e+06, orig_rms_sq=1.018e-02 2024-08-17 19:12:00,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3469320.0, ans=0.0 2024-08-17 19:12:07,360 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 3800, loss[loss=0.09909, beats_loss=0.01064, ecapa_loss=0.000127, whisper_loss=0.08718, over 16271.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01054, ecapa_loss=0.0001471, whisper_loss=0.09024, over 3862953.57 frames. ], batch size: 66, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:12:12,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3469420.0, ans=0.125 2024-08-17 19:12:16,444 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-17 19:12:18,207 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.72 vs. limit=22.5 2024-08-17 19:12:24,907 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2024-08-17 19:12:29,992 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
30 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-17 19:12:38,638 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-17 19:12:50,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3469720.0, ans=0.1 2024-08-17 19:12:52,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3469720.0, ans=0.035 2024-08-17 19:12:52,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3469720.0, ans=0.125 2024-08-17 19:12:58,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3469720.0, ans=0.125 2024-08-17 19:13:16,150 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.363e+01 2.589e+01 3.155e+01 5.282e+02, threshold=5.179e+01, percent-clipped=3.0 2024-08-17 19:13:16,170 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 3850, loss[loss=0.1127, beats_loss=0.009968, ecapa_loss=0.0001614, whisper_loss=0.1011, over 22284.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01061, ecapa_loss=0.0001468, whisper_loss=0.08996, over 3864990.75 frames. ], batch size: 88, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:13:21,605 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
22 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-17 19:13:21,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3469920.0, ans=0.125 2024-08-17 19:13:48,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3470120.0, ans=0.0 2024-08-17 19:14:01,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3470220.0, ans=0.0 2024-08-17 19:14:18,174 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-17 19:14:27,573 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 3900, loss[loss=0.07059, beats_loss=0.0128, ecapa_loss=0.0001152, whisper_loss=0.05664, over 16911.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01055, ecapa_loss=0.0001474, whisper_loss=0.09038, over 3860384.11 frames. ], batch size: 68, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:14:30,522 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=15.0 2024-08-17 19:14:44,861 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 21 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-17 19:14:49,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3470520.0, ans=0.0 2024-08-17 19:14:53,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3470620.0, ans=0.125 2024-08-17 19:15:02,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.09 vs. 
limit=15.0 2024-08-17 19:15:08,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3470720.0, ans=0.1 2024-08-17 19:15:11,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3470720.0, ans=0.0 2024-08-17 19:15:26,238 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.17 vs. limit=12.0 2024-08-17 19:15:35,747 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.330e+01 2.586e+01 2.887e+01 6.168e+01, threshold=5.173e+01, percent-clipped=1.0 2024-08-17 19:15:35,766 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 3950, loss[loss=0.09971, beats_loss=0.01171, ecapa_loss=0.0001547, whisper_loss=0.08645, over 22716.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001483, whisper_loss=0.09049, over 3897698.72 frames. ], batch size: 92, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:16:10,357 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 19:16:20,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3471220.0, ans=0.2 2024-08-17 19:16:24,415 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
24 from LS+wenet, 25 from Vox, 20 fro AS 2024-08-17 19:16:36,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3471320.0, ans=0.125 2024-08-17 19:16:37,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3471320.0, ans=0.125 2024-08-17 19:16:43,063 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 4000, loss[loss=0.1122, beats_loss=0.009256, ecapa_loss=0.0001397, whisper_loss=0.1015, over 19435.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01052, ecapa_loss=0.0001486, whisper_loss=0.09102, over 3865451.78 frames. ], batch size: 72, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:16:48,252 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-17 19:16:49,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3471420.0, ans=0.0 2024-08-17 19:16:52,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3471420.0, ans=0.1 2024-08-17 19:17:07,962 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 36 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-17 19:17:12,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3471620.0, ans=0.125 2024-08-17 19:17:16,269 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-17 19:17:18,856 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-17 19:17:21,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3471720.0, ans=0.125 2024-08-17 19:17:24,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3471720.0, ans=0.125 2024-08-17 19:17:27,537 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-17 19:17:35,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3471820.0, ans=0.1 2024-08-17 19:17:37,060 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.49 vs. limit=15.0 2024-08-17 19:17:41,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3471820.0, ans=0.0 2024-08-17 19:17:44,912 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.303e+01 2.468e+01 2.762e+01 4.613e+01, threshold=4.936e+01, percent-clipped=0.0 2024-08-17 19:17:44,931 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 4050, loss[loss=0.1137, beats_loss=0.01257, ecapa_loss=0.0001156, whisper_loss=0.1, over 17226.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01052, ecapa_loss=0.0001486, whisper_loss=0.09089, over 3867143.31 frames. ], batch size: 67, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:17:48,268 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.89 vs. 
limit=22.5 2024-08-17 19:17:50,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3471920.0, ans=0.1 2024-08-17 19:18:01,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3472020.0, ans=0.015 2024-08-17 19:18:17,731 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-17 19:18:19,746 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2024-08-17 19:18:21,530 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-17 19:18:21,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3472220.0, ans=0.015 2024-08-17 19:18:47,903 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 4100, loss[loss=0.1103, beats_loss=0.009368, ecapa_loss=0.0001331, whisper_loss=0.09964, over 16146.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01057, ecapa_loss=0.000148, whisper_loss=0.09101, over 3857113.40 frames. ], batch size: 61, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:18:54,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3472420.0, ans=0.0 2024-08-17 19:18:55,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3472420.0, ans=0.125 2024-08-17 19:19:00,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3472520.0, ans=0.07 2024-08-17 19:19:00,812 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.30 vs. 
limit=22.5 2024-08-17 19:19:01,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3472520.0, ans=0.125 2024-08-17 19:19:11,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3472620.0, ans=0.0 2024-08-17 19:19:26,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3472720.0, ans=0.0 2024-08-17 19:19:33,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3472720.0, ans=0.125 2024-08-17 19:19:43,306 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.48 vs. limit=15.0 2024-08-17 19:19:47,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3472820.0, ans=0.2 2024-08-17 19:19:49,837 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.275e+01 2.466e+01 2.807e+01 1.119e+02, threshold=4.931e+01, percent-clipped=1.0 2024-08-17 19:19:49,855 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 4150, loss[loss=0.1133, beats_loss=0.01136, ecapa_loss=0.0001069, whisper_loss=0.1008, over 24383.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0106, ecapa_loss=0.0001479, whisper_loss=0.09103, over 3900773.74 frames. ], batch size: 91, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:19:57,483 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-17 19:20:19,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3473120.0, ans=0.0 2024-08-17 19:20:51,337 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
26 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-17 19:20:52,354 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 4200, loss[loss=0.1066, beats_loss=0.01103, ecapa_loss=0.0001438, whisper_loss=0.09409, over 18989.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01054, ecapa_loss=0.000148, whisper_loss=0.09105, over 3913933.24 frames. ], batch size: 78, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:20:54,542 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.57 vs. limit=10.0 2024-08-17 19:21:23,854 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-17 19:21:43,101 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.00 vs. limit=15.0 2024-08-17 19:21:52,774 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-17 19:21:55,001 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.224e+01 2.485e+01 2.719e+01 4.060e+01, threshold=4.970e+01, percent-clipped=0.0 2024-08-17 19:21:55,022 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 4250, loss[loss=0.08736, beats_loss=0.01402, ecapa_loss=0.0001116, whisper_loss=0.07223, over 15078.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01061, ecapa_loss=0.0001475, whisper_loss=0.09063, over 3911852.96 frames. ], batch size: 59, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:21:57,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3473920.0, ans=0.0 2024-08-17 19:22:01,327 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
27 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-17 19:22:02,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3473920.0, ans=0.125 2024-08-17 19:22:04,073 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-17 19:22:05,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3473920.0, ans=0.0 2024-08-17 19:22:32,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3474220.0, ans=0.1 2024-08-17 19:22:34,864 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.09 vs. limit=15.0 2024-08-17 19:22:35,491 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 16 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-17 19:22:35,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3474220.0, ans=0.2 2024-08-17 19:22:41,964 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 20 from LS+wenet, 17 from Vox, 53 fro AS 2024-08-17 19:22:57,997 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 4300, loss[loss=0.08563, beats_loss=0.01029, ecapa_loss=0.0001749, whisper_loss=0.07359, over 16946.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01064, ecapa_loss=0.000148, whisper_loss=0.08977, over 3869232.98 frames. ], batch size: 70, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:23:03,204 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-17 19:23:03,801 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.20 vs. 
limit=22.5 2024-08-17 19:23:04,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3474420.0, ans=0.035 2024-08-17 19:23:29,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3474620.0, ans=0.0 2024-08-17 19:23:48,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3474820.0, ans=0.0 2024-08-17 19:23:52,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3474820.0, ans=0.0 2024-08-17 19:23:55,805 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-17 19:23:57,682 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.54 vs. limit=10.0 2024-08-17 19:24:01,019 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.274e+01 2.524e+01 2.891e+01 1.236e+02, threshold=5.048e+01, percent-clipped=2.0 2024-08-17 19:24:01,038 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 4350, loss[loss=0.112, beats_loss=0.009758, ecapa_loss=0.0001496, whisper_loss=0.1008, over 21320.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01053, ecapa_loss=0.0001487, whisper_loss=0.09016, over 3873635.84 frames. ], batch size: 81, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:24:15,851 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
13 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-17 19:24:31,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3475120.0, ans=0.0 2024-08-17 19:24:58,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3475320.0, ans=0.125 2024-08-17 19:25:03,552 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 4400, loss[loss=0.08481, beats_loss=0.01299, ecapa_loss=0.0001235, whisper_loss=0.07058, over 16236.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01055, ecapa_loss=0.0001483, whisper_loss=0.0902, over 3888973.81 frames. ], batch size: 66, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:25:11,994 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-17 19:25:12,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3475420.0, ans=0.125 2024-08-17 19:25:19,685 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 30 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-17 19:25:20,902 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-17 19:25:24,193 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0026171719655394554, model_norm_threshold=50.480224609375 2024-08-17 19:25:24,356 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.45, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.667e+08, grad_sumsq=1.636e+10, orig_rms_sq=1.019e-02 2024-08-17 19:25:32,949 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-17 19:25:38,674 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. 
limit=6.0 2024-08-17 19:25:43,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3475720.0, ans=0.2 2024-08-17 19:26:04,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3475920.0, ans=0.2 2024-08-17 19:26:05,663 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.732e+01 2.336e+01 2.604e+01 2.872e+01 1.929e+04, threshold=5.209e+01, percent-clipped=1.0 2024-08-17 19:26:05,682 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 4450, loss[loss=0.09329, beats_loss=0.008989, ecapa_loss=0.000187, whisper_loss=0.08244, over 15311.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.0001485, whisper_loss=0.09008, over 3862426.97 frames. ], batch size: 63, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:26:09,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3475920.0, ans=0.0 2024-08-17 19:26:16,243 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-17 19:26:16,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.69 vs. limit=15.0 2024-08-17 19:26:22,281 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-17 19:26:30,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3476120.0, ans=0.2 2024-08-17 19:26:35,279 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.77 vs. limit=10.0 2024-08-17 19:26:38,897 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.16 vs. 
limit=10.0 2024-08-17 19:26:39,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3476120.0, ans=0.125 2024-08-17 19:26:44,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3476220.0, ans=0.125 2024-08-17 19:26:50,684 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 16 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-17 19:26:58,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3476320.0, ans=0.1 2024-08-17 19:26:58,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3476320.0, ans=0.125 2024-08-17 19:27:03,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3476320.0, ans=0.125 2024-08-17 19:27:04,687 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 17 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-17 19:27:08,307 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 4500, loss[loss=0.09367, beats_loss=0.01202, ecapa_loss=0.0001426, whisper_loss=0.08023, over 15480.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01054, ecapa_loss=0.0001481, whisper_loss=0.08975, over 3878135.76 frames. ], batch size: 60, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:27:12,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3476420.0, ans=0.125 2024-08-17 19:27:13,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.85 vs. 
limit=15.0 2024-08-17 19:27:27,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3476520.0, ans=0.125 2024-08-17 19:27:33,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3476620.0, ans=0.0 2024-08-17 19:27:37,992 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-17 19:27:39,519 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-17 19:27:40,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3476620.0, ans=0.0 2024-08-17 19:27:44,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3476720.0, ans=0.0 2024-08-17 19:27:53,056 WARNING [optim.py:496] (0/4) Scaling gradients by 0.05500377342104912, model_norm_threshold=52.087669372558594 2024-08-17 19:27:53,220 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.out_combiner.bypass_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.494e+04, grad_sumsq=1.471e+05, orig_rms_sq=5.773e-01 2024-08-17 19:28:01,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3476820.0, ans=0.0 2024-08-17 19:28:08,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3476820.0, ans=0.125 2024-08-17 19:28:09,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3476920.0, ans=0.0 2024-08-17 19:28:10,283 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.357e+01 2.568e+01 2.877e+01 9.470e+02, threshold=5.136e+01, percent-clipped=3.0 2024-08-17 19:28:10,303 INFO 
[train_multi_KD3.py:1116] (0/4) Epoch 24, batch 4550, loss[loss=0.1034, beats_loss=0.01236, ecapa_loss=0.0001386, whisper_loss=0.08962, over 17279.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001483, whisper_loss=0.09047, over 3888537.51 frames. ], batch size: 69, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:28:14,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.58 vs. limit=15.0 2024-08-17 19:28:23,037 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.22 vs. limit=15.0 2024-08-17 19:28:47,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3477120.0, ans=0.125 2024-08-17 19:28:57,149 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 13 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-17 19:29:01,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3477220.0, ans=0.125 2024-08-17 19:29:09,585 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.13 vs. limit=22.5 2024-08-17 19:29:12,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3477320.0, ans=0.125 2024-08-17 19:29:16,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3477420.0, ans=0.1 2024-08-17 19:29:17,427 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 4600, loss[loss=0.1004, beats_loss=0.01113, ecapa_loss=0.000164, whisper_loss=0.0876, over 22595.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01058, ecapa_loss=0.0001482, whisper_loss=0.09028, over 3899229.27 frames. 
], batch size: 93, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:29:20,817 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2024-08-17 19:29:26,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0 2024-08-17 19:29:30,350 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. limit=6.0 2024-08-17 19:29:48,121 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 16 from Vox, 32 from AS 2024-08-17 19:29:48,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3477620.0, ans=0.0 2024-08-17 19:30:02,581 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 17 from Vox, 31 from AS 2024-08-17 19:30:05,562 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 25 from LS+wenet, 17 from Vox, 23 from AS 2024-08-17 19:30:09,445 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 24 from Vox, 42 from AS 2024-08-17 19:30:24,448 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.235e+01 2.492e+01 2.713e+01 4.136e+01, threshold=4.985e+01, percent-clipped=0.0 2024-08-17 19:30:24,467 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 4650, loss[loss=0.1042, beats_loss=0.009809, ecapa_loss=0.0001513, whisper_loss=0.09287, over 17372.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01067, ecapa_loss=0.0001481, whisper_loss=0.08933, over 3880762.51 frames. 
], batch size: 69, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:30:35,788 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.49 vs. limit=15.0 2024-08-17 19:30:36,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3478020.0, ans=15.0 2024-08-17 19:30:43,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3478020.0, ans=0.2 2024-08-17 19:30:55,496 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 18 from Vox, 22 from AS 2024-08-17 19:30:56,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3478120.0, ans=0.1 2024-08-17 19:31:02,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3478120.0, ans=0.0 2024-08-17 19:31:12,689 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 19 from Vox, 46 from AS 2024-08-17 19:31:32,478 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 4700, loss[loss=0.112, beats_loss=0.01073, ecapa_loss=0.0001189, whisper_loss=0.1001, over 22521.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001476, whisper_loss=0.09053, over 3905565.65 frames. ], batch size: 87, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:31:38,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3478420.0, ans=0.0 2024-08-17 19:31:51,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.91 vs. 
limit=15.0 2024-08-17 19:31:52,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3478520.0, ans=0.125 2024-08-17 19:32:13,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3478720.0, ans=0.0 2024-08-17 19:32:21,975 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 21 from Vox, 34 from AS 2024-08-17 19:32:28,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3478820.0, ans=0.125 2024-08-17 19:32:39,304 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.685e+01 2.375e+01 2.548e+01 2.840e+01 1.213e+02, threshold=5.097e+01, percent-clipped=2.0 2024-08-17 19:32:39,323 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 4750, loss[loss=0.09189, beats_loss=0.01291, ecapa_loss=0.0001408, whisper_loss=0.07758, over 21679.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01063, ecapa_loss=0.0001478, whisper_loss=0.09011, over 3928990.28 frames. ], batch size: 88, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:32:40,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3478920.0, ans=0.125 2024-08-17 19:32:47,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3478920.0, ans=0.5 2024-08-17 19:32:50,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3478920.0, ans=0.125 2024-08-17 19:32:52,830 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 24 from Vox, 36 from AS 2024-08-17 19:33:05,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3479120.0, ans=0.125 2024-08-17 19:33:11,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3479120.0, ans=0.5 2024-08-17 19:33:13,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3479120.0, ans=0.2 2024-08-17 19:33:17,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3479120.0, ans=0.125 2024-08-17 19:33:18,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3479220.0, ans=0.0 2024-08-17 19:33:20,511 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.87 vs. limit=15.0 2024-08-17 19:33:46,497 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 4800, loss[loss=0.09167, beats_loss=0.01018, ecapa_loss=0.0001203, whisper_loss=0.08029, over 17791.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001473, whisper_loss=0.09033, over 3938452.24 frames. ], batch size: 66, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:33:48,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3479420.0, ans=0.0 2024-08-17 19:33:55,179 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 29 from LS+wenet, 20 from Vox, 33 from AS 2024-08-17 19:34:12,849 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 from AS 2024-08-17 19:34:13,978 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 from AS 2024-08-17 19:34:15,172 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
25 from LS+wenet, 20 from Vox, 34 from AS 2024-08-17 19:34:25,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3479720.0, ans=0.0 2024-08-17 19:34:31,174 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 21 from Vox, 40 from AS 2024-08-17 19:34:44,102 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 16 from Vox, 36 from AS 2024-08-17 19:34:51,998 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.389e+01 2.627e+01 2.841e+01 3.778e+01, threshold=5.254e+01, percent-clipped=0.0 2024-08-17 19:34:52,017 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 4850, loss[loss=0.1174, beats_loss=0.009999, ecapa_loss=0.0001382, whisper_loss=0.1061, over 18566.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01064, ecapa_loss=0.0001469, whisper_loss=0.09057, over 3906302.79 frames. ], batch size: 69, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:35:00,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3479920.0, ans=0.0 2024-08-17 19:35:01,245 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-348000.pt 2024-08-17 19:35:07,345 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 24 from LS+wenet, 15 from Vox, 25 from AS 2024-08-17 19:35:08,627 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 16 from Vox, 35 from AS 2024-08-17 19:35:09,986 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 from AS 2024-08-17 19:35:22,211 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
20 from LS+wenet, 22 from Vox, 16 from AS 2024-08-17 19:35:34,470 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.72 vs. limit=15.0 2024-08-17 19:35:58,439 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 17 from Vox, 28 from AS 2024-08-17 19:35:59,731 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 4900, loss[loss=0.105, beats_loss=0.01014, ecapa_loss=0.0001816, whisper_loss=0.09302, over 16450.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01061, ecapa_loss=0.000147, whisper_loss=0.09072, over 3887686.42 frames. ], batch size: 67, lr: 2.56e-03, grad_scale: 1.152921504606847e+18 2024-08-17 19:36:14,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3480520.0, ans=0.0 2024-08-17 19:36:16,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3480520.0, ans=0.0 2024-08-17 19:36:23,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3480520.0, ans=0.125 2024-08-17 19:36:23,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3480520.0, ans=0.0 2024-08-17 19:36:44,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3480720.0, ans=0.0 2024-08-17 19:36:46,590 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
30 from LS+wenet, 28 from Vox, 34 from AS 2024-08-17 19:37:04,873 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.256e+01 2.549e+01 2.782e+01 4.181e+01, threshold=5.098e+01, percent-clipped=0.0 2024-08-17 19:37:04,894 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 4950, loss[loss=0.08738, beats_loss=0.0104, ecapa_loss=0.0001542, whisper_loss=0.07543, over 15478.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01051, ecapa_loss=0.0001479, whisper_loss=0.09075, over 3888006.89 frames. ], batch size: 59, lr: 2.56e-03, grad_scale: 1.152921504606847e+18 2024-08-17 19:37:12,021 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 26 from Vox, 35 from AS 2024-08-17 19:37:15,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3480920.0, ans=0.125 2024-08-17 19:37:15,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3480920.0, ans=0.1 2024-08-17 19:37:16,998 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 18 from Vox, 21 from AS 2024-08-17 19:37:21,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3481020.0, ans=0.125 2024-08-17 19:37:22,099 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 19 from Vox, 25 from AS 2024-08-17 19:37:46,581 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.40 vs. limit=10.0 2024-08-17 19:37:51,088 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0 2024-08-17 19:38:04,949 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
39 from LS+wenet, 15 from Vox, 39 from AS 2024-08-17 19:38:08,635 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 5000, loss[loss=0.09491, beats_loss=0.01407, ecapa_loss=0.0001271, whisper_loss=0.07958, over 22244.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01049, ecapa_loss=0.0001493, whisper_loss=0.09128, over 3889253.23 frames. ], batch size: 89, lr: 2.56e-03, grad_scale: 1.152921504606847e+18 2024-08-17 19:38:15,162 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 16 from Vox, 24 from AS 2024-08-17 19:38:26,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3481520.0, ans=0.0 2024-08-17 19:38:36,662 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-08-17 19:38:37,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=3481620.0, ans=0.02 2024-08-17 19:38:39,553 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 31 from Vox, 32 from AS 2024-08-17 19:38:50,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3481720.0, ans=0.125 2024-08-17 19:39:09,779 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.326e+01 2.618e+01 2.940e+01 4.201e+01, threshold=5.237e+01, percent-clipped=0.0 2024-08-17 19:39:09,798 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 5050, loss[loss=0.09877, beats_loss=0.01122, ecapa_loss=0.0001289, whisper_loss=0.08626, over 23903.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01056, ecapa_loss=0.0001492, whisper_loss=0.09139, over 3901827.36 frames. 
], batch size: 91, lr: 2.56e-03, grad_scale: 1.152921504606847e+18 2024-08-17 19:39:21,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3482020.0, ans=0.125 2024-08-17 19:39:21,718 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 19:39:34,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3482020.0, ans=0.125 2024-08-17 19:40:10,062 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-17 19:40:11,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3482320.0, ans=0.1 2024-08-17 19:40:17,448 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 5100, loss[loss=0.1, beats_loss=0.01028, ecapa_loss=0.0001686, whisper_loss=0.08807, over 17643.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01057, ecapa_loss=0.0001499, whisper_loss=0.09121, over 3861541.28 frames. ], batch size: 74, lr: 2.56e-03, grad_scale: 1.152921504606847e+18 2024-08-17 19:40:21,600 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.65 vs. 
limit=12.0 2024-08-17 19:40:29,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3482420.0, ans=0.125 2024-08-17 19:40:55,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3482620.0, ans=0.1 2024-08-17 19:41:02,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3482720.0, ans=0.125 2024-08-17 19:41:09,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3482720.0, ans=0.125 2024-08-17 19:41:25,956 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0 2024-08-17 19:41:29,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3482920.0, ans=0.1 2024-08-17 19:41:30,557 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.372e+01 2.547e+01 2.927e+01 4.256e+01, threshold=5.094e+01, percent-clipped=0.0 2024-08-17 19:41:30,577 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 5150, loss[loss=0.09565, beats_loss=0.01064, ecapa_loss=0.000149, whisper_loss=0.08352, over 17910.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01064, ecapa_loss=0.0001481, whisper_loss=0.09056, over 3867590.95 frames. ], batch size: 72, lr: 2.56e-03, grad_scale: 1.152921504606847e+18 2024-08-17 19:41:30,987 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
26 from LS+wenet, 32 from Vox, 37 from AS 2024-08-17 19:41:33,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3482920.0, ans=0.07 2024-08-17 19:41:33,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3482920.0, ans=0.125 2024-08-17 19:41:39,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3482920.0, ans=0.0 2024-08-17 19:41:44,145 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 22 from Vox, 27 from AS 2024-08-17 19:41:52,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3483020.0, ans=0.125 2024-08-17 19:42:00,085 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 from AS 2024-08-17 19:42:09,322 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 29 from LS+wenet, 15 from Vox, 23 from AS 2024-08-17 19:42:09,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0 2024-08-17 19:42:14,418 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
23 from LS+wenet, 28 from Vox, 42 from AS 2024-08-17 19:42:26,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3483220.0, ans=0.125 2024-08-17 19:42:41,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3483320.0, ans=0.125 2024-08-17 19:42:44,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3483320.0, ans=0.0 2024-08-17 19:42:47,285 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 5200, loss[loss=0.09885, beats_loss=0.01265, ecapa_loss=0.000108, whisper_loss=0.08512, over 19960.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01056, ecapa_loss=0.0001483, whisper_loss=0.09117, over 3878689.87 frames. ], batch size: 76, lr: 2.56e-03, grad_scale: 1.152921504606847e+18 2024-08-17 19:42:57,195 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 11 from LS+wenet, 22 from Vox, 29 from AS 2024-08-17 19:43:01,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3483520.0, ans=0.0 2024-08-17 19:43:21,433 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 22 from Vox, 26 from AS 2024-08-17 19:43:23,240 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 14 from Vox, 37 from AS 2024-08-17 19:43:39,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3483720.0, ans=0.035 2024-08-17 19:43:39,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3483720.0, ans=0.0 2024-08-17 19:43:39,994 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
19 from LS+wenet, 16 from Vox, 38 from AS 2024-08-17 19:43:43,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3483720.0, ans=0.2 2024-08-17 19:43:53,679 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 17 from Vox, 34 from AS 2024-08-17 19:44:06,796 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 5250, loss[loss=0.09581, beats_loss=0.009875, ecapa_loss=0.0001539, whisper_loss=0.08439, over 22001.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001487, whisper_loss=0.09057, over 3865918.36 frames. ], batch size: 88, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:44:07,892 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.262e+01 2.523e+01 2.892e+01 1.179e+02, threshold=5.045e+01, percent-clipped=2.0 2024-08-17 19:44:12,912 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 19:44:29,115 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 24 from LS+wenet, 26 from Vox, 44 from AS 2024-08-17 19:44:42,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3484120.0, ans=0.1 2024-08-17 19:44:59,783 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 20 from Vox, 29 from AS 2024-08-17 19:45:02,226 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 15 from LS+wenet, 19 from Vox, 29 from AS 2024-08-17 19:45:05,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3484320.0, ans=0.0 2024-08-17 19:45:10,826 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.83 vs. 
limit=15.0 2024-08-17 19:45:11,186 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 5300, loss[loss=0.1272, beats_loss=0.0104, ecapa_loss=0.0001268, whisper_loss=0.1156, over 21943.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0105, ecapa_loss=0.000149, whisper_loss=0.09047, over 3898317.57 frames. ], batch size: 82, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:45:22,760 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 23 from Vox, 32 from AS 2024-08-17 19:45:32,697 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 14 from Vox, 26 from AS 2024-08-17 19:45:48,724 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 15 from LS+wenet, 17 from Vox, 31 from AS 2024-08-17 19:45:59,318 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.41 vs. limit=22.5 2024-08-17 19:46:14,040 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 5350, loss[loss=0.124, beats_loss=0.006712, ecapa_loss=0.0001441, whisper_loss=0.1159, over 15927.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01054, ecapa_loss=0.0001476, whisper_loss=0.09024, over 3879740.29 frames. ], batch size: 60, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:46:14,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3484920.0, ans=0.125 2024-08-17 19:46:15,222 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.678e+01 2.274e+01 2.526e+01 2.902e+01 3.170e+02, threshold=5.052e+01, percent-clipped=3.0 2024-08-17 19:46:18,460 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.73 vs. 
limit=15.0 2024-08-17 19:46:19,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3484920.0, ans=0.04949747468305833 2024-08-17 19:46:34,469 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 21 from Vox, 23 from AS 2024-08-17 19:46:58,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3485220.0, ans=0.0 2024-08-17 19:47:07,758 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.61 vs. limit=15.0 2024-08-17 19:47:11,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3485320.0, ans=0.0 2024-08-17 19:47:13,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3485320.0, ans=0.125 2024-08-17 19:47:18,390 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 5400, loss[loss=0.1012, beats_loss=0.0122, ecapa_loss=0.0001378, whisper_loss=0.08762, over 19879.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01055, ecapa_loss=0.0001467, whisper_loss=0.08996, over 3866754.92 frames. ], batch size: 79, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:47:18,548 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 22 from LS+wenet, 9 from Vox, 27 from AS 2024-08-17 19:47:27,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3485420.0, ans=0.07 2024-08-17 19:48:00,307 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.42 vs. limit=15.0 2024-08-17 19:48:07,555 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.25 vs. 
limit=15.0 2024-08-17 19:48:16,239 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.99 vs. limit=10.0 2024-08-17 19:48:17,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3485820.0, ans=0.125 2024-08-17 19:48:29,257 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 5450, loss[loss=0.1004, beats_loss=0.01106, ecapa_loss=0.0001257, whisper_loss=0.08807, over 18489.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.0001467, whisper_loss=0.09031, over 3845708.30 frames. ], batch size: 73, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:48:30,531 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.407e+01 2.677e+01 2.868e+01 8.391e+01, threshold=5.355e+01, percent-clipped=1.0 2024-08-17 19:48:41,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3485920.0, ans=0.0 2024-08-17 19:49:02,159 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.84 vs. limit=15.0 2024-08-17 19:49:13,061 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 23 from LS+wenet, 17 from Vox, 18 from AS 2024-08-17 19:49:46,122 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 5500, loss[loss=0.09999, beats_loss=0.009997, ecapa_loss=0.0001431, whisper_loss=0.08856, over 19773.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01057, ecapa_loss=0.0001469, whisper_loss=0.09018, over 3866641.25 frames. 
], batch size: 78, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:49:51,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3486420.0, ans=0.0 2024-08-17 19:50:01,679 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0 2024-08-17 19:50:03,736 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 from AS 2024-08-17 19:50:08,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3486520.0, ans=0.125 2024-08-17 19:50:13,488 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 19 from Vox, 28 from AS 2024-08-17 19:50:19,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3486620.0, ans=0.125 2024-08-17 19:50:23,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3486620.0, ans=0.025 2024-08-17 19:50:29,195 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.884e+00 2024-08-17 19:50:30,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3486620.0, ans=0.1 2024-08-17 19:50:46,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3486720.0, ans=0.125 2024-08-17 19:50:59,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3486820.0, ans=0.0 2024-08-17 19:51:02,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3486820.0, ans=0.1 2024-08-17 
19:51:04,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3486820.0, ans=0.2 2024-08-17 19:51:08,861 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 5550, loss[loss=0.09841, beats_loss=0.01095, ecapa_loss=0.0001699, whisper_loss=0.08576, over 22243.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01056, ecapa_loss=0.0001476, whisper_loss=0.09125, over 3894606.52 frames. ], batch size: 91, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:51:10,938 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 15 from Vox, 48 from AS 2024-08-17 19:51:11,877 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.945e+01 2.339e+01 2.599e+01 2.892e+01 2.617e+02, threshold=5.198e+01, percent-clipped=1.0 2024-08-17 19:51:14,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3486920.0, ans=0.0 2024-08-17 19:51:18,449 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 from AS 2024-08-17 19:51:22,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3486920.0, ans=0.0 2024-08-17 19:51:29,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3487020.0, ans=0.125 2024-08-17 19:51:52,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3487120.0, ans=0.0 2024-08-17 19:52:03,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3487220.0, ans=0.125 2024-08-17 19:52:21,588 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 5600, loss[loss=0.1036, beats_loss=0.01077, ecapa_loss=0.00014, whisper_loss=0.09141, over 20697.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.0001457, whisper_loss=0.09112, over 3906483.34 frames. ], batch size: 83, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:52:21,691 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 28 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-17 19:52:23,244 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.627e+05 2024-08-17 19:52:38,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3487520.0, ans=0.0 2024-08-17 19:52:56,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3487620.0, ans=0.125 2024-08-17 19:52:57,346 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-17 19:52:58,920 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 19:53:01,806 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.79 vs. limit=22.5 2024-08-17 19:53:04,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3487720.0, ans=0.0 2024-08-17 19:53:12,445 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 17 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-17 19:53:26,382 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 5650, loss[loss=0.0992, beats_loss=0.01181, ecapa_loss=0.0001101, whisper_loss=0.08629, over 19083.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0106, ecapa_loss=0.0001462, whisper_loss=0.09043, over 3905918.81 frames. 
], batch size: 75, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:53:28,913 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.385e+01 2.591e+01 2.978e+01 4.325e+01, threshold=5.182e+01, percent-clipped=0.0 2024-08-17 19:53:29,620 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.28 vs. limit=15.0 2024-08-17 19:53:31,363 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.95 vs. limit=22.5 2024-08-17 19:53:32,313 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.627e+00 2024-08-17 19:53:36,147 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-17 19:54:01,835 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-17 19:54:16,106 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 34 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-17 19:54:17,670 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2024-08-17 19:54:27,509 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 14 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-17 19:54:30,726 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 15 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-17 19:54:31,332 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0 2024-08-17 19:54:31,784 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 5700, loss[loss=0.08359, beats_loss=0.01286, ecapa_loss=0.0001017, whisper_loss=0.06972, over 17661.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01057, ecapa_loss=0.0001471, whisper_loss=0.09063, over 3926175.75 frames. ], batch size: 69, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:54:34,607 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-17 19:54:43,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3488520.0, ans=0.0 2024-08-17 19:55:06,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3488620.0, ans=0.0 2024-08-17 19:55:09,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3488720.0, ans=0.125 2024-08-17 19:55:11,487 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.48 vs. limit=15.0 2024-08-17 19:55:35,798 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 5750, loss[loss=0.08777, beats_loss=0.01079, ecapa_loss=0.0001408, whisper_loss=0.07557, over 17148.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01048, ecapa_loss=0.0001473, whisper_loss=0.09171, over 3964802.70 frames. ], batch size: 68, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:55:36,399 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.008e+00 2024-08-17 19:55:38,277 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.375e+01 2.666e+01 3.139e+01 4.378e+01, threshold=5.332e+01, percent-clipped=0.0 2024-08-17 19:55:38,452 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 37 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-17 19:55:40,841 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
20 from LS+wenet, 9 from Vox, 25 fro AS 2024-08-17 19:55:50,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3489020.0, ans=0.1 2024-08-17 19:56:15,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3489220.0, ans=0.125 2024-08-17 19:56:22,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3489220.0, ans=0.05 2024-08-17 19:56:26,058 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.37 vs. limit=15.0 2024-08-17 19:56:28,871 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 19:56:32,449 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 32 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-17 19:56:39,890 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 5800, loss[loss=0.08702, beats_loss=0.01122, ecapa_loss=0.0001279, whisper_loss=0.07453, over 19196.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01046, ecapa_loss=0.0001482, whisper_loss=0.09154, over 3924839.73 frames. 
], batch size: 75, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:57:05,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3489620.0, ans=0.125 2024-08-17 19:57:09,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3489620.0, ans=0.125 2024-08-17 19:57:15,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3489620.0, ans=0.5 2024-08-17 19:57:22,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3489720.0, ans=0.125 2024-08-17 19:57:24,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3489720.0, ans=15.0 2024-08-17 19:57:26,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3489720.0, ans=0.1 2024-08-17 19:57:28,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.36 vs. limit=15.0 2024-08-17 19:57:42,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3489820.0, ans=0.125 2024-08-17 19:57:45,920 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 5850, loss[loss=0.0963, beats_loss=0.01109, ecapa_loss=0.00014, whisper_loss=0.08381, over 14823.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01053, ecapa_loss=0.0001476, whisper_loss=0.0907, over 3907414.14 frames. 
], batch size: 60, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:57:48,650 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.235e+01 2.518e+01 2.723e+01 7.884e+01, threshold=5.036e+01, percent-clipped=1.0 2024-08-17 19:57:49,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3489920.0, ans=0.1 2024-08-17 19:57:53,849 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-17 19:58:00,528 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 28 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-17 19:58:02,961 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 27 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-17 19:58:04,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3490020.0, ans=0.0 2024-08-17 19:58:05,812 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.302e+05 2024-08-17 19:58:25,032 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-17 19:58:25,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3490220.0, ans=0.125 2024-08-17 19:58:26,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3490220.0, ans=0.0 2024-08-17 19:58:28,542 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.03 vs. limit=22.5 2024-08-17 19:58:37,374 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.87 vs. 
limit=22.5 2024-08-17 19:58:50,998 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 5900, loss[loss=0.118, beats_loss=0.01124, ecapa_loss=0.0001425, whisper_loss=0.1053, over 23263.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01057, ecapa_loss=0.0001483, whisper_loss=0.09066, over 3907452.41 frames. ], batch size: 93, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:59:01,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3490420.0, ans=0.0 2024-08-17 19:59:05,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3490520.0, ans=0.1 2024-08-17 19:59:17,804 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.13 vs. limit=15.0 2024-08-17 19:59:28,987 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.24 vs. limit=15.0 2024-08-17 19:59:34,263 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.71 vs. limit=15.0 2024-08-17 19:59:40,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3490720.0, ans=0.1 2024-08-17 19:59:45,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3490820.0, ans=0.125 2024-08-17 19:59:56,870 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 5950, loss[loss=0.1048, beats_loss=0.0087, ecapa_loss=0.0001663, whisper_loss=0.09448, over 17446.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01061, ecapa_loss=0.0001484, whisper_loss=0.09036, over 3916337.95 frames. 
], batch size: 71, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:59:59,484 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.262e+01 2.461e+01 2.864e+01 3.747e+01, threshold=4.922e+01, percent-clipped=0.0 2024-08-17 20:00:01,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3490920.0, ans=0.0 2024-08-17 20:00:01,720 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.89 vs. limit=12.0 2024-08-17 20:00:20,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3491020.0, ans=0.125 2024-08-17 20:00:28,564 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08441564440727234, model_norm_threshold=49.215904235839844 2024-08-17 20:00:28,729 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.3.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.653e+04, grad_sumsq=3.653e+04, orig_rms_sq=1.000e+00 2024-08-17 20:00:29,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3491120.0, ans=0.2 2024-08-17 20:00:34,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3491120.0, ans=0.0 2024-08-17 20:00:35,744 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
22 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-17 20:00:36,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3491220.0, ans=0.125 2024-08-17 20:00:40,151 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 20:00:45,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3491220.0, ans=0.125 2024-08-17 20:00:47,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3491220.0, ans=0.0 2024-08-17 20:01:04,679 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 6000, loss[loss=0.1075, beats_loss=0.009424, ecapa_loss=0.0001412, whisper_loss=0.09667, over 17288.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.0001484, whisper_loss=0.09088, over 3881142.48 frames. ], batch size: 69, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:01:04,681 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-17 20:01:38,365 INFO [train_multi_KD3.py:1149] (0/4) Epoch 24, validation on ASR_libri: loss=0.2507, beats_loss=0, ecapa_loss=0.000535, whisper_loss=0.2453, over 922467.00 frames. 2024-08-17 20:01:44,123 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.3595, 2.2011, 2.5267, 1.6113], device='cuda:0') 2024-08-17 20:01:56,018 INFO [train_multi_KD3.py:1149] (0/4) Epoch 24, validation on SV_voxceleb1: loss=0.00412, beats_loss=0, ecapa_loss=0.000412, whisper_loss=0, over 939242.00 frames. 2024-08-17 20:03:40,221 INFO [train_multi_KD3.py:1149] (0/4) Epoch 24, validation on AT_audioset: loss=0.02332, beats_loss=0.02332, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-17 20:03:40,227 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-17 20:03:48,629 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.32 vs. limit=22.5 2024-08-17 20:03:54,249 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-17 20:03:55,519 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-17 20:04:07,096 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-17 20:04:20,459 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-17 20:04:20,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3491720.0, ans=0.125 2024-08-17 20:04:25,955 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-17 20:04:41,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.56 vs. limit=15.0 2024-08-17 20:04:44,844 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 6050, loss[loss=0.08585, beats_loss=0.01215, ecapa_loss=0.0001303, whisper_loss=0.07239, over 20595.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.000148, whisper_loss=0.09086, over 3887139.50 frames. 
], batch size: 83, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:04:47,353 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.332e+01 2.568e+01 2.986e+01 5.830e+02, threshold=5.136e+01, percent-clipped=1.0 2024-08-17 20:04:52,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3491920.0, ans=0.125 2024-08-17 20:04:55,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3491920.0, ans=0.04949747468305833 2024-08-17 20:05:02,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3492020.0, ans=0.2 2024-08-17 20:05:04,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3492020.0, ans=0.2 2024-08-17 20:05:10,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3492120.0, ans=0.125 2024-08-17 20:05:20,471 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.81 vs. limit=15.0 2024-08-17 20:05:55,385 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 6100, loss[loss=0.08789, beats_loss=0.01168, ecapa_loss=0.000167, whisper_loss=0.07455, over 20643.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.0001478, whisper_loss=0.09062, over 3916047.41 frames. ], batch size: 87, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:06:21,297 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 25 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-17 20:06:29,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3492620.0, ans=0.125 2024-08-17 20:06:34,890 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
21 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-17 20:06:39,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3492720.0, ans=0.1 2024-08-17 20:07:06,133 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 6150, loss[loss=0.09823, beats_loss=0.01155, ecapa_loss=0.0001243, whisper_loss=0.08543, over 23773.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01049, ecapa_loss=0.0001484, whisper_loss=0.09098, over 3888609.12 frames. ], batch size: 91, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:07:08,589 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.377e+01 2.638e+01 2.970e+01 6.161e+01, threshold=5.275e+01, percent-clipped=1.0 2024-08-17 20:07:10,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3492920.0, ans=0.0 2024-08-17 20:07:14,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3492920.0, ans=0.0 2024-08-17 20:07:29,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3493020.0, ans=0.125 2024-08-17 20:07:35,798 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.99 vs. limit=15.0 2024-08-17 20:07:51,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3493220.0, ans=0.1 2024-08-17 20:07:54,887 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
24 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-17 20:08:08,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3493320.0, ans=0.125 2024-08-17 20:08:11,885 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 6200, loss[loss=0.1091, beats_loss=0.008827, ecapa_loss=0.0001777, whisper_loss=0.09851, over 21695.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01052, ecapa_loss=0.0001478, whisper_loss=0.09071, over 3909746.91 frames. ], batch size: 85, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:08:20,187 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.38 vs. limit=15.0 2024-08-17 20:08:25,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3493520.0, ans=0.125 2024-08-17 20:08:26,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3493520.0, ans=0.0 2024-08-17 20:08:47,334 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-17 20:09:00,586 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.75 vs. limit=15.0 2024-08-17 20:09:04,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3493820.0, ans=0.0 2024-08-17 20:09:16,767 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 6250, loss[loss=0.1173, beats_loss=0.008482, ecapa_loss=0.0001749, whisper_loss=0.1071, over 21969.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001477, whisper_loss=0.0906, over 3895672.57 frames. ], batch size: 88, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:09:16,903 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
22 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-17 20:09:17,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3493920.0, ans=0.0 2024-08-17 20:09:18,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3493920.0, ans=0.125 2024-08-17 20:09:19,388 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.612e+01 2.313e+01 2.518e+01 2.765e+01 5.277e+01, threshold=5.035e+01, percent-clipped=1.0 2024-08-17 20:09:22,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3493920.0, ans=0.125 2024-08-17 20:09:51,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3494120.0, ans=0.125 2024-08-17 20:10:06,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3494220.0, ans=0.2 2024-08-17 20:10:18,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.17 vs. limit=22.5 2024-08-17 20:10:22,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3494420.0, ans=0.125 2024-08-17 20:10:23,128 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 6300, loss[loss=0.1197, beats_loss=0.009569, ecapa_loss=0.0001567, whisper_loss=0.1085, over 20567.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01056, ecapa_loss=0.000147, whisper_loss=0.09118, over 3909195.34 frames. 
], batch size: 80, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:10:36,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3494520.0, ans=0.0 2024-08-17 20:10:45,827 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2024-08-17 20:10:49,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3494620.0, ans=0.125 2024-08-17 20:11:03,116 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2024-08-17 20:11:11,624 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 18 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-17 20:11:13,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3494720.0, ans=0.125 2024-08-17 20:11:23,310 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-17 20:11:23,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3494820.0, ans=0.2 2024-08-17 20:11:28,204 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 6350, loss[loss=0.08311, beats_loss=0.01136, ecapa_loss=0.0001717, whisper_loss=0.07003, over 12963.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01056, ecapa_loss=0.0001477, whisper_loss=0.09142, over 3932598.95 frames. ], batch size: 54, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:11:30,939 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.269e+01 2.498e+01 3.027e+01 2.064e+02, threshold=4.996e+01, percent-clipped=1.0 2024-08-17 20:11:32,397 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
23 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-17 20:11:37,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3494920.0, ans=0.125 2024-08-17 20:11:43,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3495020.0, ans=0.125 2024-08-17 20:11:47,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3495020.0, ans=0.1 2024-08-17 20:11:58,609 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-17 20:11:58,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3495120.0, ans=0.0 2024-08-17 20:12:17,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3495220.0, ans=0.0 2024-08-17 20:12:17,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=3495220.0, ans=0.2 2024-08-17 20:12:19,497 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 26 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-17 20:12:20,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3495320.0, ans=0.125 2024-08-17 20:12:31,788 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 6400, loss[loss=0.1192, beats_loss=0.009253, ecapa_loss=0.000146, whisper_loss=0.1085, over 22888.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01051, ecapa_loss=0.0001475, whisper_loss=0.09144, over 3920878.85 frames. 
], batch size: 91, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:12:33,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3495420.0, ans=0.125 2024-08-17 20:12:42,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3495420.0, ans=0.125 2024-08-17 20:12:49,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3495520.0, ans=0.05 2024-08-17 20:12:52,610 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.27 vs. limit=15.0 2024-08-17 20:13:10,575 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.96 vs. limit=15.0 2024-08-17 20:13:12,433 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 27 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-17 20:13:21,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3495820.0, ans=0.1 2024-08-17 20:13:35,348 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 6450, loss[loss=0.08549, beats_loss=0.01207, ecapa_loss=0.0001522, whisper_loss=0.0719, over 21283.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01056, ecapa_loss=0.0001481, whisper_loss=0.09076, over 3910062.53 frames. 
], batch size: 93, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:13:38,103 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.330e+01 2.566e+01 2.943e+01 3.819e+01, threshold=5.133e+01, percent-clipped=0.0 2024-08-17 20:14:00,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3496120.0, ans=0.125 2024-08-17 20:14:02,008 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-17 20:14:07,083 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-17 20:14:14,470 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-17 20:14:17,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3496220.0, ans=0.125 2024-08-17 20:14:35,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3496320.0, ans=0.125 2024-08-17 20:14:36,287 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-17 20:14:38,783 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 6500, loss[loss=0.1011, beats_loss=0.01049, ecapa_loss=0.000164, whisper_loss=0.08901, over 16456.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01056, ecapa_loss=0.0001482, whisper_loss=0.0914, over 3931120.54 frames. ], batch size: 69, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:14:38,962 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
21 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-17 20:14:41,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3496420.0, ans=0.0 2024-08-17 20:15:01,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3496520.0, ans=0.125 2024-08-17 20:15:01,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3496520.0, ans=0.125 2024-08-17 20:15:08,307 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.17 vs. limit=15.0 2024-08-17 20:15:36,103 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-17 20:15:41,281 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 6550, loss[loss=0.08603, beats_loss=0.01322, ecapa_loss=0.0001381, whisper_loss=0.07142, over 22366.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01048, ecapa_loss=0.0001481, whisper_loss=0.09186, over 3939619.15 frames. ], batch size: 92, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:15:43,823 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.275e+01 2.582e+01 2.844e+01 5.439e+01, threshold=5.163e+01, percent-clipped=1.0 2024-08-17 20:15:51,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3496920.0, ans=0.0 2024-08-17 20:15:52,566 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 40 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-17 20:16:00,059 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-17 20:16:10,407 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
37 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-17 20:16:12,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3497120.0, ans=0.1 2024-08-17 20:16:13,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=3497120.0, ans=15.0 2024-08-17 20:16:14,229 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.820e-02 2024-08-17 20:16:18,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3497220.0, ans=0.125 2024-08-17 20:16:20,211 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 17 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-17 20:16:32,867 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-17 20:16:34,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=3497320.0, ans=0.2 2024-08-17 20:16:43,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3497420.0, ans=0.125 2024-08-17 20:16:44,144 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 6600, loss[loss=0.1219, beats_loss=0.009033, ecapa_loss=0.0001765, whisper_loss=0.1111, over 21717.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01042, ecapa_loss=0.0001487, whisper_loss=0.09297, over 3979456.62 frames. ], batch size: 89, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:16:47,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3497420.0, ans=0.125 2024-08-17 20:16:54,649 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
19 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-17 20:16:58,727 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.117e+00 2024-08-17 20:17:00,770 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-17 20:17:05,587 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-17 20:17:06,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3497520.0, ans=0.0 2024-08-17 20:17:07,992 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-17 20:17:21,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3497720.0, ans=0.125 2024-08-17 20:17:31,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3497720.0, ans=0.125 2024-08-17 20:17:48,872 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 6650, loss[loss=0.1041, beats_loss=0.01093, ecapa_loss=0.0001682, whisper_loss=0.0915, over 21671.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01048, ecapa_loss=0.000149, whisper_loss=0.09217, over 3973575.77 frames. ], batch size: 89, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:17:52,005 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.725e+01 2.296e+01 2.528e+01 2.803e+01 4.875e+01, threshold=5.056e+01, percent-clipped=0.0 2024-08-17 20:17:55,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3497920.0, ans=0.125 2024-08-17 20:18:06,566 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.31 vs. 
limit=15.0 2024-08-17 20:18:15,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3498120.0, ans=0.0 2024-08-17 20:18:27,198 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.69 vs. limit=15.0 2024-08-17 20:18:46,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3498320.0, ans=0.1 2024-08-17 20:18:48,055 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 20:18:52,840 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 11 from Vox, 39 fro AS 2024-08-17 20:18:55,600 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 6700, loss[loss=0.09386, beats_loss=0.01052, ecapa_loss=0.0001481, whisper_loss=0.08186, over 17142.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01046, ecapa_loss=0.0001483, whisper_loss=0.09218, over 3966696.35 frames. ], batch size: 69, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:19:00,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3498420.0, ans=0.0 2024-08-17 20:19:20,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3498620.0, ans=0.125 2024-08-17 20:19:21,883 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-17 20:19:24,903 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=15.0 2024-08-17 20:19:28,107 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.94 vs. 
limit=15.0 2024-08-17 20:19:41,937 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 26 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-17 20:19:52,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3498820.0, ans=0.125 2024-08-17 20:19:57,568 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 15 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-17 20:20:02,619 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 6750, loss[loss=0.1208, beats_loss=0.006485, ecapa_loss=0.0001642, whisper_loss=0.1127, over 17138.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01038, ecapa_loss=0.000149, whisper_loss=0.09253, over 3924613.15 frames. ], batch size: 66, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:20:02,847 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-17 20:20:05,080 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.341e+01 2.553e+01 2.881e+01 4.288e+01, threshold=5.105e+01, percent-clipped=0.0 2024-08-17 20:20:08,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3498920.0, ans=0.0 2024-08-17 20:20:09,178 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 10 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-17 20:20:12,265 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
27 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-17 20:20:12,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3498920.0, ans=0.125 2024-08-17 20:20:26,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3499020.0, ans=0.0 2024-08-17 20:20:40,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3499120.0, ans=0.1 2024-08-17 20:20:48,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3499220.0, ans=0.0 2024-08-17 20:20:53,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3499220.0, ans=0.125 2024-08-17 20:20:59,816 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-17 20:21:00,526 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2024-08-17 20:21:10,349 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 6800, loss[loss=0.1041, beats_loss=0.01028, ecapa_loss=0.0001249, whisper_loss=0.09253, over 21816.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01034, ecapa_loss=0.000149, whisper_loss=0.09237, over 3928271.16 frames. ], batch size: 85, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:21:17,032 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
24 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-17 20:21:50,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3499720.0, ans=0.0 2024-08-17 20:21:50,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3499720.0, ans=0.2 2024-08-17 20:21:54,900 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-17 20:22:12,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3499820.0, ans=0.125 2024-08-17 20:22:19,111 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 6850, loss[loss=0.09296, beats_loss=0.0111, ecapa_loss=0.0001442, whisper_loss=0.08042, over 15411.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01042, ecapa_loss=0.0001484, whisper_loss=0.09177, over 3899486.57 frames. ], batch size: 60, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:22:22,023 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.292e+01 2.494e+01 2.760e+01 3.944e+01, threshold=4.988e+01, percent-clipped=0.0 2024-08-17 20:22:25,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3499920.0, ans=0.125 2024-08-17 20:23:20,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3500320.0, ans=0.1 2024-08-17 20:23:28,534 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 6900, loss[loss=0.1207, beats_loss=0.006761, ecapa_loss=0.0001567, whisper_loss=0.1124, over 15074.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01048, ecapa_loss=0.0001482, whisper_loss=0.09136, over 3856982.56 frames. 
], batch size: 56, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:23:37,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3500420.0, ans=0.125 2024-08-17 20:23:49,300 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.84 vs. limit=22.5 2024-08-17 20:23:51,550 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 22 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-17 20:23:54,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3500620.0, ans=0.125 2024-08-17 20:23:57,091 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-17 20:24:03,597 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-17 20:24:08,816 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-17 20:24:15,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3500720.0, ans=0.125 2024-08-17 20:24:21,434 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 17 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-17 20:24:30,035 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
32 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-17 20:24:30,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3500820.0, ans=0.0 2024-08-17 20:24:34,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3500820.0, ans=0.125 2024-08-17 20:24:36,373 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.66 vs. limit=15.0 2024-08-17 20:24:36,732 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 6950, loss[loss=0.1044, beats_loss=0.01159, ecapa_loss=0.0001465, whisper_loss=0.09133, over 22391.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.0001478, whisper_loss=0.09068, over 3892367.00 frames. ], batch size: 91, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:24:39,296 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.285e+01 2.467e+01 2.826e+01 3.694e+01, threshold=4.933e+01, percent-clipped=0.0 2024-08-17 20:24:39,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3500920.0, ans=0.125 2024-08-17 20:24:43,938 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 27 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-17 20:24:47,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3500920.0, ans=0.0 2024-08-17 20:24:50,667 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-17 20:25:08,248 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-17 20:25:09,565 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
14 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-17 20:25:25,516 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.13 vs. limit=22.5 2024-08-17 20:25:28,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3501220.0, ans=0.125 2024-08-17 20:25:40,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3501320.0, ans=0.125 2024-08-17 20:25:43,693 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 7000, loss[loss=0.1068, beats_loss=0.008548, ecapa_loss=0.0002005, whisper_loss=0.09629, over 19488.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01057, ecapa_loss=0.0001483, whisper_loss=0.0901, over 3871589.85 frames. ], batch size: 79, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:25:54,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3501420.0, ans=0.125 2024-08-17 20:25:56,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3501520.0, ans=0.125 2024-08-17 20:26:03,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3501520.0, ans=0.1 2024-08-17 20:26:30,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3501720.0, ans=0.125 2024-08-17 20:26:45,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0 2024-08-17 20:26:53,303 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 7050, loss[loss=0.1008, beats_loss=0.01286, ecapa_loss=0.0001296, whisper_loss=0.08666, over 20424.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01057, ecapa_loss=0.0001488, whisper_loss=0.0906, over 3870197.27 frames. ], batch size: 83, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:26:56,151 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.324e+01 2.539e+01 2.808e+01 3.658e+01, threshold=5.079e+01, percent-clipped=0.0 2024-08-17 20:26:56,286 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-17 20:27:07,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3502020.0, ans=0.1 2024-08-17 20:27:20,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3502120.0, ans=0.04949747468305833 2024-08-17 20:27:39,929 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.572e-02 2024-08-17 20:27:46,884 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-17 20:27:48,189 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-17 20:27:56,562 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 22 from LS+wenet, 8 from Vox, 28 fro AS 2024-08-17 20:28:00,864 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-17 20:28:03,825 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 26 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-17 20:28:04,822 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 7100, loss[loss=0.1101, beats_loss=0.009625, ecapa_loss=0.0001257, whisper_loss=0.0992, over 19285.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01064, ecapa_loss=0.0001478, whisper_loss=0.09014, over 3857693.59 frames. 
], batch size: 76, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:28:13,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3502420.0, ans=0.0 2024-08-17 20:28:15,336 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-17 20:28:17,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3502520.0, ans=0.2 2024-08-17 20:28:20,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3502520.0, ans=0.2 2024-08-17 20:28:26,446 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 20:28:32,801 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 20:28:41,108 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-17 20:28:41,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3502620.0, ans=0.1 2024-08-17 20:28:48,192 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-17 20:28:54,178 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.08 vs. limit=12.0 2024-08-17 20:29:13,403 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 7150, loss[loss=0.1081, beats_loss=0.01065, ecapa_loss=0.0001643, whisper_loss=0.09585, over 22109.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01069, ecapa_loss=0.0001481, whisper_loss=0.08961, over 3859309.88 frames. 
], batch size: 92, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:29:13,514 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-17 20:29:16,402 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.338e+01 2.584e+01 3.067e+01 1.427e+02, threshold=5.169e+01, percent-clipped=2.0 2024-08-17 20:29:24,234 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.22 vs. limit=10.0 2024-08-17 20:29:29,558 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 37 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-17 20:29:35,114 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 14 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-17 20:29:41,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3503120.0, ans=0.125 2024-08-17 20:29:51,995 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.539e-03 2024-08-17 20:29:54,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3503120.0, ans=0.1 2024-08-17 20:29:59,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3503220.0, ans=0.125 2024-08-17 20:30:09,847 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.73 vs. limit=15.0 2024-08-17 20:30:12,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3503320.0, ans=0.1 2024-08-17 20:30:25,187 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 7200, loss[loss=0.09556, beats_loss=0.01135, ecapa_loss=0.0001184, whisper_loss=0.08303, over 17819.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01062, ecapa_loss=0.0001473, whisper_loss=0.08981, over 3870823.94 frames. ], batch size: 69, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:30:27,918 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-17 20:30:38,078 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0867324024438858, model_norm_threshold=51.6881103515625 2024-08-17 20:30:38,250 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.27, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.576e+04, grad_sumsq=9.576e+04, orig_rms_sq=1.000e+00 2024-08-17 20:30:45,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3503520.0, ans=0.125 2024-08-17 20:31:17,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3503720.0, ans=0.0 2024-08-17 20:31:19,263 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.15 vs. limit=15.0 2024-08-17 20:31:30,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3503820.0, ans=0.125 2024-08-17 20:31:35,666 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 7250, loss[loss=0.06812, beats_loss=0.01429, ecapa_loss=0.0001757, whisper_loss=0.05208, over 20791.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01073, ecapa_loss=0.0001474, whisper_loss=0.08928, over 3870860.44 frames. 
], batch size: 93, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:31:37,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3503920.0, ans=0.2 2024-08-17 20:31:38,258 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.366e+01 2.657e+01 3.070e+01 5.959e+02, threshold=5.314e+01, percent-clipped=2.0 2024-08-17 20:31:57,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3504020.0, ans=0.2 2024-08-17 20:31:58,119 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 25 from LS+wenet, 7 from Vox, 34 fro AS 2024-08-17 20:32:06,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3504120.0, ans=0.0 2024-08-17 20:32:17,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3504120.0, ans=0.05 2024-08-17 20:32:30,920 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.29 vs. limit=15.0 2024-08-17 20:32:36,742 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-17 20:32:44,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3504320.0, ans=0.125 2024-08-17 20:32:49,208 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 7300, loss[loss=0.0862, beats_loss=0.009242, ecapa_loss=0.0001685, whisper_loss=0.07527, over 14004.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01066, ecapa_loss=0.0001479, whisper_loss=0.08994, over 3884859.72 frames. 
], batch size: 58, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:32:53,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3504420.0, ans=0.125 2024-08-17 20:32:56,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3504420.0, ans=0.0 2024-08-17 20:32:58,168 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 22 from LS+wenet, 9 from Vox, 23 fro AS 2024-08-17 20:33:13,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2024-08-17 20:33:14,664 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-17 20:33:49,343 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-17 20:33:49,766 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.71 vs. limit=15.0 2024-08-17 20:33:52,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3504820.0, ans=0.125 2024-08-17 20:33:58,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3504820.0, ans=0.0 2024-08-17 20:34:09,049 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 7350, loss[loss=0.1029, beats_loss=0.01208, ecapa_loss=0.0001223, whisper_loss=0.08961, over 21317.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01066, ecapa_loss=0.0001469, whisper_loss=0.08955, over 3845066.56 frames. 
], batch size: 87, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:34:12,373 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.308e+01 2.645e+01 2.863e+01 4.311e+01, threshold=5.290e+01, percent-clipped=0.0 2024-08-17 20:34:25,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3505020.0, ans=0.1 2024-08-17 20:34:28,844 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.78 vs. limit=10.0 2024-08-17 20:34:39,271 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-17 20:34:57,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3505220.0, ans=0.025 2024-08-17 20:35:16,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3505320.0, ans=0.2 2024-08-17 20:35:29,776 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 7400, loss[loss=0.09302, beats_loss=0.01102, ecapa_loss=0.0001517, whisper_loss=0.08048, over 14732.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01075, ecapa_loss=0.000147, whisper_loss=0.08915, over 3863837.01 frames. ], batch size: 58, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:35:30,299 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-17 20:35:36,363 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
30 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-17 20:35:38,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3505420.0, ans=0.2 2024-08-17 20:35:44,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3505520.0, ans=0.09899494936611666 2024-08-17 20:35:47,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3505520.0, ans=0.0 2024-08-17 20:35:47,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3505520.0, ans=0.1 2024-08-17 20:36:09,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3505620.0, ans=0.125 2024-08-17 20:36:12,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3505620.0, ans=0.0 2024-08-17 20:36:15,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3505720.0, ans=0.125 2024-08-17 20:36:33,963 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-17 20:36:47,316 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 7450, loss[loss=0.1083, beats_loss=0.01015, ecapa_loss=0.0001553, whisper_loss=0.09657, over 22896.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01066, ecapa_loss=0.0001478, whisper_loss=0.08989, over 3889597.80 frames. 
], batch size: 92, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:36:51,132 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.583e+01 2.401e+01 2.543e+01 2.763e+01 3.752e+01, threshold=5.086e+01, percent-clipped=0.0 2024-08-17 20:36:51,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3505920.0, ans=0.125 2024-08-17 20:37:10,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3506020.0, ans=0.0 2024-08-17 20:37:11,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3506020.0, ans=0.0 2024-08-17 20:37:36,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3506220.0, ans=0.125 2024-08-17 20:37:39,054 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 23 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-17 20:37:49,136 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 25 from LS+wenet, 12 from Vox, 18 fro AS 2024-08-17 20:37:50,833 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.08 vs. limit=15.0 2024-08-17 20:38:06,742 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 7500, loss[loss=0.09621, beats_loss=0.01279, ecapa_loss=0.0001258, whisper_loss=0.08216, over 22775.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01066, ecapa_loss=0.0001471, whisper_loss=0.08967, over 3897203.70 frames. 
], batch size: 91, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:38:07,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3506420.0, ans=0.2 2024-08-17 20:38:07,224 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.74 vs. limit=15.0 2024-08-17 20:38:14,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3506420.0, ans=0.0 2024-08-17 20:38:24,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3506520.0, ans=0.2 2024-08-17 20:38:26,912 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-17 20:38:27,595 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.98 vs. limit=15.0 2024-08-17 20:39:01,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3506720.0, ans=0.125 2024-08-17 20:39:06,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3506820.0, ans=0.1 2024-08-17 20:39:22,076 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 7550, loss[loss=0.1099, beats_loss=0.01096, ecapa_loss=0.0001508, whisper_loss=0.09743, over 18828.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0106, ecapa_loss=0.0001466, whisper_loss=0.0895, over 3872022.99 frames. 
], batch size: 72, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:39:24,770 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.340e+01 2.512e+01 2.890e+01 6.756e+01, threshold=5.024e+01, percent-clipped=1.0 2024-08-17 20:39:28,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3506920.0, ans=0.125 2024-08-17 20:39:35,742 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-17 20:39:43,479 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.32 vs. limit=10.0 2024-08-17 20:39:55,092 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-17 20:39:58,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3507120.0, ans=0.125 2024-08-17 20:40:15,791 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2024-08-17 20:40:18,237 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-17 20:40:37,097 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 7600, loss[loss=0.1075, beats_loss=0.01087, ecapa_loss=0.00014, whisper_loss=0.09527, over 22439.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01048, ecapa_loss=0.0001465, whisper_loss=0.08993, over 3880239.93 frames. ], batch size: 91, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:40:43,472 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
30 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-17 20:40:59,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3507520.0, ans=0.0 2024-08-17 20:41:05,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3507620.0, ans=0.125 2024-08-17 20:41:15,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3507620.0, ans=0.1 2024-08-17 20:41:30,506 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-17 20:41:33,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3507820.0, ans=0.2 2024-08-17 20:41:43,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3507820.0, ans=0.0 2024-08-17 20:41:45,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3507820.0, ans=0.1 2024-08-17 20:41:48,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3507920.0, ans=0.125 2024-08-17 20:41:49,140 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 7650, loss[loss=0.1133, beats_loss=0.01039, ecapa_loss=0.0001415, whisper_loss=0.1015, over 22251.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01045, ecapa_loss=0.0001457, whisper_loss=0.09043, over 3884876.57 frames. 
], batch size: 87, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:41:49,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3507920.0, ans=0.125 2024-08-17 20:41:49,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3507920.0, ans=0.0 2024-08-17 20:41:52,190 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.783e+01 2.316e+01 2.490e+01 2.754e+01 3.586e+01, threshold=4.980e+01, percent-clipped=0.0 2024-08-17 20:42:11,885 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-17 20:42:27,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3508120.0, ans=0.125 2024-08-17 20:42:44,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3508220.0, ans=0.0 2024-08-17 20:42:52,564 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 16 from LS+wenet, 10 from Vox, 39 fro AS 2024-08-17 20:43:00,461 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 21 from LS+wenet, 21 from Vox, 53 fro AS 2024-08-17 20:43:02,385 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 7700, loss[loss=0.08055, beats_loss=0.01381, ecapa_loss=0.0001029, whisper_loss=0.06571, over 23630.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01048, ecapa_loss=0.0001455, whisper_loss=0.09032, over 3889245.60 frames. ], batch size: 95, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:43:13,371 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.832e+01 2024-08-17 20:43:26,272 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.39 vs. 
limit=12.0 2024-08-17 20:43:27,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3508520.0, ans=0.125 2024-08-17 20:43:34,043 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 14 from LS+wenet, 24 from Vox, 19 fro AS 2024-08-17 20:43:40,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3508620.0, ans=0.125 2024-08-17 20:43:41,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3508620.0, ans=0.125 2024-08-17 20:43:46,356 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-17 20:43:52,603 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-08-17 20:43:59,956 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.74 vs. limit=15.0 2024-08-17 20:44:20,367 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 7750, loss[loss=0.1072, beats_loss=0.01158, ecapa_loss=0.0001469, whisper_loss=0.09418, over 22074.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.0001454, whisper_loss=0.09001, over 3872998.75 frames. ], batch size: 89, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:44:23,682 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.352e+01 2.581e+01 3.039e+01 8.036e+01, threshold=5.163e+01, percent-clipped=1.0 2024-08-17 20:44:30,105 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-17 20:44:48,294 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-17 20:45:09,920 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-17 20:45:36,113 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 7800, loss[loss=0.08876, beats_loss=0.01405, ecapa_loss=0.0001006, whisper_loss=0.0737, over 17458.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0105, ecapa_loss=0.0001468, whisper_loss=0.09004, over 3906599.59 frames. ], batch size: 69, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:45:44,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3509420.0, ans=0.04949747468305833 2024-08-17 20:45:48,538 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-17 20:46:06,530 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-17 20:46:12,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3509620.0, ans=0.025 2024-08-17 20:46:21,290 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 23 from LS+wenet, 9 from Vox, 24 fro AS 2024-08-17 20:46:26,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3509720.0, ans=0.0 2024-08-17 20:46:30,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3509720.0, ans=0.1 2024-08-17 20:46:31,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3509720.0, ans=0.0 2024-08-17 20:46:39,332 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.61 vs. 
limit=10.0 2024-08-17 20:46:42,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3509820.0, ans=0.0 2024-08-17 20:46:51,372 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 7850, loss[loss=0.1076, beats_loss=0.01125, ecapa_loss=0.0001317, whisper_loss=0.09503, over 16880.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01059, ecapa_loss=0.0001466, whisper_loss=0.08979, over 3898764.00 frames. ], batch size: 66, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:46:53,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3509920.0, ans=0.125 2024-08-17 20:46:54,178 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.319e+01 2.575e+01 2.873e+01 4.382e+02, threshold=5.150e+01, percent-clipped=1.0 2024-08-17 20:47:04,126 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 15 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-17 20:47:23,758 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-17 20:47:23,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3510120.0, ans=0.125 2024-08-17 20:47:25,064 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-17 20:47:25,713 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.80 vs. limit=15.0 2024-08-17 20:47:29,719 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 29 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-17 20:47:41,207 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
25 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-17 20:47:43,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3510220.0, ans=0.0 2024-08-17 20:47:57,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3510320.0, ans=0.125 2024-08-17 20:48:00,167 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.295e-01 2024-08-17 20:48:01,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3510320.0, ans=0.125 2024-08-17 20:48:01,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3510320.0, ans=0.2 2024-08-17 20:48:03,790 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 7900, loss[loss=0.1138, beats_loss=0.01002, ecapa_loss=0.000147, whisper_loss=0.1023, over 18862.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01065, ecapa_loss=0.0001471, whisper_loss=0.08985, over 3885475.41 frames. ], batch size: 73, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:48:11,762 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.52 vs. limit=22.5 2024-08-17 20:48:17,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3510520.0, ans=0.125 2024-08-17 20:48:20,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3510520.0, ans=0.125 2024-08-17 20:48:21,439 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
16 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-17 20:48:22,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3510520.0, ans=0.125 2024-08-17 20:48:24,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3510520.0, ans=0.125 2024-08-17 20:48:30,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3510620.0, ans=0.0 2024-08-17 20:48:51,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3510720.0, ans=0.0 2024-08-17 20:48:53,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3510720.0, ans=0.125 2024-08-17 20:49:03,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3510820.0, ans=0.0 2024-08-17 20:49:11,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3510820.0, ans=0.125 2024-08-17 20:49:14,159 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 7950, loss[loss=0.0965, beats_loss=0.009891, ecapa_loss=0.0001435, whisper_loss=0.08517, over 16591.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01065, ecapa_loss=0.0001463, whisper_loss=0.08987, over 3904784.43 frames. ], batch size: 66, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:49:16,456 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.370e+01 2.553e+01 2.861e+01 6.638e+01, threshold=5.106e+01, percent-clipped=2.0 2024-08-17 20:49:22,263 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-17 20:49:25,418 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
26 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-17 20:49:37,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3511020.0, ans=0.125 2024-08-17 20:49:43,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3511120.0, ans=0.125 2024-08-17 20:49:45,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3511120.0, ans=0.125 2024-08-17 20:49:51,625 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.65 vs. limit=22.5 2024-08-17 20:49:56,204 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-17 20:50:09,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3511220.0, ans=0.125 2024-08-17 20:50:16,947 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-17 20:50:26,076 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 8000, loss[loss=0.07853, beats_loss=0.01413, ecapa_loss=0.0001502, whisper_loss=0.0629, over 13872.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01071, ecapa_loss=0.0001453, whisper_loss=0.09013, over 3932010.79 frames. ], batch size: 59, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:50:39,594 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 32 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-17 20:51:05,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3511620.0, ans=0.125 2024-08-17 20:51:15,735 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
18 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-17 20:51:15,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3511720.0, ans=0.1 2024-08-17 20:51:20,205 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-17 20:51:25,869 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 15 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-17 20:51:32,509 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-17 20:51:37,882 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 22 from LS+wenet, 6 from Vox, 30 fro AS 2024-08-17 20:51:40,625 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 8050, loss[loss=0.1008, beats_loss=0.007645, ecapa_loss=0.0001429, whisper_loss=0.09168, over 15155.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01077, ecapa_loss=0.0001452, whisper_loss=0.089, over 3899344.92 frames. ], batch size: 56, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:51:44,028 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.251e+01 2.590e+01 2.848e+01 4.049e+01, threshold=5.180e+01, percent-clipped=0.0 2024-08-17 20:51:45,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3511920.0, ans=0.05 2024-08-17 20:51:58,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3512020.0, ans=0.0 2024-08-17 20:52:20,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3512120.0, ans=0.1 2024-08-17 20:52:21,003 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.42 vs. 
limit=22.5 2024-08-17 20:52:36,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3512320.0, ans=0.0 2024-08-17 20:52:37,222 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.22 vs. limit=15.0 2024-08-17 20:52:38,454 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.25 vs. limit=15.0 2024-08-17 20:52:39,147 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-17 20:52:43,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3512320.0, ans=0.1 2024-08-17 20:52:44,775 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-17 20:52:49,849 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 8100, loss[loss=0.1002, beats_loss=0.01092, ecapa_loss=0.0001516, whisper_loss=0.08776, over 21607.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01073, ecapa_loss=0.0001458, whisper_loss=0.08927, over 3883089.19 frames. ], batch size: 89, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:52:55,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3512420.0, ans=0.0 2024-08-17 20:52:59,495 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0995599552989006, model_norm_threshold=51.80342102050781 2024-08-17 20:52:59,666 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.42, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.127e+05, grad_sumsq=1.103e+07, orig_rms_sq=1.022e-02 2024-08-17 20:53:02,710 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
23 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-17 20:53:03,513 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.63 vs. limit=15.0 2024-08-17 20:53:25,010 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09344208240509033, model_norm_threshold=51.80342102050781 2024-08-17 20:53:25,187 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.conv_module1.depthwise_conv.causal_conv.weight with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.259e+04, grad_sumsq=1.018e+05, orig_rms_sq=6.150e-01 2024-08-17 20:53:59,280 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 8150, loss[loss=0.08498, beats_loss=0.009994, ecapa_loss=0.0001438, whisper_loss=0.07354, over 18286.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01064, ecapa_loss=0.0001467, whisper_loss=0.08946, over 3875998.63 frames. ], batch size: 74, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:54:02,013 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.389e+01 2.629e+01 2.995e+01 5.544e+02, threshold=5.257e+01, percent-clipped=3.0 2024-08-17 20:54:07,258 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2024-08-17 20:54:15,727 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0 2024-08-17 20:54:18,854 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=12.0 2024-08-17 20:54:24,319 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
23 from LS+wenet, 35 from Vox, 34 fro AS 2024-08-17 20:54:33,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3513120.0, ans=0.125 2024-08-17 20:54:38,438 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-17 20:55:05,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3513320.0, ans=0.125 2024-08-17 20:55:08,098 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 8200, loss[loss=0.1072, beats_loss=0.008732, ecapa_loss=0.0001801, whisper_loss=0.09662, over 19247.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01062, ecapa_loss=0.0001474, whisper_loss=0.08891, over 3864784.79 frames. ], batch size: 79, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:55:08,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3513420.0, ans=0.0 2024-08-17 20:55:13,764 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-17 20:55:24,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3513520.0, ans=0.0 2024-08-17 20:55:31,339 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.11 vs. limit=15.0 2024-08-17 20:55:44,769 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 27 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-17 20:55:54,011 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-17 20:56:14,608 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 8250, loss[loss=0.1011, beats_loss=0.01107, ecapa_loss=0.0001301, whisper_loss=0.08869, over 20699.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01053, ecapa_loss=0.0001474, whisper_loss=0.08951, over 3858791.28 frames. ], batch size: 81, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:56:17,218 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.363e+01 2.592e+01 2.897e+01 5.680e+01, threshold=5.184e+01, percent-clipped=1.0 2024-08-17 20:56:30,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3514020.0, ans=0.125 2024-08-17 20:56:52,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3514120.0, ans=0.0 2024-08-17 20:56:54,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3514220.0, ans=0.1 2024-08-17 20:56:54,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3514220.0, ans=0.125 2024-08-17 20:56:58,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3514220.0, ans=0.0 2024-08-17 20:56:59,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3514220.0, ans=0.1 2024-08-17 20:57:07,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3514320.0, ans=0.09899494936611666 2024-08-17 20:57:08,536 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-17 20:57:19,749 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 8300, loss[loss=0.1065, beats_loss=0.009181, ecapa_loss=0.0001614, whisper_loss=0.0957, over 22125.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01059, ecapa_loss=0.0001471, whisper_loss=0.08962, over 3876070.40 frames. 
], batch size: 92, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:57:45,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3514620.0, ans=0.125 2024-08-17 20:57:53,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3514620.0, ans=0.0 2024-08-17 20:57:56,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.57 vs. limit=22.5 2024-08-17 20:58:01,425 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-17 20:58:07,182 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 41 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-17 20:58:10,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3514720.0, ans=0.125 2024-08-17 20:58:15,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3514820.0, ans=0.0 2024-08-17 20:58:17,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3514820.0, ans=0.125 2024-08-17 20:58:18,827 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 26 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-17 20:58:22,072 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-17 20:58:28,637 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 8350, loss[loss=0.1253, beats_loss=0.009729, ecapa_loss=0.0001441, whisper_loss=0.1141, over 20510.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01057, ecapa_loss=0.0001469, whisper_loss=0.0899, over 3895475.39 frames. 
], batch size: 81, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:58:32,343 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.299e+01 2.560e+01 2.760e+01 1.780e+02, threshold=5.121e+01, percent-clipped=1.0 2024-08-17 20:58:38,402 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-17 20:58:43,354 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-17 20:58:44,554 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-17 20:58:54,195 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-17 20:59:02,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3515120.0, ans=0.0 2024-08-17 20:59:13,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3515220.0, ans=0.125 2024-08-17 20:59:17,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3515220.0, ans=0.0 2024-08-17 20:59:18,445 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 28 from Vox, 24 fro AS 2024-08-17 20:59:23,581 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.20 vs. limit=15.0 2024-08-17 20:59:27,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3515320.0, ans=0.0 2024-08-17 20:59:36,002 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.33 vs. 
limit=15.0 2024-08-17 20:59:40,370 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 8400, loss[loss=0.1039, beats_loss=0.0122, ecapa_loss=0.0001315, whisper_loss=0.0904, over 21891.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01058, ecapa_loss=0.0001472, whisper_loss=0.09061, over 3936760.06 frames. ], batch size: 88, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:59:54,483 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-17 21:00:05,945 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-17 21:00:07,499 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-17 21:00:20,732 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.64 vs. limit=10.0 2024-08-17 21:00:41,168 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 36 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-17 21:00:42,462 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 21:00:42,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=3515820.0, ans=0.1 2024-08-17 21:00:48,568 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 8450, loss[loss=0.1135, beats_loss=0.009946, ecapa_loss=0.0001435, whisper_loss=0.1021, over 22817.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001474, whisper_loss=0.0905, over 3945555.20 frames. ], batch size: 89, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:00:50,403 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
33 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-17 21:00:51,503 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.340e+01 2.576e+01 2.813e+01 3.735e+01, threshold=5.152e+01, percent-clipped=0.0 2024-08-17 21:01:02,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3516020.0, ans=0.1 2024-08-17 21:01:10,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3516020.0, ans=0.125 2024-08-17 21:01:10,875 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-17 21:01:13,846 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-17 21:01:31,874 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-17 21:01:34,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3516220.0, ans=0.2 2024-08-17 21:01:42,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3516220.0, ans=0.2 2024-08-17 21:01:49,859 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 22 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-17 21:01:59,949 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 8500, loss[loss=0.112, beats_loss=0.01121, ecapa_loss=0.0001401, whisper_loss=0.09941, over 18763.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01049, ecapa_loss=0.0001481, whisper_loss=0.0908, over 3937434.06 frames. 
], batch size: 75, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:02:08,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3516420.0, ans=0.0 2024-08-17 21:02:18,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3516520.0, ans=0.0 2024-08-17 21:02:23,880 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 35 from LS+wenet, 20 from Vox, 32 from AS 2024-08-17 21:02:26,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3516620.0, ans=0.125 2024-08-17 21:02:54,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3516720.0, ans=0.125 2024-08-17 21:02:56,186 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 34 from LS+wenet, 19 from Vox, 28 from AS 2024-08-17 21:02:59,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3516820.0, ans=0.125 2024-08-17 21:02:59,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3516820.0, ans=0.0 2024-08-17 21:03:07,503 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 14 from LS+wenet, 11 from Vox, 28 from AS 2024-08-17 21:03:09,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.92 vs. limit=22.5 2024-08-17 21:03:12,985 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 8550, loss[loss=0.1092, beats_loss=0.01032, ecapa_loss=0.0001382, whisper_loss=0.09752, over 19128.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0104, ecapa_loss=0.000149, whisper_loss=0.09087, over 3909302.15 frames.
], batch size: 74, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:03:16,620 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.310e+01 2.634e+01 2.977e+01 2.577e+02, threshold=5.269e+01, percent-clipped=4.0 2024-08-17 21:03:18,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3516920.0, ans=0.0 2024-08-17 21:03:35,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3517020.0, ans=0.0 2024-08-17 21:03:35,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3517020.0, ans=0.07 2024-08-17 21:03:54,976 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.31 vs. limit=15.0 2024-08-17 21:04:11,884 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 from AS 2024-08-17 21:04:24,306 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 8600, loss[loss=0.1249, beats_loss=0.008913, ecapa_loss=0.0001362, whisper_loss=0.1146, over 17956.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01033, ecapa_loss=0.0001492, whisper_loss=0.0913, over 3880326.05 frames. ], batch size: 69, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:04:36,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3517420.0, ans=0.0 2024-08-17 21:04:49,754 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=15.0 2024-08-17 21:04:53,693 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts.
30 from LS+wenet, 27 from Vox, 26 from AS 2024-08-17 21:05:10,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3517720.0, ans=0.125 2024-08-17 21:05:12,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3517720.0, ans=0.125 2024-08-17 21:05:12,906 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.77 vs. limit=6.0 2024-08-17 21:05:15,480 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.87 vs. limit=15.0 2024-08-17 21:05:17,698 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 from AS 2024-08-17 21:05:28,634 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.88 vs. limit=15.0 2024-08-17 21:05:36,490 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 8650, loss[loss=0.1174, beats_loss=0.009578, ecapa_loss=0.0001396, whisper_loss=0.1064, over 20285.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01029, ecapa_loss=0.0001495, whisper_loss=0.09173, over 3885864.91 frames. ], batch size: 78, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:05:39,275 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.019e+01 2.385e+01 2.642e+01 2.981e+01 2.241e+02, threshold=5.284e+01, percent-clipped=1.0 2024-08-17 21:05:39,471 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 18 from Vox, 30 from AS 2024-08-17 21:05:48,823 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.94 vs.
limit=15.0 2024-08-17 21:05:50,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3518020.0, ans=0.125 2024-08-17 21:05:51,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3518020.0, ans=0.0 2024-08-17 21:05:53,137 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=17.02 vs. limit=15.0 2024-08-17 21:05:59,445 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 from AS 2024-08-17 21:06:17,421 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 16 from Vox, 33 from AS 2024-08-17 21:06:31,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3518220.0, ans=0.2 2024-08-17 21:06:33,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3518320.0, ans=0.125 2024-08-17 21:06:38,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3518320.0, ans=0.0 2024-08-17 21:06:49,459 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 8700, loss[loss=0.08672, beats_loss=0.01028, ecapa_loss=0.0001476, whisper_loss=0.07497, over 18458.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01039, ecapa_loss=0.0001492, whisper_loss=0.09135, over 3858488.62 frames. ], batch size: 75, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:06:51,215 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 25 from Vox, 43 from AS 2024-08-17 21:07:10,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.00 vs.
limit=22.5 2024-08-17 21:07:21,186 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 25 from Vox, 34 from AS 2024-08-17 21:07:37,658 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 21 from Vox, 35 from AS 2024-08-17 21:07:47,830 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 28 from LS+wenet, 17 from Vox, 23 from AS 2024-08-17 21:07:52,774 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.06 vs. limit=22.5 2024-08-17 21:08:03,626 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 8750, loss[loss=0.1017, beats_loss=0.01076, ecapa_loss=0.000146, whisper_loss=0.08951, over 21651.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01039, ecapa_loss=0.0001495, whisper_loss=0.09129, over 3838671.57 frames. ], batch size: 87, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:08:07,238 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.737e+01 2.308e+01 2.508e+01 2.741e+01 1.105e+02, threshold=5.017e+01, percent-clipped=1.0 2024-08-17 21:08:07,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3518920.0, ans=0.125 2024-08-17 21:08:41,021 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.02 vs. limit=12.0 2024-08-17 21:08:41,723 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts.
19 from LS+wenet, 20 from Vox, 36 from AS 2024-08-17 21:08:44,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3519120.0, ans=0.0 2024-08-17 21:09:01,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3519220.0, ans=0.0 2024-08-17 21:09:07,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3519320.0, ans=0.125 2024-08-17 21:09:20,853 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 8800, loss[loss=0.1043, beats_loss=0.01265, ecapa_loss=0.0001169, whisper_loss=0.09045, over 22208.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01061, ecapa_loss=0.0001479, whisper_loss=0.09008, over 3865581.69 frames. ], batch size: 90, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:09:28,436 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.306e-02 2024-08-17 21:09:31,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3519420.0, ans=0.125 2024-08-17 21:09:32,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3519420.0, ans=0.125 2024-08-17 21:09:44,693 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 from AS 2024-08-17 21:09:47,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3519520.0, ans=0.2 2024-08-17 21:09:55,296 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.47 vs.
limit=15.0 2024-08-17 21:09:59,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3519620.0, ans=0.0 2024-08-17 21:10:08,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3519720.0, ans=0.09899494936611666 2024-08-17 21:10:21,788 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.44 vs. limit=22.5 2024-08-17 21:10:34,490 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 8850, loss[loss=0.09492, beats_loss=0.01148, ecapa_loss=0.0001571, whisper_loss=0.08188, over 16736.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01063, ecapa_loss=0.0001481, whisper_loss=0.0906, over 3842021.46 frames. ], batch size: 69, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:10:34,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3519920.0, ans=0.125 2024-08-17 21:10:37,029 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.338e+01 2.557e+01 2.876e+01 3.818e+01, threshold=5.114e+01, percent-clipped=0.0 2024-08-17 21:10:45,891 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-352000.pt 2024-08-17 21:10:50,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3520020.0, ans=0.1 2024-08-17 21:11:41,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3520320.0, ans=0.125 2024-08-17 21:11:50,162 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 8900,
loss[loss=0.1049, beats_loss=0.01053, ecapa_loss=0.0001209, whisper_loss=0.09314, over 23434.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0106, ecapa_loss=0.000147, whisper_loss=0.09105, over 3889254.07 frames. ], batch size: 89, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:12:03,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3520520.0, ans=0.05 2024-08-17 21:12:05,577 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.29 vs. limit=15.0 2024-08-17 21:12:06,392 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 24 from LS+wenet, 29 from Vox, 33 from AS 2024-08-17 21:12:23,148 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 15 from Vox, 30 from AS 2024-08-17 21:12:33,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3520620.0, ans=0.05 2024-08-17 21:12:34,057 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.48 vs. limit=15.0 2024-08-17 21:12:35,265 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 28 from Vox, 38 from AS 2024-08-17 21:12:51,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3520820.0, ans=0.05 2024-08-17 21:12:52,648 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts.
28 from LS+wenet, 27 from Vox, 38 from AS 2024-08-17 21:12:54,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3520820.0, ans=0.1 2024-08-17 21:12:57,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3520820.0, ans=0.07 2024-08-17 21:13:07,608 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 8950, loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001283, whisper_loss=0.09117, over 14849.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01058, ecapa_loss=0.0001459, whisper_loss=0.09065, over 3870297.70 frames. ], batch size: 58, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:13:10,742 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.279e+01 2.513e+01 2.850e+01 4.067e+01, threshold=5.027e+01, percent-clipped=0.0 2024-08-17 21:13:23,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3521020.0, ans=0.0 2024-08-17 21:13:45,430 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 32 from Vox, 37 from AS 2024-08-17 21:13:45,965 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.59 vs. limit=22.5 2024-08-17 21:13:55,978 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 27 from Vox, 37 from AS 2024-08-17 21:14:01,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3521220.0, ans=0.0 2024-08-17 21:14:10,447 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 16 from Vox, 26 from AS 2024-08-17 21:14:11,307 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.71 vs.
limit=15.0 2024-08-17 21:14:14,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3521320.0, ans=0.0 2024-08-17 21:14:16,890 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.89 vs. limit=22.5 2024-08-17 21:14:25,592 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 31 from LS+wenet, 15 from Vox, 35 from AS 2024-08-17 21:14:25,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3521420.0, ans=0.125 2024-08-17 21:14:26,720 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 9000, loss[loss=0.1218, beats_loss=0.00908, ecapa_loss=0.0001075, whisper_loss=0.1116, over 22097.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01056, ecapa_loss=0.0001465, whisper_loss=0.09082, over 3875571.98 frames. ], batch size: 81, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:14:26,721 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-17 21:15:03,692 INFO [train_multi_KD3.py:1149] (0/4) Epoch 24, validation on ASR_libri: loss=0.2507, beats_loss=0, ecapa_loss=0.0005281, whisper_loss=0.2454, over 922467.00 frames. 2024-08-17 21:15:22,224 INFO [train_multi_KD3.py:1149] (0/4) Epoch 24, validation on SV_voxceleb1: loss=0.004114, beats_loss=0, ecapa_loss=0.0004114, whisper_loss=0, over 939242.00 frames. 2024-08-17 21:15:49,759 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([6.3167e-04, 2.3365e-02, 1.0382e-03, 3.4210e+00, 4.6483e-03, 4.9344e-02, 3.1123e-02, 3.3386e-02], device='cuda:0') 2024-08-17 21:17:01,706 INFO [train_multi_KD3.py:1149] (0/4) Epoch 24, validation on AT_audioset: loss=0.02322, beats_loss=0.02322, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-17 21:17:01,712 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-17 21:17:18,260 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.53 vs. limit=15.0 2024-08-17 21:17:25,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3521520.0, ans=0.125 2024-08-17 21:17:30,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3521620.0, ans=0.125 2024-08-17 21:17:30,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.12 vs. limit=22.5 2024-08-17 21:17:47,377 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 15 from Vox, 34 from AS 2024-08-17 21:18:10,080 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 19 from Vox, 36 from AS 2024-08-17 21:18:17,980 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 9050, loss[loss=0.09139, beats_loss=0.01035, ecapa_loss=0.0001513, whisper_loss=0.07953, over 17633.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01054, ecapa_loss=0.0001463, whisper_loss=0.09081, over 3874154.63 frames. ], batch size: 71, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:18:18,189 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 13 from LS+wenet, 26 from Vox, 36 from AS 2024-08-17 21:18:21,894 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.405e+01 2.622e+01 2.954e+01 2.025e+02, threshold=5.245e+01, percent-clipped=2.0 2024-08-17 21:18:27,792 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 14 from Vox, 32 from AS 2024-08-17 21:18:29,922 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts.
29 from LS+wenet, 26 from Vox, 35 from AS 2024-08-17 21:18:50,791 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 18 from Vox, 20 from AS 2024-08-17 21:18:52,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3522120.0, ans=0.125 2024-08-17 21:18:52,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3522120.0, ans=0.125 2024-08-17 21:19:21,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3522320.0, ans=0.2 2024-08-17 21:19:22,422 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 from AS 2024-08-17 21:19:33,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.43 vs. limit=15.0 2024-08-17 21:19:34,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3522320.0, ans=0.125 2024-08-17 21:19:38,106 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 9100, loss[loss=0.111, beats_loss=0.008687, ecapa_loss=0.0001451, whisper_loss=0.1009, over 21219.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01047, ecapa_loss=0.0001465, whisper_loss=0.09092, over 3865409.10 frames.
], batch size: 85, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:19:54,032 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 21:20:11,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3522620.0, ans=0.0 2024-08-17 21:20:15,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3522620.0, ans=0.125 2024-08-17 21:20:29,074 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 13 from Vox, 41 from AS 2024-08-17 21:20:31,019 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.86 vs. limit=15.0 2024-08-17 21:20:51,041 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 9150, loss[loss=0.1126, beats_loss=0.008501, ecapa_loss=0.000157, whisper_loss=0.1025, over 23447.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0105, ecapa_loss=0.0001457, whisper_loss=0.09085, over 3896618.23 frames. ], batch size: 90, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:20:54,004 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.307e+01 2.548e+01 2.836e+01 3.815e+01, threshold=5.097e+01, percent-clipped=0.0 2024-08-17 21:21:03,751 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.14 vs. limit=15.0 2024-08-17 21:21:08,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3523020.0, ans=0.125 2024-08-17 21:21:33,291 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts.
27 from LS+wenet, 8 from Vox, 25 from AS 2024-08-17 21:22:01,294 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 9200, loss[loss=0.1181, beats_loss=0.01032, ecapa_loss=0.0001297, whisper_loss=0.1065, over 23217.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01053, ecapa_loss=0.000146, whisper_loss=0.09054, over 3908515.39 frames. ], batch size: 91, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:22:03,865 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 15 from Vox, 23 from AS 2024-08-17 21:22:04,431 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.25 vs. limit=15.0 2024-08-17 21:22:06,717 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 23 from Vox, 26 from AS 2024-08-17 21:22:18,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3523520.0, ans=0.125 2024-08-17 21:22:40,599 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 from AS 2024-08-17 21:22:56,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3523820.0, ans=0.2 2024-08-17 21:23:06,133 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 9250, loss[loss=0.0916, beats_loss=0.01137, ecapa_loss=0.0001733, whisper_loss=0.07849, over 18264.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01053, ecapa_loss=0.000148, whisper_loss=0.09038, over 3930147.46 frames.
], batch size: 76, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:23:06,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3523920.0, ans=0.0 2024-08-17 21:23:08,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3523920.0, ans=0.0 2024-08-17 21:23:08,952 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.350e+01 2.657e+01 3.037e+01 4.188e+01, threshold=5.314e+01, percent-clipped=0.0 2024-08-17 21:23:19,246 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 21:23:19,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3524020.0, ans=0.0 2024-08-17 21:23:20,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3524020.0, ans=0.125 2024-08-17 21:23:28,618 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 from AS 2024-08-17 21:23:40,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3524120.0, ans=0.2 2024-08-17 21:23:40,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3524120.0, ans=0.125 2024-08-17 21:23:51,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3524220.0, ans=0.0 2024-08-17 21:23:54,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3524220.0, ans=0.04949747468305833 2024-08-17 21:24:12,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.56 vs.
limit=15.0 2024-08-17 21:24:13,198 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 9300, loss[loss=0.1138, beats_loss=0.008509, ecapa_loss=0.0001668, whisper_loss=0.1036, over 19567.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01054, ecapa_loss=0.0001483, whisper_loss=0.09055, over 3941741.70 frames. ], batch size: 78, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:24:20,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3524420.0, ans=0.125 2024-08-17 21:24:22,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3524420.0, ans=0.2 2024-08-17 21:24:23,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3524420.0, ans=0.1 2024-08-17 21:24:30,415 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.24 vs. limit=15.0 2024-08-17 21:24:31,155 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 30 from LS+wenet, 16 from Vox, 27 from AS 2024-08-17 21:24:36,833 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.421e-02 2024-08-17 21:24:37,734 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 19 from LS+wenet, 22 from Vox, 39 from AS 2024-08-17 21:24:38,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3524620.0, ans=0.0 2024-08-17 21:24:41,460 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts.
27 from LS+wenet, 22 from Vox, 42 from AS 2024-08-17 21:24:47,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3524620.0, ans=0.0 2024-08-17 21:24:48,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3524620.0, ans=0.125 2024-08-17 21:24:54,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3524720.0, ans=0.05 2024-08-17 21:25:16,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3524820.0, ans=0.2 2024-08-17 21:25:18,741 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 9350, loss[loss=0.08589, beats_loss=0.01027, ecapa_loss=0.0001678, whisper_loss=0.07395, over 19721.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01047, ecapa_loss=0.0001484, whisper_loss=0.09101, over 3906011.87 frames. ], batch size: 83, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:25:20,623 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 17 from LS+wenet, 19 from Vox, 34 from AS 2024-08-17 21:25:20,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3524920.0, ans=0.125 2024-08-17 21:25:21,630 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.336e+01 2.549e+01 2.795e+01 4.217e+01, threshold=5.098e+01, percent-clipped=0.0 2024-08-17 21:25:43,267 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts.
23 from LS+wenet, 14 from Vox, 27 from AS 2024-08-17 21:25:43,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3525020.0, ans=0.125 2024-08-17 21:25:46,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3525120.0, ans=0.125 2024-08-17 21:25:53,739 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 14 from Vox, 25 from AS 2024-08-17 21:25:54,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3525120.0, ans=0.1 2024-08-17 21:25:56,276 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 18 from Vox, 28 from AS 2024-08-17 21:25:57,772 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 16 from Vox, 41 from AS 2024-08-17 21:26:08,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3525220.0, ans=0.09899494936611666 2024-08-17 21:26:14,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3525320.0, ans=0.125 2024-08-17 21:26:24,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3525320.0, ans=0.0 2024-08-17 21:26:27,690 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 9400, loss[loss=0.1077, beats_loss=0.01045, ecapa_loss=0.0001156, whisper_loss=0.09614, over 18452.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01047, ecapa_loss=0.000148, whisper_loss=0.09097, over 3872506.90 frames. ], batch size: 72, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:26:32,003 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts.
25 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-17 21:26:32,897 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.21 vs. limit=10.0 2024-08-17 21:26:43,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3525520.0, ans=0.125 2024-08-17 21:26:47,071 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-17 21:27:02,528 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.868e+05 2024-08-17 21:27:03,519 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-17 21:27:03,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3525620.0, ans=0.025 2024-08-17 21:27:11,532 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.54 vs. limit=12.0 2024-08-17 21:27:35,349 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.67 vs. limit=22.5 2024-08-17 21:27:37,757 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 9450, loss[loss=0.1031, beats_loss=0.008553, ecapa_loss=0.0001669, whisper_loss=0.09287, over 16087.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01048, ecapa_loss=0.0001493, whisper_loss=0.09087, over 3868295.74 frames. 
], batch size: 67, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:27:41,288 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.632e+01 2.404e+01 2.620e+01 3.015e+01 5.071e+01, threshold=5.241e+01, percent-clipped=0.0 2024-08-17 21:28:04,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3526120.0, ans=0.125 2024-08-17 21:28:05,110 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-17 21:28:13,234 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 38 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-17 21:28:15,242 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.21 vs. limit=12.0 2024-08-17 21:28:40,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3526320.0, ans=0.1 2024-08-17 21:28:48,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3526420.0, ans=0.0 2024-08-17 21:28:49,404 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 9500, loss[loss=0.09959, beats_loss=0.01188, ecapa_loss=0.0001544, whisper_loss=0.08616, over 22155.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0105, ecapa_loss=0.0001486, whisper_loss=0.09086, over 3877424.03 frames. ], batch size: 94, lr: 2.55e-03, grad_scale: 1.152921504606847e+18 2024-08-17 21:28:53,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3526420.0, ans=0.09899494936611666 2024-08-17 21:29:06,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3526520.0, ans=0.0 2024-08-17 21:29:16,372 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-17 21:29:23,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3526620.0, ans=0.2 2024-08-17 21:29:29,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3526620.0, ans=0.1 2024-08-17 21:30:10,015 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 9550, loss[loss=0.1074, beats_loss=0.009111, ecapa_loss=0.0001763, whisper_loss=0.09655, over 19713.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01053, ecapa_loss=0.0001489, whisper_loss=0.09021, over 3879961.24 frames. ], batch size: 82, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:30:15,620 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.375e+01 2.611e+01 2.915e+01 4.364e+01, threshold=5.222e+01, percent-clipped=0.0 2024-08-17 21:30:16,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3526920.0, ans=0.2 2024-08-17 21:30:25,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3526920.0, ans=0.0 2024-08-17 21:30:28,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3527020.0, ans=0.125 2024-08-17 21:30:44,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3527020.0, ans=0.0 2024-08-17 21:30:44,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3527020.0, ans=0.2 2024-08-17 21:31:04,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3527120.0, ans=0.125 2024-08-17 21:31:18,922 INFO [scaling.py:214] (0/4) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3527220.0, ans=0.125 2024-08-17 21:31:21,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3527220.0, ans=0.0 2024-08-17 21:31:23,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3527220.0, ans=0.125 2024-08-17 21:31:28,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3527320.0, ans=0.1 2024-08-17 21:31:31,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3527320.0, ans=0.125 2024-08-17 21:31:44,509 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 9600, loss[loss=0.09243, beats_loss=0.009645, ecapa_loss=0.0001671, whisper_loss=0.08112, over 14586.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01055, ecapa_loss=0.0001482, whisper_loss=0.08978, over 3899647.78 frames. ], batch size: 56, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:31:50,346 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 9 from Vox, 34 fro AS 2024-08-17 21:32:13,818 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=15.0 2024-08-17 21:32:32,410 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-17 21:32:39,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3527620.0, ans=0.0 2024-08-17 21:32:42,769 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.96 vs. 
limit=15.0 2024-08-17 21:32:51,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3527720.0, ans=0.2 2024-08-17 21:33:06,467 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-17 21:33:23,372 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 9650, loss[loss=0.1303, beats_loss=0.00793, ecapa_loss=0.0001472, whisper_loss=0.1209, over 22792.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001486, whisper_loss=0.08948, over 3895709.47 frames. ], batch size: 85, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:33:24,273 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.01 vs. limit=12.0 2024-08-17 21:33:29,200 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.363e+01 2.592e+01 2.956e+01 8.123e+01, threshold=5.183e+01, percent-clipped=2.0 2024-08-17 21:33:37,931 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-17 21:34:45,974 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 23 from LS+wenet, 26 from Vox, 46 fro AS 2024-08-17 21:35:04,729 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 9700, loss[loss=0.1125, beats_loss=0.009918, ecapa_loss=0.0001494, whisper_loss=0.1011, over 14722.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01051, ecapa_loss=0.0001493, whisper_loss=0.08966, over 3882440.45 frames. ], batch size: 60, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:35:12,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3528420.0, ans=0.0 2024-08-17 21:35:30,630 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.86 vs. 
limit=22.5 2024-08-17 21:36:10,328 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-17 21:36:16,940 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 9750, loss[loss=0.1164, beats_loss=0.009325, ecapa_loss=0.0001543, whisper_loss=0.1056, over 20851.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001481, whisper_loss=0.08988, over 3861888.57 frames. ], batch size: 85, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:36:20,771 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.350e+01 2.623e+01 3.001e+01 4.380e+01, threshold=5.247e+01, percent-clipped=0.0 2024-08-17 21:36:27,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3528920.0, ans=0.0 2024-08-17 21:36:32,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3529020.0, ans=0.0 2024-08-17 21:36:37,062 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.02 vs. limit=8.0 2024-08-17 21:36:45,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3529120.0, ans=0.0 2024-08-17 21:36:55,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3529120.0, ans=0.0 2024-08-17 21:37:19,924 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-17 21:37:25,663 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-17 21:37:30,201 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 9800, loss[loss=0.1171, beats_loss=0.008929, ecapa_loss=0.0001316, whisper_loss=0.1068, over 14725.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01051, ecapa_loss=0.0001475, whisper_loss=0.09033, over 3869222.97 frames. ], batch size: 56, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:37:34,567 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.02 vs. limit=15.0 2024-08-17 21:37:37,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3529420.0, ans=0.0 2024-08-17 21:37:50,363 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.92 vs. limit=15.0 2024-08-17 21:37:54,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3529520.0, ans=0.0 2024-08-17 21:37:57,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3529520.0, ans=0.2 2024-08-17 21:37:59,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3529520.0, ans=0.0 2024-08-17 21:38:14,430 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 21 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-17 21:38:16,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3529720.0, ans=0.0 2024-08-17 21:38:38,352 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.84 vs. limit=15.0 2024-08-17 21:38:39,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3529820.0, ans=0.0 2024-08-17 21:38:48,034 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 9850, loss[loss=0.1115, beats_loss=0.009128, ecapa_loss=0.0001217, whisper_loss=0.1012, over 19004.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01049, ecapa_loss=0.0001486, whisper_loss=0.09077, over 3853147.55 frames. ], batch size: 70, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:38:52,470 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.238e+01 2.528e+01 2.791e+01 4.527e+01, threshold=5.056e+01, percent-clipped=0.0 2024-08-17 21:39:14,024 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.01 vs. limit=6.0 2024-08-17 21:39:19,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3530120.0, ans=0.05 2024-08-17 21:39:22,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3530120.0, ans=0.0 2024-08-17 21:39:28,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3530120.0, ans=0.0 2024-08-17 21:39:41,924 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-17 21:39:44,491 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 13 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-17 21:40:04,751 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 9900, loss[loss=0.0941, beats_loss=0.01135, ecapa_loss=0.0001253, whisper_loss=0.08149, over 18549.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001486, whisper_loss=0.09047, over 3866111.22 frames. 
], batch size: 73, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:40:06,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3530420.0, ans=0.125 2024-08-17 21:40:11,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3530420.0, ans=0.0 2024-08-17 21:40:39,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3530620.0, ans=0.125 2024-08-17 21:40:40,831 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-17 21:40:43,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.31 vs. limit=15.0 2024-08-17 21:40:57,983 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.85 vs. limit=15.0 2024-08-17 21:41:01,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3530720.0, ans=0.125 2024-08-17 21:41:13,091 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-17 21:41:19,761 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 9950, loss[loss=0.1009, beats_loss=0.01051, ecapa_loss=0.0001693, whisper_loss=0.08873, over 21679.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01062, ecapa_loss=0.0001481, whisper_loss=0.08958, over 3855868.00 frames. 
], batch size: 91, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:41:23,681 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.318e+01 2.487e+01 2.825e+01 4.305e+01, threshold=4.974e+01, percent-clipped=0.0 2024-08-17 21:41:40,596 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-17 21:42:34,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3531320.0, ans=0.125 2024-08-17 21:42:38,367 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 10000, loss[loss=0.09737, beats_loss=0.009505, ecapa_loss=0.0001361, whisper_loss=0.0865, over 16152.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01064, ecapa_loss=0.0001483, whisper_loss=0.08978, over 3872695.02 frames. ], batch size: 61, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:42:46,546 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.72 vs. limit=10.0 2024-08-17 21:42:54,499 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-17 21:43:11,365 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-17 21:43:20,224 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-17 21:43:50,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3531820.0, ans=0.125 2024-08-17 21:43:52,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3531820.0, ans=0.125 2024-08-17 21:43:54,958 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 10050, loss[loss=0.0895, beats_loss=0.01039, ecapa_loss=0.000166, whisper_loss=0.07745, over 17788.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01062, ecapa_loss=0.0001476, whisper_loss=0.09042, over 3885337.85 frames. ], batch size: 74, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:43:55,645 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.13 vs. limit=10.0 2024-08-17 21:43:59,733 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.404e+01 2.606e+01 2.788e+01 4.457e+01, threshold=5.212e+01, percent-clipped=0.0 2024-08-17 21:44:17,788 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0 2024-08-17 21:44:24,745 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.87 vs. limit=22.5 2024-08-17 21:44:29,774 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
33 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-17 21:44:33,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3532120.0, ans=0.04949747468305833 2024-08-17 21:44:39,850 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09485381096601486, model_norm_threshold=52.118003845214844 2024-08-17 21:44:40,021 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.28, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.481e+04, grad_sumsq=8.481e+04, orig_rms_sq=1.000e+00 2024-08-17 21:44:49,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3532220.0, ans=0.04949747468305833 2024-08-17 21:44:57,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3532320.0, ans=0.0 2024-08-17 21:45:05,086 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-17 21:45:12,849 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 10100, loss[loss=0.08564, beats_loss=0.01131, ecapa_loss=0.0001525, whisper_loss=0.0728, over 16846.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001476, whisper_loss=0.0903, over 3912118.71 frames. ], batch size: 69, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:45:16,092 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-17 21:45:16,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3532420.0, ans=0.125 2024-08-17 21:45:24,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3532420.0, ans=0.0 2024-08-17 21:45:32,164 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
36 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-17 21:45:47,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3532620.0, ans=0.2 2024-08-17 21:45:52,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3532620.0, ans=0.2 2024-08-17 21:46:12,559 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-17 21:46:16,553 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 12 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-17 21:46:26,189 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 10150, loss[loss=0.0915, beats_loss=0.009604, ecapa_loss=0.0001606, whisper_loss=0.08029, over 19595.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01063, ecapa_loss=0.0001486, whisper_loss=0.08963, over 3897233.17 frames. ], batch size: 84, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:46:30,189 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.357e+01 2.542e+01 2.923e+01 5.495e+02, threshold=5.084e+01, percent-clipped=3.0 2024-08-17 21:47:00,576 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.58 vs. limit=15.0 2024-08-17 21:47:05,918 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 34 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-17 21:47:09,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3533220.0, ans=0.1 2024-08-17 21:47:21,180 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-17 21:47:24,221 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.71 vs. 
limit=12.0 2024-08-17 21:47:28,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3533320.0, ans=0.125 2024-08-17 21:47:37,753 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 10200, loss[loss=0.09077, beats_loss=0.007863, ecapa_loss=0.0001935, whisper_loss=0.08098, over 13675.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01058, ecapa_loss=0.0001474, whisper_loss=0.09003, over 3912522.06 frames. ], batch size: 55, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:47:46,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3533420.0, ans=0.125 2024-08-17 21:48:01,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3533520.0, ans=0.2 2024-08-17 21:48:01,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3533520.0, ans=0.0 2024-08-17 21:48:06,014 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-17 21:48:08,476 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-17 21:48:14,642 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.40 vs. limit=12.0 2024-08-17 21:48:21,265 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 19 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-17 21:48:29,025 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
21 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-17 21:48:43,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3533820.0, ans=0.125 2024-08-17 21:48:49,022 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 10250, loss[loss=0.09772, beats_loss=0.009089, ecapa_loss=0.0001694, whisper_loss=0.08694, over 16793.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01061, ecapa_loss=0.0001479, whisper_loss=0.08971, over 3900792.72 frames. ], batch size: 70, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:48:50,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3533920.0, ans=0.125 2024-08-17 21:48:52,936 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.355e+01 2.589e+01 2.999e+01 4.439e+02, threshold=5.177e+01, percent-clipped=1.0 2024-08-17 21:49:08,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3534020.0, ans=0.0 2024-08-17 21:49:11,618 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-17 21:49:16,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3534120.0, ans=0.125 2024-08-17 21:49:16,788 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.13 vs. limit=10.0 2024-08-17 21:49:50,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3534320.0, ans=0.125 2024-08-17 21:49:57,529 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 10300, loss[loss=0.1167, beats_loss=0.007917, ecapa_loss=0.0001474, whisper_loss=0.1073, over 15632.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01056, ecapa_loss=0.0001476, whisper_loss=0.08992, over 3920158.55 frames. ], batch size: 59, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:50:19,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3534520.0, ans=0.125 2024-08-17 21:50:26,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3534620.0, ans=0.0 2024-08-17 21:50:35,840 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.15 vs. limit=15.0 2024-08-17 21:50:43,246 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 27 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-17 21:50:52,611 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.46 vs. limit=15.0 2024-08-17 21:51:02,731 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.33 vs. limit=22.5 2024-08-17 21:51:05,877 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 10350, loss[loss=0.09751, beats_loss=0.01297, ecapa_loss=0.0001285, whisper_loss=0.08326, over 21731.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01053, ecapa_loss=0.0001473, whisper_loss=0.09025, over 3934573.31 frames. 
], batch size: 83, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:51:09,943 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.272e+01 2.511e+01 2.861e+01 6.288e+01, threshold=5.023e+01, percent-clipped=1.0 2024-08-17 21:51:11,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3534920.0, ans=0.0 2024-08-17 21:51:16,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3534920.0, ans=0.125 2024-08-17 21:51:17,389 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-17 21:51:24,875 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-17 21:51:35,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3535120.0, ans=0.0 2024-08-17 21:51:57,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3535220.0, ans=0.1 2024-08-17 21:51:57,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3535220.0, ans=0.125 2024-08-17 21:52:04,606 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 21:52:07,043 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-17 21:52:08,247 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-17 21:52:13,447 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 10400, loss[loss=0.08917, beats_loss=0.009493, ecapa_loss=0.0001658, whisper_loss=0.07802, over 18446.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01053, ecapa_loss=0.0001476, whisper_loss=0.09014, over 3919024.72 frames. 
], batch size: 76, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:52:16,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3535420.0, ans=0.0 2024-08-17 21:52:17,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=15.0 2024-08-17 21:52:21,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3535420.0, ans=0.0 2024-08-17 21:52:34,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3535520.0, ans=0.07 2024-08-17 21:52:40,618 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-17 21:52:44,711 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 38 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-17 21:52:46,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3535620.0, ans=0.2 2024-08-17 21:53:00,485 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.615e+05 2024-08-17 21:53:19,110 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 10450, loss[loss=0.1143, beats_loss=0.01006, ecapa_loss=0.0001533, whisper_loss=0.1027, over 21600.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01059, ecapa_loss=0.0001472, whisper_loss=0.09001, over 3915189.74 frames. 
], batch size: 85, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:53:22,783 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.737e+01 2.244e+01 2.463e+01 2.760e+01 5.655e+01, threshold=4.925e+01, percent-clipped=1.0 2024-08-17 21:53:41,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3536020.0, ans=0.125 2024-08-17 21:53:42,116 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-17 21:53:54,570 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-17 21:54:02,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3536220.0, ans=0.125 2024-08-17 21:54:11,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3536320.0, ans=0.0 2024-08-17 21:54:15,202 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-17 21:54:24,184 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 10500, loss[loss=0.06028, beats_loss=0.009431, ecapa_loss=0.0001748, whisper_loss=0.0491, over 13564.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01051, ecapa_loss=0.0001478, whisper_loss=0.09074, over 3921474.47 frames. ], batch size: 55, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:54:35,038 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-17 21:54:41,349 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
18 from LS+wenet, 21 from Vox, 35 from AS 2024-08-17 21:54:42,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3536520.0, ans=0.125 2024-08-17 21:54:56,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3536620.0, ans=0.07 2024-08-17 21:54:58,585 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 26 from Vox, 31 from AS 2024-08-17 21:55:19,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3536820.0, ans=0.0 2024-08-17 21:55:29,218 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 10550, loss[loss=0.109, beats_loss=0.009937, ecapa_loss=0.0001509, whisper_loss=0.09759, over 21058.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001471, whisper_loss=0.0905, over 3935623.70 frames. ], batch size: 83, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:55:33,486 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.376e+01 2.745e+01 3.046e+01 5.243e+01, threshold=5.490e+01, percent-clipped=1.0 2024-08-17 21:55:42,573 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 29 from Vox, 36 from AS 2024-08-17 21:55:54,356 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.94 vs. limit=10.0 2024-08-17 21:56:07,797 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 30 from LS+wenet, 13 from Vox, 29 from AS 2024-08-17 21:56:09,048 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 18 from Vox, 41 from AS 2024-08-17 21:56:33,346 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 10600, loss[loss=0.1018, beats_loss=0.00919, ecapa_loss=0.0001568, whisper_loss=0.09101, over 18948.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01049, ecapa_loss=0.0001472, whisper_loss=0.09049, over 3881568.69 frames. ], batch size: 74, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:56:36,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3537420.0, ans=0.1 2024-08-17 21:56:38,702 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 25 from LS+wenet, 23 from Vox, 17 from AS 2024-08-17 21:56:46,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3537520.0, ans=0.125 2024-08-17 21:56:47,515 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 27 from Vox, 25 from AS 2024-08-17 21:56:48,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3537520.0, ans=0.125 2024-08-17 21:56:50,155 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 17 from Vox, 32 from AS 2024-08-17 21:56:53,224 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.68 vs. limit=15.0 2024-08-17 21:56:55,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3537520.0, ans=0.0 2024-08-17 21:57:10,338 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 16 from Vox, 29 from AS 2024-08-17 21:57:15,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3537720.0, ans=0.1 2024-08-17 21:57:37,114 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 10650, loss[loss=0.08035, beats_loss=0.01053, ecapa_loss=0.0001755, whisper_loss=0.06806, over 13838.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01047, ecapa_loss=0.0001457, whisper_loss=0.09106, over 3853618.57 frames. 
], batch size: 55, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:57:37,324 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 13 from Vox, 29 from AS 2024-08-17 21:57:40,717 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.432e+01 2.682e+01 2.988e+01 4.178e+01, threshold=5.364e+01, percent-clipped=0.0 2024-08-17 21:57:51,604 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 19 from Vox, 28 from AS 2024-08-17 21:57:51,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3538020.0, ans=0.125 2024-08-17 21:57:54,110 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 40 from LS+wenet, 18 from Vox, 31 from AS 2024-08-17 21:58:03,602 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 21:58:18,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3538220.0, ans=0.0 2024-08-17 21:58:18,868 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=13.24 vs. limit=12.0 2024-08-17 21:58:30,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3538320.0, ans=0.125 2024-08-17 21:58:41,403 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 10700, loss[loss=0.09903, beats_loss=0.01161, ecapa_loss=0.0001376, whisper_loss=0.08605, over 21898.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01048, ecapa_loss=0.0001454, whisper_loss=0.09167, over 3883371.10 frames. ], batch size: 89, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:58:54,483 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
25 from LS+wenet, 21 from Vox, 45 from AS 2024-08-17 21:59:04,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3538520.0, ans=0.125 2024-08-17 21:59:05,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3538620.0, ans=0.125 2024-08-17 21:59:19,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3538720.0, ans=0.125 2024-08-17 21:59:30,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3538820.0, ans=0.1 2024-08-17 21:59:34,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3538820.0, ans=0.0 2024-08-17 21:59:37,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3538820.0, ans=0.125 2024-08-17 21:59:42,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3538920.0, ans=0.125 2024-08-17 21:59:44,050 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 10750, loss[loss=0.1005, beats_loss=0.01128, ecapa_loss=0.0001367, whisper_loss=0.0879, over 13793.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01059, ecapa_loss=0.0001449, whisper_loss=0.09122, over 3874653.59 frames. 
], batch size: 54, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:59:47,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3538920.0, ans=0.125 2024-08-17 21:59:48,272 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.354e+01 2.532e+01 2.828e+01 4.238e+01, threshold=5.063e+01, percent-clipped=0.0 2024-08-17 21:59:53,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3538920.0, ans=0.1 2024-08-17 22:00:15,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3539120.0, ans=0.125 2024-08-17 22:00:29,848 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 26 from Vox, 30 from AS 2024-08-17 22:00:31,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3539220.0, ans=0.1 2024-08-17 22:00:32,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3539220.0, ans=0.125 2024-08-17 22:00:42,152 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 20 from Vox, 36 from AS 2024-08-17 22:00:44,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3539320.0, ans=0.2 2024-08-17 22:00:47,867 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 10800, loss[loss=0.0808, beats_loss=0.01323, ecapa_loss=0.0001567, whisper_loss=0.066, over 13225.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01052, ecapa_loss=0.0001455, whisper_loss=0.09231, over 3906117.55 frames. 
], batch size: 57, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:00:49,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3539420.0, ans=0.0 2024-08-17 22:00:49,740 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.76 vs. limit=10.0 2024-08-17 22:00:55,921 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.86 vs. limit=10.0 2024-08-17 22:01:00,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3539520.0, ans=0.125 2024-08-17 22:01:06,929 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 20 from Vox, 20 from AS 2024-08-17 22:01:07,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3539520.0, ans=0.2 2024-08-17 22:01:10,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3539520.0, ans=0.0 2024-08-17 22:01:10,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3539520.0, ans=0.09899494936611666 2024-08-17 22:01:11,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3539620.0, ans=0.1 2024-08-17 22:01:13,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3539620.0, ans=0.0 2024-08-17 22:01:26,188 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.24 vs. 
limit=15.0 2024-08-17 22:01:28,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3539720.0, ans=0.125 2024-08-17 22:01:39,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3539820.0, ans=0.0 2024-08-17 22:01:47,023 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 37 from LS+wenet, 18 from Vox, 35 from AS 2024-08-17 22:01:50,488 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 10850, loss[loss=0.1162, beats_loss=0.01071, ecapa_loss=0.000177, whisper_loss=0.1037, over 17467.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01053, ecapa_loss=0.0001461, whisper_loss=0.09272, over 3904990.08 frames. ], batch size: 73, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:01:51,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3539920.0, ans=0.125 2024-08-17 22:01:53,565 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.90 vs. limit=22.5 2024-08-17 22:01:54,183 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.336e+01 2.508e+01 2.767e+01 4.451e+01, threshold=5.015e+01, percent-clipped=0.0 2024-08-17 22:01:58,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3539920.0, ans=0.0 2024-08-17 22:02:12,369 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 18 from Vox, 36 from AS 2024-08-17 22:02:39,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3540220.0, ans=0.1 2024-08-17 22:02:48,018 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
24 from LS+wenet, 12 from Vox, 28 from AS 2024-08-17 22:02:53,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3540420.0, ans=0.125 2024-08-17 22:02:54,003 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 10900, loss[loss=0.109, beats_loss=0.008978, ecapa_loss=0.0001475, whisper_loss=0.09854, over 22030.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01051, ecapa_loss=0.0001459, whisper_loss=0.09268, over 3918888.57 frames. ], batch size: 84, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:03:02,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3540420.0, ans=0.125 2024-08-17 22:03:03,304 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 15 from Vox, 44 from AS 2024-08-17 22:03:24,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3540620.0, ans=0.0 2024-08-17 22:03:27,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3540620.0, ans=0.0 2024-08-17 22:03:28,915 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 8 from Vox, 46 from AS 2024-08-17 22:03:35,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3540720.0, ans=0.1 2024-08-17 22:03:52,400 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 15 from Vox, 32 from AS 2024-08-17 22:03:57,310 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 10950, loss[loss=0.1077, beats_loss=0.01064, ecapa_loss=0.0001438, whisper_loss=0.09565, over 22729.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01063, ecapa_loss=0.0001462, whisper_loss=0.0916, over 3928268.16 frames. 
], batch size: 92, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:03:58,832 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 28 from LS+wenet, 20 from Vox, 24 from AS 2024-08-17 22:03:59,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3540920.0, ans=0.1 2024-08-17 22:04:01,043 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.429e+01 2.667e+01 3.020e+01 4.482e+01, threshold=5.334e+01, percent-clipped=0.0 2024-08-17 22:04:01,212 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 21 from Vox, 45 from AS 2024-08-17 22:04:17,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3541020.0, ans=0.035 2024-08-17 22:04:17,952 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.08 vs. limit=15.0 2024-08-17 22:04:19,808 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 from AS 2024-08-17 22:04:30,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3541120.0, ans=0.125 2024-08-17 22:04:38,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.37 vs. limit=15.0 2024-08-17 22:04:38,805 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
20 from LS+wenet, 19 from Vox, 25 from AS 2024-08-17 22:04:46,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3541320.0, ans=0.2 2024-08-17 22:04:47,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3541320.0, ans=0.0 2024-08-17 22:04:53,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3541320.0, ans=0.0 2024-08-17 22:04:59,027 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 21 from Vox, 28 from AS 2024-08-17 22:05:00,125 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 11000, loss[loss=0.1017, beats_loss=0.01022, ecapa_loss=0.0001772, whisper_loss=0.08966, over 17602.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01056, ecapa_loss=0.000148, whisper_loss=0.09182, over 3926288.43 frames. ], batch size: 71, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:05:04,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3541420.0, ans=0.125 2024-08-17 22:05:09,053 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 21 from Vox, 30 from AS 2024-08-17 22:05:18,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3541520.0, ans=0.125 2024-08-17 22:05:51,573 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-17 22:06:02,582 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 11050, loss[loss=0.08155, beats_loss=0.01186, ecapa_loss=0.0001164, whisper_loss=0.06853, over 15429.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01056, ecapa_loss=0.0001489, whisper_loss=0.09175, over 3933826.76 frames. 
], batch size: 59, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:06:06,469 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.361e+01 2.552e+01 2.844e+01 4.106e+02, threshold=5.103e+01, percent-clipped=1.0 2024-08-17 22:06:06,658 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 15 from Vox, 34 from AS 2024-08-17 22:06:11,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3541920.0, ans=0.0 2024-08-17 22:06:37,290 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 19 from Vox, 38 from AS 2024-08-17 22:06:48,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3542220.0, ans=0.125 2024-08-17 22:06:52,156 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 35 from LS+wenet, 19 from Vox, 32 from AS 2024-08-17 22:06:57,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3542320.0, ans=0.125 2024-08-17 22:06:58,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3542320.0, ans=0.125 2024-08-17 22:07:05,403 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 11100, loss[loss=0.08491, beats_loss=0.009603, ecapa_loss=0.0001649, whisper_loss=0.07366, over 16100.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01058, ecapa_loss=0.0001481, whisper_loss=0.09106, over 3911447.82 frames. ], batch size: 66, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:07:05,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3542420.0, ans=0.1 2024-08-17 22:07:20,481 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
24 from LS+wenet, 13 from Vox, 32 from AS 2024-08-17 22:07:28,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3542520.0, ans=0.1 2024-08-17 22:07:37,487 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0 2024-08-17 22:07:39,652 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 22:07:44,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3542720.0, ans=0.125 2024-08-17 22:07:50,227 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.79 vs. limit=12.0 2024-08-17 22:07:55,989 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 27 from Vox, 37 from AS 2024-08-17 22:08:08,411 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 11150, loss[loss=0.09246, beats_loss=0.01162, ecapa_loss=0.0001441, whisper_loss=0.0794, over 17692.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.0001476, whisper_loss=0.09056, over 3915366.05 frames. ], batch size: 71, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:08:12,139 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.728e+01 2.281e+01 2.502e+01 2.861e+01 4.409e+01, threshold=5.005e+01, percent-clipped=0.0 2024-08-17 22:08:12,345 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
21 from LS+wenet, 17 from Vox, 35 from AS 2024-08-17 22:08:19,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3543020.0, ans=0.125 2024-08-17 22:08:32,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3543120.0, ans=0.1 2024-08-17 22:08:33,587 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 21 from Vox, 21 from AS 2024-08-17 22:08:33,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3543120.0, ans=0.125 2024-08-17 22:08:33,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3543120.0, ans=0.1 2024-08-17 22:08:35,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3543120.0, ans=0.0 2024-08-17 22:08:54,720 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 25 from Vox, 26 from AS 2024-08-17 22:08:55,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3543220.0, ans=15.0 2024-08-17 22:09:10,945 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 11200, loss[loss=0.1008, beats_loss=0.01075, ecapa_loss=0.0001322, whisper_loss=0.08878, over 23563.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01059, ecapa_loss=0.0001475, whisper_loss=0.08997, over 3884616.13 frames. ], batch size: 94, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:09:48,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3543720.0, ans=0.015 2024-08-17 22:10:01,510 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
23 from LS+wenet, 24 from Vox, 24 from AS 2024-08-17 22:10:04,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3543820.0, ans=0.0 2024-08-17 22:10:13,536 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 11250, loss[loss=0.1017, beats_loss=0.01115, ecapa_loss=0.000153, whisper_loss=0.08899, over 21616.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01059, ecapa_loss=0.0001479, whisper_loss=0.0899, over 3890052.70 frames. ], batch size: 86, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:10:14,490 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.63 vs. limit=15.0 2024-08-17 22:10:17,705 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.330e+01 2.572e+01 2.919e+01 3.914e+02, threshold=5.145e+01, percent-clipped=2.0 2024-08-17 22:10:19,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3543920.0, ans=0.125 2024-08-17 22:10:20,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3543920.0, ans=0.125 2024-08-17 22:10:28,274 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 41 from LS+wenet, 19 from Vox, 30 from AS 2024-08-17 22:10:33,391 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 from AS 2024-08-17 22:10:41,252 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
20 from LS+wenet, 25 from Vox, 39 from AS 2024-08-17 22:10:47,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3544120.0, ans=0.0 2024-08-17 22:11:10,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3544320.0, ans=0.125 2024-08-17 22:11:18,407 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 11300, loss[loss=0.113, beats_loss=0.009901, ecapa_loss=0.0001366, whisper_loss=0.1017, over 23103.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01058, ecapa_loss=0.0001465, whisper_loss=0.09028, over 3894276.91 frames. ], batch size: 90, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:11:19,049 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0 2024-08-17 22:11:21,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3544420.0, ans=0.0 2024-08-17 22:11:46,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3544620.0, ans=0.2 2024-08-17 22:11:47,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3544620.0, ans=0.1 2024-08-17 22:12:05,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3544720.0, ans=0.2 2024-08-17 22:12:08,098 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.38 vs. limit=15.0 2024-08-17 22:12:26,048 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 11350, loss[loss=0.1158, beats_loss=0.007829, ecapa_loss=0.000174, whisper_loss=0.1062, over 13614.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01054, ecapa_loss=0.000147, whisper_loss=0.09003, over 3911096.96 frames. ], batch size: 54, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:12:29,977 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.322e+01 2.583e+01 3.031e+01 6.064e+01, threshold=5.166e+01, percent-clipped=1.0 2024-08-17 22:12:32,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3544920.0, ans=0.125 2024-08-17 22:12:37,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3544920.0, ans=0.125 2024-08-17 22:12:49,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3545020.0, ans=0.1 2024-08-17 22:13:04,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3545120.0, ans=0.1 2024-08-17 22:13:18,110 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 18 from Vox, 44 from AS 2024-08-17 22:13:33,277 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 11400, loss[loss=0.1083, beats_loss=0.01133, ecapa_loss=0.0001391, whisper_loss=0.09555, over 23759.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01053, ecapa_loss=0.0001473, whisper_loss=0.09004, over 3915561.00 frames. ], batch size: 94, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:13:33,381 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
20 from LS+wenet, 26 from Vox, 31 from AS 2024-08-17 22:14:09,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3545620.0, ans=0.09899494936611666 2024-08-17 22:14:28,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3545820.0, ans=0.125 2024-08-17 22:14:30,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3545820.0, ans=0.0 2024-08-17 22:14:36,947 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 21 from Vox, 35 from AS 2024-08-17 22:14:43,796 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 11450, loss[loss=0.1017, beats_loss=0.0113, ecapa_loss=0.0001253, whisper_loss=0.08919, over 22682.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001476, whisper_loss=0.09009, over 3905082.64 frames. ], batch size: 92, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:14:47,656 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.375e+01 2.632e+01 2.898e+01 5.397e+01, threshold=5.264e+01, percent-clipped=1.0 2024-08-17 22:14:49,505 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 22:14:54,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3545920.0, ans=0.0 2024-08-17 22:14:58,293 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 from AS 2024-08-17 22:15:02,681 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
24 from LS+wenet, 24 from Vox, 39 from AS 2024-08-17 22:15:07,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3546020.0, ans=0.0 2024-08-17 22:15:08,067 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 14 from Vox, 31 from AS 2024-08-17 22:15:11,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3546120.0, ans=0.2 2024-08-17 22:15:21,609 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 10 from Vox, 33 from AS 2024-08-17 22:15:33,491 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 31 from Vox, 35 from AS 2024-08-17 22:15:46,745 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 27 from LS+wenet, 14 from Vox, 27 from AS 2024-08-17 22:15:51,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3546320.0, ans=0.125 2024-08-17 22:15:54,430 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 from AS 2024-08-17 22:15:56,053 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 11500, loss[loss=0.1026, beats_loss=0.01082, ecapa_loss=0.0001422, whisper_loss=0.09039, over 22617.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01054, ecapa_loss=0.000148, whisper_loss=0.09022, over 3906823.03 frames. ], batch size: 87, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:15:59,536 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.17 vs. limit=10.0 2024-08-17 22:16:00,507 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
28 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-17 22:16:03,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3546420.0, ans=0.0 2024-08-17 22:16:43,768 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 20 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-17 22:16:45,179 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 17 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-17 22:16:58,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3546820.0, ans=0.0 2024-08-17 22:17:00,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3546820.0, ans=0.2 2024-08-17 22:17:03,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3546820.0, ans=0.125 2024-08-17 22:17:11,848 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 26 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-17 22:17:13,449 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 11550, loss[loss=0.1091, beats_loss=0.009984, ecapa_loss=0.0001161, whisper_loss=0.09799, over 19951.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01055, ecapa_loss=0.0001477, whisper_loss=0.0904, over 3916145.11 frames. 
], batch size: 76, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:17:17,595 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.251e+01 2.574e+01 2.799e+01 8.248e+01, threshold=5.147e+01, percent-clipped=1.0 2024-08-17 22:17:23,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3546920.0, ans=0.125 2024-08-17 22:17:34,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3547020.0, ans=0.0 2024-08-17 22:17:36,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3547020.0, ans=0.04949747468305833 2024-08-17 22:17:43,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3547120.0, ans=0.125 2024-08-17 22:17:47,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3547120.0, ans=0.125 2024-08-17 22:17:49,323 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0 2024-08-17 22:17:52,446 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 23 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-17 22:17:57,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3547120.0, ans=0.0 2024-08-17 22:17:59,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3547220.0, ans=0.125 2024-08-17 22:18:07,191 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-17 22:18:13,140 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
20 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-17 22:18:36,711 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=15.0 2024-08-17 22:18:39,550 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 11600, loss[loss=0.09508, beats_loss=0.01029, ecapa_loss=0.0001462, whisper_loss=0.08332, over 20543.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01054, ecapa_loss=0.0001473, whisper_loss=0.08984, over 3892767.60 frames. ], batch size: 83, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:18:41,625 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.54 vs. limit=15.0 2024-08-17 22:18:58,534 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-17 22:19:00,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3547520.0, ans=0.0 2024-08-17 22:19:02,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3547520.0, ans=0.125 2024-08-17 22:19:09,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3547520.0, ans=0.0 2024-08-17 22:19:11,769 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=18.46 vs. limit=15.0 2024-08-17 22:19:18,297 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-17 22:19:24,897 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 20 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-17 22:19:31,129 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-17 22:19:54,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3547720.0, ans=0.125 2024-08-17 22:20:03,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3547820.0, ans=0.1 2024-08-17 22:20:20,392 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 11650, loss[loss=0.09497, beats_loss=0.01145, ecapa_loss=0.0001319, whisper_loss=0.08221, over 15317.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01051, ecapa_loss=0.0001469, whisper_loss=0.09051, over 3905606.31 frames. ], batch size: 59, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:20:27,304 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.333e+01 2.551e+01 2.882e+01 3.740e+01, threshold=5.102e+01, percent-clipped=0.0 2024-08-17 22:21:09,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3548220.0, ans=0.0 2024-08-17 22:21:15,080 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.16 vs. limit=22.5 2024-08-17 22:21:18,482 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-17 22:21:40,171 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 11700, loss[loss=0.07043, beats_loss=0.01346, ecapa_loss=9.657e-05, whisper_loss=0.056, over 17641.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.000147, whisper_loss=0.09083, over 3942811.55 frames. 
], batch size: 69, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:21:43,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3548420.0, ans=0.0 2024-08-17 22:21:45,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3548420.0, ans=0.035 2024-08-17 22:21:55,456 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.465e+05 2024-08-17 22:22:01,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3548520.0, ans=0.125 2024-08-17 22:22:31,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3548520.0, ans=0.125 2024-08-17 22:22:42,051 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 29 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-17 22:22:42,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3548620.0, ans=0.125 2024-08-17 22:23:15,387 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.57 vs. limit=12.0 2024-08-17 22:23:21,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3548820.0, ans=0.125 2024-08-17 22:23:22,059 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.10 vs. limit=15.0 2024-08-17 22:23:34,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3548820.0, ans=0.0 2024-08-17 22:23:37,562 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 11750, loss[loss=0.1039, beats_loss=0.01264, ecapa_loss=0.0001061, whisper_loss=0.09023, over 23441.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01054, ecapa_loss=0.0001465, whisper_loss=0.0913, over 3946444.07 frames. ], batch size: 93, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:23:42,256 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.390e+01 2.565e+01 2.987e+01 4.892e+01, threshold=5.130e+01, percent-clipped=0.0 2024-08-17 22:23:49,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3548920.0, ans=0.0 2024-08-17 22:23:59,225 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-17 22:24:12,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3549120.0, ans=0.2 2024-08-17 22:24:28,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=3549220.0, ans=15.0 2024-08-17 22:24:52,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3549320.0, ans=0.0 2024-08-17 22:25:01,585 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-17 22:25:11,867 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 11800, loss[loss=0.1023, beats_loss=0.01145, ecapa_loss=0.0001421, whisper_loss=0.08939, over 22814.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01059, ecapa_loss=0.0001466, whisper_loss=0.09101, over 3945578.16 frames. ], batch size: 94, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:25:14,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3549420.0, ans=0.125 2024-08-17 22:25:25,039 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
20 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-17 22:25:25,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3549420.0, ans=0.125 2024-08-17 22:25:29,222 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-17 22:25:46,545 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-17 22:25:52,842 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.23 vs. limit=6.0 2024-08-17 22:25:58,597 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-17 22:26:00,815 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.923e-01 2024-08-17 22:26:20,362 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 14 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-17 22:26:25,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3549720.0, ans=0.0 2024-08-17 22:26:55,215 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 11850, loss[loss=0.1129, beats_loss=0.01012, ecapa_loss=0.0001636, whisper_loss=0.1011, over 19252.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01053, ecapa_loss=0.0001483, whisper_loss=0.09088, over 3957430.88 frames. ], batch size: 77, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:27:02,145 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.725e+01 2.313e+01 2.497e+01 2.701e+01 4.196e+01, threshold=4.994e+01, percent-clipped=0.0 2024-08-17 22:27:08,313 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
29 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-17 22:27:17,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.02 vs. limit=10.0 2024-08-17 22:27:55,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3550120.0, ans=0.1 2024-08-17 22:27:59,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3550120.0, ans=0.0 2024-08-17 22:27:59,320 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.85 vs. limit=22.5 2024-08-17 22:28:20,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3550220.0, ans=0.0 2024-08-17 22:28:38,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3550320.0, ans=0.5 2024-08-17 22:28:38,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3550320.0, ans=0.0 2024-08-17 22:28:53,860 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 11900, loss[loss=0.1084, beats_loss=0.01029, ecapa_loss=0.0001218, whisper_loss=0.09685, over 20623.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01064, ecapa_loss=0.0001474, whisper_loss=0.09008, over 3960219.85 frames. ], batch size: 79, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:28:58,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3550420.0, ans=0.1 2024-08-17 22:29:05,659 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
34 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-17 22:29:15,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3550520.0, ans=0.0 2024-08-17 22:29:16,982 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-17 22:29:29,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3550520.0, ans=0.1 2024-08-17 22:29:55,974 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 14 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-17 22:30:17,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3550720.0, ans=0.125 2024-08-17 22:30:21,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3550820.0, ans=0.1 2024-08-17 22:30:46,152 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 11950, loss[loss=0.09052, beats_loss=0.01222, ecapa_loss=0.000104, whisper_loss=0.07726, over 20325.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.0001479, whisper_loss=0.0909, over 3911930.20 frames. ], batch size: 77, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:30:46,287 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-17 22:30:53,070 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.173e+01 2.418e+01 2.712e+01 4.261e+01, threshold=4.835e+01, percent-clipped=0.0 2024-08-17 22:31:10,729 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 9 from Vox, 36 fro AS 2024-08-17 22:31:32,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3551120.0, ans=0.0 2024-08-17 22:31:38,848 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
22 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-17 22:31:43,558 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.68 vs. limit=15.0 2024-08-17 22:32:06,962 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 22:32:14,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3551320.0, ans=0.125 2024-08-17 22:32:19,373 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 12000, loss[loss=0.09942, beats_loss=0.01012, ecapa_loss=0.0001141, whisper_loss=0.08816, over 20298.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.0001481, whisper_loss=0.09034, over 3877554.31 frames. ], batch size: 75, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:32:19,374 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-17 22:33:02,558 INFO [train_multi_KD3.py:1149] (0/4) Epoch 24, validation on ASR_libri: loss=0.251, beats_loss=0, ecapa_loss=0.0005236, whisper_loss=0.2457, over 922467.00 frames. 2024-08-17 22:33:16,261 INFO [train_multi_KD3.py:1149] (0/4) Epoch 24, validation on SV_voxceleb1: loss=0.004219, beats_loss=0, ecapa_loss=0.0004219, whisper_loss=0, over 939242.00 frames. 2024-08-17 22:34:41,275 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1853, 3.2478, 3.3941, 3.1545], device='cuda:0') 2024-08-17 22:35:20,187 INFO [train_multi_KD3.py:1149] (0/4) Epoch 24, validation on AT_audioset: loss=0.02322, beats_loss=0.02322, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-17 22:35:20,191 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-17 22:35:26,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3551420.0, ans=0.05 2024-08-17 22:35:34,183 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-17 22:35:39,953 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-17 22:35:46,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3551520.0, ans=0.1 2024-08-17 22:35:52,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3551620.0, ans=0.125 2024-08-17 22:36:10,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3551720.0, ans=0.125 2024-08-17 22:36:37,129 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 12050, loss[loss=0.1026, beats_loss=0.00845, ecapa_loss=0.0001649, whisper_loss=0.0925, over 14334.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01052, ecapa_loss=0.0001477, whisper_loss=0.09019, over 3833671.75 frames. 
], batch size: 60, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:36:41,542 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.293e+01 2.540e+01 2.892e+01 1.917e+02, threshold=5.080e+01, percent-clipped=1.0 2024-08-17 22:36:56,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3552020.0, ans=0.2 2024-08-17 22:37:02,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3552020.0, ans=0.125 2024-08-17 22:37:08,222 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 12 from Vox, 52 fro AS 2024-08-17 22:37:14,157 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-17 22:37:14,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3552120.0, ans=0.1 2024-08-17 22:37:33,676 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.42 vs. limit=15.0 2024-08-17 22:37:44,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3552320.0, ans=0.125 2024-08-17 22:37:46,065 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-17 22:37:54,085 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 12100, loss[loss=0.09961, beats_loss=0.009412, ecapa_loss=0.0001847, whisper_loss=0.08835, over 19544.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0106, ecapa_loss=0.0001465, whisper_loss=0.09006, over 3863882.61 frames. 
], batch size: 78, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:38:05,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3552420.0, ans=0.04949747468305833 2024-08-17 22:38:12,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3552520.0, ans=0.0 2024-08-17 22:38:18,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3552520.0, ans=0.125 2024-08-17 22:38:30,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3552620.0, ans=0.07 2024-08-17 22:38:33,932 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.79 vs. limit=22.5 2024-08-17 22:38:43,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3552720.0, ans=0.125 2024-08-17 22:39:07,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3552820.0, ans=0.0 2024-08-17 22:39:09,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3552920.0, ans=0.1 2024-08-17 22:39:10,451 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 12150, loss[loss=0.1099, beats_loss=0.008244, ecapa_loss=0.000159, whisper_loss=0.1001, over 21613.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01064, ecapa_loss=0.0001468, whisper_loss=0.08942, over 3873247.72 frames. 
], batch size: 83, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:39:14,796 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.279e+01 2.475e+01 2.710e+01 6.792e+01, threshold=4.950e+01, percent-clipped=1.0 2024-08-17 22:39:16,464 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 28 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-17 22:39:16,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3552920.0, ans=0.125 2024-08-17 22:39:19,317 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-17 22:39:32,391 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-17 22:40:00,185 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-17 22:40:09,218 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 36 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-17 22:40:09,755 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.90 vs. limit=6.0 2024-08-17 22:40:10,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.85 vs. limit=15.0 2024-08-17 22:40:22,885 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 12200, loss[loss=0.09938, beats_loss=0.01061, ecapa_loss=0.0001685, whisper_loss=0.08709, over 21561.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01059, ecapa_loss=0.0001464, whisper_loss=0.09014, over 3869959.57 frames. ], batch size: 92, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:40:39,566 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
34 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-17 22:40:43,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3553520.0, ans=0.0 2024-08-17 22:41:10,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3553720.0, ans=0.1 2024-08-17 22:41:11,675 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.34 vs. limit=5.0 2024-08-17 22:41:29,599 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 15 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-17 22:41:33,245 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=15.0 2024-08-17 22:41:35,246 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 12250, loss[loss=0.1017, beats_loss=0.01237, ecapa_loss=0.0001039, whisper_loss=0.08825, over 20554.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01056, ecapa_loss=0.0001468, whisper_loss=0.09021, over 3880585.85 frames. ], batch size: 80, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:41:39,673 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.433e+01 2.663e+01 3.002e+01 4.108e+01, threshold=5.326e+01, percent-clipped=0.0 2024-08-17 22:41:44,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3553920.0, ans=0.0 2024-08-17 22:41:46,747 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-17 22:41:49,588 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 18 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-17 22:42:00,120 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.59 vs. 
limit=15.0 2024-08-17 22:42:00,829 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 30 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-17 22:42:02,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3554120.0, ans=0.125 2024-08-17 22:42:08,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3554120.0, ans=0.0 2024-08-17 22:42:17,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3554220.0, ans=0.025 2024-08-17 22:42:30,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3554220.0, ans=0.1 2024-08-17 22:42:35,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3554320.0, ans=0.2 2024-08-17 22:42:45,073 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.62 vs. limit=10.0 2024-08-17 22:42:46,818 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-17 22:42:47,856 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 12300, loss[loss=0.09508, beats_loss=0.009935, ecapa_loss=0.0001153, whisper_loss=0.08399, over 21038.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01051, ecapa_loss=0.0001479, whisper_loss=0.08999, over 3895241.66 frames. ], batch size: 79, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:43:07,265 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-17 22:43:16,584 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.23 vs. 
limit=15.0 2024-08-17 22:43:27,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3554620.0, ans=0.0 2024-08-17 22:43:33,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3554720.0, ans=0.5 2024-08-17 22:43:41,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3554720.0, ans=0.0 2024-08-17 22:43:51,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3554820.0, ans=0.1 2024-08-17 22:44:00,853 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 12350, loss[loss=0.09882, beats_loss=0.01134, ecapa_loss=0.0001398, whisper_loss=0.08609, over 15217.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01037, ecapa_loss=0.0001491, whisper_loss=0.09076, over 3909865.56 frames. ], batch size: 60, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:44:01,373 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
26 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-17 22:44:05,312 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.318e+01 2.520e+01 2.807e+01 5.445e+01, threshold=5.040e+01, percent-clipped=1.0 2024-08-17 22:44:37,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3555120.0, ans=0.125 2024-08-17 22:44:47,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3555220.0, ans=0.0 2024-08-17 22:45:11,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3555320.0, ans=0.125 2024-08-17 22:45:13,541 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 12400, loss[loss=0.07545, beats_loss=0.01164, ecapa_loss=0.0001263, whisper_loss=0.06254, over 15423.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001489, whisper_loss=0.09033, over 3899112.34 frames. ], batch size: 63, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:45:20,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3555420.0, ans=0.125 2024-08-17 22:45:26,416 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 29 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-17 22:45:33,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3555520.0, ans=0.125 2024-08-17 22:45:42,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3555620.0, ans=0.125 2024-08-17 22:45:52,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3555620.0, ans=0.0 2024-08-17 22:45:53,885 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
17 from LS+wenet, 33 from Vox, 42 fro AS 2024-08-17 22:45:59,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3555720.0, ans=0.2 2024-08-17 22:46:18,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=3555820.0, ans=0.2 2024-08-17 22:46:18,694 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.82 vs. limit=10.0 2024-08-17 22:46:23,491 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 12450, loss[loss=0.07313, beats_loss=0.01204, ecapa_loss=0.0001405, whisper_loss=0.05968, over 15434.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.0001477, whisper_loss=0.09006, over 3897153.70 frames. ], batch size: 58, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:46:27,516 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.268e+01 2.505e+01 2.895e+01 6.006e+01, threshold=5.010e+01, percent-clipped=2.0 2024-08-17 22:46:37,837 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-17 22:46:47,160 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 23 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-17 22:47:09,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3556220.0, ans=0.2 2024-08-17 22:47:10,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3556220.0, ans=0.0 2024-08-17 22:47:23,036 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-17 22:47:27,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3556320.0, ans=0.2 2024-08-17 22:47:29,145 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
29 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-17 22:47:30,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3556320.0, ans=0.1 2024-08-17 22:47:33,004 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 12500, loss[loss=0.06881, beats_loss=0.0142, ecapa_loss=0.0001074, whisper_loss=0.05354, over 15379.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01041, ecapa_loss=0.000148, whisper_loss=0.09088, over 3888806.52 frames. ], batch size: 61, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:47:51,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3556520.0, ans=0.0 2024-08-17 22:47:55,367 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-17 22:48:01,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3556620.0, ans=0.125 2024-08-17 22:48:16,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3556720.0, ans=0.0 2024-08-17 22:48:20,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3556720.0, ans=0.125 2024-08-17 22:48:21,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3556720.0, ans=10.0 2024-08-17 22:48:37,731 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-17 22:48:41,845 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 12550, loss[loss=0.09432, beats_loss=0.009485, ecapa_loss=0.0001695, whisper_loss=0.08314, over 19193.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0104, ecapa_loss=0.0001489, whisper_loss=0.09049, over 3878551.38 frames. 
], batch size: 80, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:48:46,294 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+01 2.357e+01 2.585e+01 2.988e+01 4.779e+01, threshold=5.170e+01, percent-clipped=0.0 2024-08-17 22:48:47,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3556920.0, ans=0.1 2024-08-17 22:48:50,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3556920.0, ans=0.025 2024-08-17 22:49:04,502 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-17 22:49:11,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3557120.0, ans=0.125 2024-08-17 22:49:45,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3557320.0, ans=0.07 2024-08-17 22:49:51,220 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 12600, loss[loss=0.102, beats_loss=0.01254, ecapa_loss=0.0001128, whisper_loss=0.08838, over 17432.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01047, ecapa_loss=0.0001492, whisper_loss=0.0902, over 3877529.94 frames. ], batch size: 69, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:49:57,006 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.16 vs. limit=22.5 2024-08-17 22:49:58,024 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 19 from LS+wenet, 26 from Vox, 49 fro AS 2024-08-17 22:49:59,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3557420.0, ans=0.2 2024-08-17 22:50:00,923 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
24 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-17 22:50:07,025 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-17 22:50:10,924 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-17 22:50:12,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3557520.0, ans=0.2 2024-08-17 22:50:12,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3557520.0, ans=0.2 2024-08-17 22:50:20,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3557620.0, ans=0.0 2024-08-17 22:50:26,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3557620.0, ans=0.2 2024-08-17 22:50:57,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3557820.0, ans=15.0 2024-08-17 22:51:00,101 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 12650, loss[loss=0.08791, beats_loss=0.01321, ecapa_loss=0.0001564, whisper_loss=0.07314, over 21491.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01048, ecapa_loss=0.0001487, whisper_loss=0.09073, over 3887126.89 frames. ], batch size: 92, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:51:04,578 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.293e+01 2.533e+01 2.794e+01 5.900e+01, threshold=5.066e+01, percent-clipped=1.0 2024-08-17 22:51:19,405 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.35 vs. 
limit=15.0 2024-08-17 22:51:37,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3558120.0, ans=0.0 2024-08-17 22:51:46,192 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-17 22:51:51,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.52 vs. limit=15.0 2024-08-17 22:51:53,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3558220.0, ans=0.125 2024-08-17 22:51:57,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3558320.0, ans=0.125 2024-08-17 22:52:01,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3558320.0, ans=0.125 2024-08-17 22:52:08,968 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 12700, loss[loss=0.1125, beats_loss=0.009261, ecapa_loss=0.0001538, whisper_loss=0.1017, over 20688.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01054, ecapa_loss=0.0001486, whisper_loss=0.09054, over 3871120.03 frames. ], batch size: 80, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:52:19,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3558420.0, ans=0.2 2024-08-17 22:52:22,171 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.41 vs. limit=22.5 2024-08-17 22:52:33,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3558520.0, ans=0.125 2024-08-17 22:52:37,648 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
17 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-17 22:52:56,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3558720.0, ans=0.1 2024-08-17 22:53:10,318 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-17 22:53:10,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3558820.0, ans=0.125 2024-08-17 22:53:18,182 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 12750, loss[loss=0.1035, beats_loss=0.00686, ecapa_loss=0.0001625, whisper_loss=0.09507, over 16047.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01062, ecapa_loss=0.0001473, whisper_loss=0.09027, over 3880776.11 frames. ], batch size: 63, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:53:22,185 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.283e+01 2.578e+01 2.885e+01 4.284e+01, threshold=5.156e+01, percent-clipped=0.0 2024-08-17 22:53:27,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3558920.0, ans=0.1 2024-08-17 22:53:31,748 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-17 22:53:38,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3559020.0, ans=0.0 2024-08-17 22:53:42,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3559020.0, ans=0.2 2024-08-17 22:53:45,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3559120.0, ans=0.0 2024-08-17 22:53:51,646 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
15 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-17 22:53:52,890 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 16 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-17 22:54:17,042 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-17 22:54:24,150 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.89 vs. limit=15.0 2024-08-17 22:54:25,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.06 vs. limit=15.0 2024-08-17 22:54:26,614 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 12800, loss[loss=0.06982, beats_loss=0.01281, ecapa_loss=0.0001942, whisper_loss=0.05507, over 14536.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01064, ecapa_loss=0.0001478, whisper_loss=0.09008, over 3878390.40 frames. ], batch size: 60, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:54:48,093 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.518e-03 2024-08-17 22:55:01,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3559620.0, ans=0.2 2024-08-17 22:55:05,000 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 22:55:30,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3559820.0, ans=0.125 2024-08-17 22:55:36,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3559920.0, ans=0.2 2024-08-17 22:55:37,023 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 12850, loss[loss=0.115, beats_loss=0.0115, ecapa_loss=0.0001412, whisper_loss=0.1021, over 22631.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01069, ecapa_loss=0.0001473, whisper_loss=0.08942, over 3845943.17 frames. ], batch size: 91, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:55:41,268 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.263e+01 2.521e+01 2.838e+01 3.742e+01, threshold=5.041e+01, percent-clipped=0.0 2024-08-17 22:55:47,048 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-356000.pt 2024-08-17 22:55:49,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3559920.0, ans=0.125 2024-08-17 22:55:49,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3559920.0, ans=0.1 2024-08-17 22:55:52,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3560020.0, ans=0.0 2024-08-17 22:55:55,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3560020.0, ans=0.0 2024-08-17 22:56:05,616 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-17 22:56:09,175 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.51 vs. limit=15.0 2024-08-17 22:56:13,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3560120.0, ans=0.1 2024-08-17 22:56:19,616 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
25 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-17 22:56:32,301 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.04 vs. limit=15.0 2024-08-17 22:56:43,779 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.87 vs. limit=15.0 2024-08-17 22:56:49,670 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 12900, loss[loss=0.09568, beats_loss=0.01003, ecapa_loss=0.0001419, whisper_loss=0.08423, over 19365.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01063, ecapa_loss=0.0001476, whisper_loss=0.08974, over 3834483.23 frames. ], batch size: 76, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:56:52,679 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-17 22:57:00,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3560420.0, ans=0.125 2024-08-17 22:57:02,852 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-17 22:57:32,523 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.70 vs. limit=15.0 2024-08-17 22:57:35,887 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
29 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-17 22:57:40,988 WARNING [optim.py:496] (0/4) Scaling gradients by 0.045217473059892654, model_norm_threshold=50.41379928588867 2024-08-17 22:57:41,158 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.812e+05, grad_sumsq=1.812e+05, orig_rms_sq=1.000e+00 2024-08-17 22:57:41,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3560720.0, ans=0.0 2024-08-17 22:57:54,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3560820.0, ans=0.0 2024-08-17 22:57:55,755 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 15 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-17 22:58:01,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3560920.0, ans=0.125 2024-08-17 22:58:01,542 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.19 vs. limit=12.0 2024-08-17 22:58:02,145 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 12950, loss[loss=0.09156, beats_loss=0.01121, ecapa_loss=0.0001524, whisper_loss=0.07882, over 20384.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01051, ecapa_loss=0.0001478, whisper_loss=0.0904, over 3849937.89 frames. 
], batch size: 83, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:58:03,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3560920.0, ans=0.125 2024-08-17 22:58:07,897 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.175e+01 2.399e+01 2.908e+01 1.115e+03, threshold=4.798e+01, percent-clipped=1.0 2024-08-17 22:58:12,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3560920.0, ans=0.125 2024-08-17 22:58:52,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3561220.0, ans=0.125 2024-08-17 22:59:12,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3561320.0, ans=0.2 2024-08-17 22:59:15,114 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 13000, loss[loss=0.08609, beats_loss=0.013, ecapa_loss=0.0001301, whisper_loss=0.07178, over 22122.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01054, ecapa_loss=0.000148, whisper_loss=0.09003, over 3853426.85 frames. ], batch size: 89, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:59:17,999 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-17 22:59:19,699 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-17 22:59:25,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3561420.0, ans=0.125 2024-08-17 22:59:48,017 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
20 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-17 22:59:50,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3561620.0, ans=0.125 2024-08-17 22:59:51,456 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=6.83 vs. limit=12.0 2024-08-17 23:00:07,748 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=15.0 2024-08-17 23:00:28,999 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 13050, loss[loss=0.09221, beats_loss=0.01187, ecapa_loss=0.0001736, whisper_loss=0.0786, over 15946.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01054, ecapa_loss=0.0001474, whisper_loss=0.09016, over 3827968.45 frames. ], batch size: 66, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:00:34,992 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.290e+01 2.565e+01 2.843e+01 4.740e+01, threshold=5.131e+01, percent-clipped=0.0 2024-08-17 23:00:45,926 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-17 23:01:01,962 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08320149034261703, model_norm_threshold=51.3093376159668 2024-08-17 23:01:02,131 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.121e+04, grad_sumsq=1.818e+04, orig_rms_sq=3.366e+00 2024-08-17 23:01:02,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3562120.0, ans=0.125 2024-08-17 23:01:17,385 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
19 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-17 23:01:19,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3562220.0, ans=0.125 2024-08-17 23:01:20,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3562220.0, ans=0.125 2024-08-17 23:01:26,810 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.55 vs. limit=22.5 2024-08-17 23:01:44,177 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 13100, loss[loss=0.1062, beats_loss=0.007658, ecapa_loss=0.0001832, whisper_loss=0.09671, over 22327.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01057, ecapa_loss=0.0001472, whisper_loss=0.0898, over 3853999.45 frames. ], batch size: 90, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:01:48,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3562420.0, ans=0.0 2024-08-17 23:01:51,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3562420.0, ans=0.2 2024-08-17 23:01:51,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3562420.0, ans=0.0 2024-08-17 23:01:55,683 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.45 vs. limit=10.0 2024-08-17 23:01:56,981 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.62 vs. 
limit=15.0 2024-08-17 23:01:59,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3562520.0, ans=0.1 2024-08-17 23:02:31,748 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-17 23:02:36,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3562720.0, ans=0.125 2024-08-17 23:02:56,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3562820.0, ans=0.125 2024-08-17 23:03:00,803 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 13150, loss[loss=0.1175, beats_loss=0.008364, ecapa_loss=0.0001725, whisper_loss=0.1074, over 21399.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01047, ecapa_loss=0.0001474, whisper_loss=0.09022, over 3816654.34 frames. ], batch size: 87, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:03:06,819 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.414e+01 2.698e+01 3.144e+01 6.167e+02, threshold=5.396e+01, percent-clipped=2.0 2024-08-17 23:03:21,296 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-17 23:03:31,904 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-17 23:03:33,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3563120.0, ans=0.0 2024-08-17 23:03:39,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3563120.0, ans=0.0 2024-08-17 23:03:46,234 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
16 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-17 23:03:46,931 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.82 vs. limit=10.0 2024-08-17 23:04:02,878 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 31 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-17 23:04:15,579 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 13200, loss[loss=0.1212, beats_loss=0.009925, ecapa_loss=0.0001445, whisper_loss=0.1099, over 19072.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01048, ecapa_loss=0.0001473, whisper_loss=0.09033, over 3806509.83 frames. ], batch size: 74, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:04:26,389 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.26 vs. limit=10.0 2024-08-17 23:04:27,818 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 36 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-17 23:04:30,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3563520.0, ans=0.0 2024-08-17 23:04:43,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3563620.0, ans=0.125 2024-08-17 23:04:45,329 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.54 vs. 
limit=15.0 2024-08-17 23:04:53,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3563620.0, ans=0.125 2024-08-17 23:05:06,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3563720.0, ans=0.2 2024-08-17 23:05:13,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3563820.0, ans=0.125 2024-08-17 23:05:19,666 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-17 23:05:28,622 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 13250, loss[loss=0.08156, beats_loss=0.01182, ecapa_loss=0.0001551, whisper_loss=0.06819, over 15304.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.0001479, whisper_loss=0.09017, over 3799205.49 frames. ], batch size: 62, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:05:30,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3563920.0, ans=0.125 2024-08-17 23:05:34,393 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.345e+01 2.575e+01 2.975e+01 4.743e+02, threshold=5.149e+01, percent-clipped=2.0 2024-08-17 23:05:34,565 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 33 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-17 23:05:41,781 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-17 23:05:46,475 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 48 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-17 23:06:20,623 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-17 23:06:31,831 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.94 vs. 
limit=12.0 2024-08-17 23:06:33,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3564320.0, ans=0.125 2024-08-17 23:06:39,306 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 13300, loss[loss=0.09735, beats_loss=0.01108, ecapa_loss=0.0001312, whisper_loss=0.08496, over 22265.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01044, ecapa_loss=0.0001473, whisper_loss=0.09073, over 3843917.67 frames. ], batch size: 88, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:06:48,305 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-17 23:06:49,558 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 28 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-17 23:06:58,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3564520.0, ans=6.0 2024-08-17 23:07:02,753 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-08-17 23:07:13,525 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-17 23:07:13,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3564620.0, ans=0.0 2024-08-17 23:07:18,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3564620.0, ans=0.1 2024-08-17 23:07:23,323 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
25 from LS+wenet, 32 from Vox, 32 fro AS 2024-08-17 23:07:42,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3564820.0, ans=0.0 2024-08-17 23:07:48,108 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 13350, loss[loss=0.111, beats_loss=0.01042, ecapa_loss=0.0001403, whisper_loss=0.09919, over 21749.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01038, ecapa_loss=0.0001473, whisper_loss=0.0912, over 3853625.27 frames. ], batch size: 87, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:07:49,472 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-17 23:07:53,622 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.397e+01 2.708e+01 2.975e+01 4.671e+01, threshold=5.415e+01, percent-clipped=0.0 2024-08-17 23:07:55,778 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.31 vs. limit=15.0 2024-08-17 23:08:04,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3565020.0, ans=0.125 2024-08-17 23:08:14,457 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-17 23:08:26,315 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 18 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-17 23:08:30,990 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-17 23:08:47,031 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
25 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-17 23:08:49,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3565320.0, ans=0.2 2024-08-17 23:08:55,779 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 13400, loss[loss=0.0908, beats_loss=0.01306, ecapa_loss=0.0001608, whisper_loss=0.07614, over 22480.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01045, ecapa_loss=0.0001465, whisper_loss=0.09084, over 3863732.18 frames. ], batch size: 95, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:08:57,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3565420.0, ans=0.0 2024-08-17 23:09:08,801 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.07 vs. limit=6.0 2024-08-17 23:09:09,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3565520.0, ans=0.2 2024-08-17 23:09:19,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3565520.0, ans=0.125 2024-08-17 23:09:34,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3565620.0, ans=0.0 2024-08-17 23:09:39,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3565720.0, ans=0.125 2024-08-17 23:10:05,687 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 13450, loss[loss=0.1242, beats_loss=0.008051, ecapa_loss=0.0001502, whisper_loss=0.1147, over 19938.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01043, ecapa_loss=0.0001474, whisper_loss=0.09124, over 3894575.02 frames. 
], batch size: 77, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:10:10,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3565920.0, ans=0.125 2024-08-17 23:10:11,360 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.425e+01 2.663e+01 2.956e+01 3.669e+02, threshold=5.327e+01, percent-clipped=2.0 2024-08-17 23:10:11,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3565920.0, ans=0.2 2024-08-17 23:10:16,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3565920.0, ans=0.2 2024-08-17 23:10:26,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3566020.0, ans=0.125 2024-08-17 23:10:35,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3566120.0, ans=0.2 2024-08-17 23:10:42,418 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-17 23:10:55,378 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-17 23:10:58,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3566220.0, ans=0.1 2024-08-17 23:11:06,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3566320.0, ans=0.1 2024-08-17 23:11:08,674 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-17 23:11:14,045 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 13500, loss[loss=0.09658, beats_loss=0.01146, ecapa_loss=0.0001175, whisper_loss=0.08395, over 22197.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01048, ecapa_loss=0.0001478, whisper_loss=0.09067, over 3886657.81 frames. ], batch size: 87, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:11:20,580 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-17 23:11:24,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3566420.0, ans=0.07 2024-08-17 23:11:33,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3566520.0, ans=0.125 2024-08-17 23:11:51,917 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-17 23:11:59,887 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-17 23:12:11,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3566820.0, ans=0.125 2024-08-17 23:12:17,272 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 15 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-17 23:12:21,095 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 13550, loss[loss=0.1031, beats_loss=0.01064, ecapa_loss=0.0001612, whisper_loss=0.0908, over 20181.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01054, ecapa_loss=0.0001482, whisper_loss=0.09018, over 3871693.27 frames. ], batch size: 80, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:12:26,141 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.373e+01 2.638e+01 2.818e+01 4.102e+01, threshold=5.277e+01, percent-clipped=0.0 2024-08-17 23:12:26,836 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.87 vs. 
limit=6.0 2024-08-17 23:12:59,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3567120.0, ans=0.0 2024-08-17 23:13:12,661 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 10 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-17 23:13:18,687 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0 2024-08-17 23:13:19,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.72 vs. limit=22.5 2024-08-17 23:13:29,213 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 13600, loss[loss=0.1172, beats_loss=0.01086, ecapa_loss=0.0001436, whisper_loss=0.1049, over 21283.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01064, ecapa_loss=0.0001481, whisper_loss=0.08969, over 3878550.07 frames. ], batch size: 83, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:13:29,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3567420.0, ans=0.1 2024-08-17 23:13:32,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3567420.0, ans=0.95 2024-08-17 23:13:38,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3567420.0, ans=0.125 2024-08-17 23:13:43,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3567520.0, ans=0.125 2024-08-17 23:13:46,991 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.67 vs. 
limit=15.0 2024-08-17 23:13:57,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3567620.0, ans=0.125 2024-08-17 23:14:03,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3567620.0, ans=0.1 2024-08-17 23:14:08,745 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 12 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-17 23:14:10,286 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 33 from Vox, 31 fro AS 2024-08-17 23:14:17,425 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 26 from LS+wenet, 15 from Vox, 16 fro AS 2024-08-17 23:14:21,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3567720.0, ans=0.1 2024-08-17 23:14:30,935 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 20 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-17 23:14:34,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=15.0 2024-08-17 23:14:40,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3567920.0, ans=0.0 2024-08-17 23:14:40,971 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 13650, loss[loss=0.0837, beats_loss=0.01171, ecapa_loss=0.0001381, whisper_loss=0.0706, over 18096.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01069, ecapa_loss=0.0001489, whisper_loss=0.08943, over 3872752.30 frames. 
], batch size: 72, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:14:41,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3567920.0, ans=0.125 2024-08-17 23:14:47,032 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.376e+01 2.690e+01 3.104e+01 4.136e+01, threshold=5.380e+01, percent-clipped=0.0 2024-08-17 23:14:55,359 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.60 vs. limit=15.0 2024-08-17 23:14:55,844 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 18 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-17 23:15:02,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3568020.0, ans=0.0 2024-08-17 23:15:31,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3568220.0, ans=0.07 2024-08-17 23:15:35,452 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.89 vs. limit=12.0 2024-08-17 23:15:36,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3568320.0, ans=0.1 2024-08-17 23:15:53,902 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 13700, loss[loss=0.1208, beats_loss=0.00769, ecapa_loss=0.0001799, whisper_loss=0.1113, over 22604.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01061, ecapa_loss=0.0001497, whisper_loss=0.08993, over 3859315.49 frames. ], batch size: 89, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:16:22,612 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
20 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-17 23:16:24,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3568620.0, ans=0.125 2024-08-17 23:17:04,166 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 13750, loss[loss=0.09718, beats_loss=0.01155, ecapa_loss=0.0001286, whisper_loss=0.08434, over 17073.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01063, ecapa_loss=0.0001488, whisper_loss=0.08965, over 3839134.07 frames. ], batch size: 70, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:17:04,268 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 14 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-17 23:17:10,315 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.371e+01 2.635e+01 2.890e+01 4.017e+01, threshold=5.270e+01, percent-clipped=0.0 2024-08-17 23:17:44,223 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.82 vs. limit=10.0 2024-08-17 23:18:13,897 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 13800, loss[loss=0.09492, beats_loss=0.008557, ecapa_loss=0.0001888, whisper_loss=0.08447, over 17030.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01059, ecapa_loss=0.0001483, whisper_loss=0.08959, over 3843242.19 frames. ], batch size: 71, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:18:15,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3569420.0, ans=0.125 2024-08-17 23:18:16,820 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.94 vs. 
limit=15.0 2024-08-17 23:18:25,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3569420.0, ans=0.0 2024-08-17 23:18:26,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3569520.0, ans=0.0 2024-08-17 23:18:33,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3569520.0, ans=0.125 2024-08-17 23:18:39,754 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.72 vs. limit=22.5 2024-08-17 23:18:40,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3569620.0, ans=0.1 2024-08-17 23:19:21,475 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 13850, loss[loss=0.121, beats_loss=0.009653, ecapa_loss=0.0001311, whisper_loss=0.11, over 23768.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01053, ecapa_loss=0.0001481, whisper_loss=0.08999, over 3844987.55 frames. ], batch size: 92, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:19:26,981 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.402e+01 2.666e+01 2.960e+01 4.114e+01, threshold=5.333e+01, percent-clipped=0.0 2024-08-17 23:19:30,105 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-17 23:19:53,209 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-17 23:19:55,899 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
13 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-17 23:20:02,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3570220.0, ans=0.2 2024-08-17 23:20:25,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3570320.0, ans=0.2 2024-08-17 23:20:27,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3570320.0, ans=0.5 2024-08-17 23:20:29,635 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 13900, loss[loss=0.1124, beats_loss=0.01023, ecapa_loss=0.0001624, whisper_loss=0.1005, over 22300.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01059, ecapa_loss=0.0001477, whisper_loss=0.08954, over 3854859.37 frames. ], batch size: 93, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:20:50,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3570520.0, ans=0.125 2024-08-17 23:20:51,774 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-17 23:21:08,932 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.21 vs. limit=15.0 2024-08-17 23:21:14,551 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=18.25 vs. limit=15.0 2024-08-17 23:21:28,347 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-17 23:21:29,972 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
18 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-17 23:21:34,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3570820.0, ans=0.125 2024-08-17 23:21:38,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3570920.0, ans=0.0 2024-08-17 23:21:39,454 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 13950, loss[loss=0.09938, beats_loss=0.01038, ecapa_loss=0.0001393, whisper_loss=0.08761, over 15384.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0106, ecapa_loss=0.0001474, whisper_loss=0.08957, over 3883139.70 frames. ], batch size: 61, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:21:45,226 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.372e+01 2.631e+01 3.013e+01 8.564e+01, threshold=5.263e+01, percent-clipped=2.0 2024-08-17 23:21:45,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3570920.0, ans=0.125 2024-08-17 23:21:57,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3571020.0, ans=0.125 2024-08-17 23:22:00,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3571020.0, ans=0.07 2024-08-17 23:22:01,093 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-17 23:22:11,981 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.29 vs. limit=15.0 2024-08-17 23:22:20,078 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 28 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-17 23:22:21,279 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
24 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-17 23:22:22,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3571220.0, ans=0.0 2024-08-17 23:22:31,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3571220.0, ans=0.2 2024-08-17 23:22:35,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3571320.0, ans=0.125 2024-08-17 23:22:47,140 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.07 vs. limit=12.0 2024-08-17 23:22:50,768 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 14000, loss[loss=0.1012, beats_loss=0.01096, ecapa_loss=0.0001438, whisper_loss=0.08875, over 20243.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01056, ecapa_loss=0.0001463, whisper_loss=0.09029, over 3865923.07 frames. ], batch size: 81, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:22:55,291 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-17 23:23:10,189 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.45 vs. limit=10.0 2024-08-17 23:23:15,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3571520.0, ans=0.125 2024-08-17 23:23:17,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3571520.0, ans=0.1 2024-08-17 23:23:57,176 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
23 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-17 23:24:00,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3571920.0, ans=0.0 2024-08-17 23:24:01,024 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 14050, loss[loss=0.1147, beats_loss=0.0111, ecapa_loss=0.0001483, whisper_loss=0.1021, over 23744.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0106, ecapa_loss=0.0001443, whisper_loss=0.09018, over 3884349.24 frames. ], batch size: 93, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:24:06,127 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.345e+01 2.533e+01 2.843e+01 6.962e+01, threshold=5.065e+01, percent-clipped=1.0 2024-08-17 23:24:34,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3572120.0, ans=0.125 2024-08-17 23:24:37,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3572120.0, ans=0.0 2024-08-17 23:24:46,299 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.12 vs. limit=15.0 2024-08-17 23:24:59,889 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-17 23:25:05,141 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-17 23:25:09,204 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 14100, loss[loss=0.08969, beats_loss=0.01386, ecapa_loss=0.0001162, whisper_loss=0.07466, over 22758.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01067, ecapa_loss=0.0001443, whisper_loss=0.09022, over 3873673.18 frames. 
], batch size: 91, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:25:12,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3572420.0, ans=0.0 2024-08-17 23:25:53,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3572720.0, ans=0.0 2024-08-17 23:26:11,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3572820.0, ans=0.125 2024-08-17 23:26:14,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3572820.0, ans=0.125 2024-08-17 23:26:17,692 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 14150, loss[loss=0.1297, beats_loss=0.008594, ecapa_loss=0.0001317, whisper_loss=0.1198, over 20119.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01067, ecapa_loss=0.0001441, whisper_loss=0.09045, over 3901039.29 frames. ], batch size: 75, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:26:21,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3572920.0, ans=0.0 2024-08-17 23:26:22,843 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.750e+01 2.363e+01 2.612e+01 2.970e+01 1.774e+02, threshold=5.225e+01, percent-clipped=3.0 2024-08-17 23:26:39,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3573020.0, ans=0.025 2024-08-17 23:26:45,444 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.75 vs. 
limit=15.0 2024-08-17 23:27:11,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3573320.0, ans=0.125 2024-08-17 23:27:26,585 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 14200, loss[loss=0.1105, beats_loss=0.01027, ecapa_loss=0.0001708, whisper_loss=0.09855, over 22559.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01066, ecapa_loss=0.0001442, whisper_loss=0.09051, over 3940081.38 frames. ], batch size: 93, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:27:30,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3573420.0, ans=0.0 2024-08-17 23:27:33,805 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-17 23:27:33,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3573420.0, ans=0.0 2024-08-17 23:27:38,429 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.38 vs. limit=15.0 2024-08-17 23:27:43,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3573520.0, ans=0.0 2024-08-17 23:27:45,892 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-17 23:28:28,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.57 vs. limit=15.0 2024-08-17 23:28:33,350 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 14250, loss[loss=0.09084, beats_loss=0.01327, ecapa_loss=0.0001075, whisper_loss=0.07649, over 15171.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01068, ecapa_loss=0.0001433, whisper_loss=0.09051, over 3920789.83 frames. 
], batch size: 59, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:28:33,534 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-17 23:28:37,131 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.68 vs. limit=15.0 2024-08-17 23:28:38,813 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.386e+01 2.591e+01 2.986e+01 7.501e+01, threshold=5.182e+01, percent-clipped=1.0 2024-08-17 23:28:38,937 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 25 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-17 23:28:41,313 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.50 vs. limit=10.0 2024-08-17 23:28:58,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3574020.0, ans=0.05 2024-08-17 23:29:02,127 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 42 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-17 23:29:20,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3574220.0, ans=0.125 2024-08-17 23:29:23,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3574220.0, ans=0.0 2024-08-17 23:29:34,140 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 31 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-17 23:29:42,734 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 14300, loss[loss=0.1087, beats_loss=0.009212, ecapa_loss=0.0001855, whisper_loss=0.09763, over 19579.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01067, ecapa_loss=0.0001438, whisper_loss=0.09061, over 3949167.55 frames. 
], batch size: 84, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:29:42,830 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-17 23:29:47,313 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-17 23:29:52,727 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-17 23:29:56,452 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.22 vs. limit=15.0 2024-08-17 23:30:07,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3574520.0, ans=0.125 2024-08-17 23:30:08,063 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-17 23:30:18,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3574620.0, ans=0.125 2024-08-17 23:30:21,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3574620.0, ans=0.0 2024-08-17 23:30:30,862 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.98 vs. limit=12.0 2024-08-17 23:30:33,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3574720.0, ans=0.125 2024-08-17 23:30:45,532 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-17 23:30:51,930 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.14 vs. 
limit=15.0 2024-08-17 23:30:52,347 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 14350, loss[loss=0.09538, beats_loss=0.0118, ecapa_loss=0.000141, whisper_loss=0.08217, over 18231.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01075, ecapa_loss=0.0001429, whisper_loss=0.09004, over 3943520.91 frames. ], batch size: 74, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:30:57,415 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.260e+01 2.482e+01 2.823e+01 6.177e+01, threshold=4.963e+01, percent-clipped=1.0 2024-08-17 23:31:16,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3575020.0, ans=0.125 2024-08-17 23:31:20,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3575120.0, ans=0.125 2024-08-17 23:31:22,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3575120.0, ans=0.0 2024-08-17 23:31:33,739 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 15 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-17 23:31:34,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3575220.0, ans=0.125 2024-08-17 23:31:41,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3575220.0, ans=0.0 2024-08-17 23:31:48,693 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.83 vs. limit=22.5 2024-08-17 23:32:00,187 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 27 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-17 23:32:01,448 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 14400, loss[loss=0.1213, beats_loss=0.009447, ecapa_loss=0.0001197, whisper_loss=0.1106, over 18568.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01063, ecapa_loss=0.0001445, whisper_loss=0.09072, over 3926167.67 frames. ], batch size: 69, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:32:14,706 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 22 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-17 23:32:23,729 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.19 vs. limit=12.0 2024-08-17 23:32:29,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3575620.0, ans=0.1 2024-08-17 23:32:31,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3575620.0, ans=0.0 2024-08-17 23:32:32,039 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.07 vs. limit=22.5 2024-08-17 23:32:40,542 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 19 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-17 23:32:56,215 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 15 from Vox, 52 fro AS 2024-08-17 23:33:15,034 INFO [train_multi_KD3.py:1116] (0/4) Epoch 24, batch 14450, loss[loss=0.09383, beats_loss=0.01209, ecapa_loss=0.0001316, whisper_loss=0.08042, over 14587.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.0001452, whisper_loss=0.09108, over 3911539.54 frames. 
], batch size: 58, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:33:18,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3575920.0, ans=0.125 2024-08-17 23:33:20,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3575920.0, ans=0.125 2024-08-17 23:33:21,140 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.457e+01 2.673e+01 3.011e+01 5.974e+01, threshold=5.346e+01, percent-clipped=2.0 2024-08-17 23:33:50,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3576120.0, ans=0.125 2024-08-17 23:33:57,604 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-08-17 23:33:58,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3576220.0, ans=0.1 2024-08-17 23:34:03,727 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-17 23:34:13,312 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-24.pt 2024-08-17 23:34:55,209 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 0, loss[loss=0.08456, beats_loss=0.008767, ecapa_loss=0.0001703, whisper_loss=0.07409, over 16155.00 frames. ], tot_loss[loss=0.08456, beats_loss=0.008767, ecapa_loss=0.0001703, whisper_loss=0.07409, over 16155.00 frames. 
], batch size: 62, lr: 2.48e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:34:55,211 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-17 23:35:34,972 INFO [train_multi_KD3.py:1149] (0/4) Epoch 25, validation on ASR_libri: loss=0.253, beats_loss=0, ecapa_loss=0.000529, whisper_loss=0.2477, over 922467.00 frames. 2024-08-17 23:35:49,882 INFO [train_multi_KD3.py:1149] (0/4) Epoch 25, validation on SV_voxceleb1: loss=0.004106, beats_loss=0, ecapa_loss=0.0004106, whisper_loss=0, over 939242.00 frames. 2024-08-17 23:37:32,537 INFO [train_multi_KD3.py:1149] (0/4) Epoch 25, validation on AT_audioset: loss=0.02333, beats_loss=0.02333, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-17 23:37:32,541 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-17 23:37:34,390 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=8.076e-02 2024-08-17 23:37:56,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3576420.0, ans=0.1 2024-08-17 23:37:56,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3576420.0, ans=0.0 2024-08-17 23:38:15,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3576420.0, ans=0.1 2024-08-17 23:38:17,863 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
18 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-17 23:38:18,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3576420.0, ans=0.125 2024-08-17 23:38:27,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3576520.0, ans=0.125 2024-08-17 23:38:30,121 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.040e-01 2024-08-17 23:39:05,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3576620.0, ans=0.125 2024-08-17 23:39:21,236 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2024-08-17 23:39:32,414 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 50, loss[loss=0.09621, beats_loss=0.01173, ecapa_loss=0.0001034, whisper_loss=0.08345, over 17244.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.009516, ecapa_loss=0.0001493, whisper_loss=0.09154, over 880794.86 frames. ], batch size: 68, lr: 2.48e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:39:35,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3576820.0, ans=0.0 2024-08-17 23:39:51,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3576820.0, ans=0.125 2024-08-17 23:40:04,010 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.415e+01 2.699e+01 3.079e+01 5.308e+01, threshold=5.398e+01, percent-clipped=0.0 2024-08-17 23:40:16,265 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.80 vs. 
limit=15.0 2024-08-17 23:40:18,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3577020.0, ans=0.1 2024-08-17 23:40:48,599 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-17 23:41:03,489 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 23:41:21,773 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 100, loss[loss=0.08112, beats_loss=0.009867, ecapa_loss=0.0001115, whisper_loss=0.07013, over 16661.00 frames. ], tot_loss[loss=0.09957, beats_loss=0.009511, ecapa_loss=0.0001474, whisper_loss=0.08858, over 1524823.75 frames. ], batch size: 62, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:41:36,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3577320.0, ans=0.0 2024-08-17 23:41:38,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3577320.0, ans=0.0 2024-08-17 23:41:38,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3577320.0, ans=0.025 2024-08-17 23:41:51,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3577420.0, ans=0.125 2024-08-17 23:42:02,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3577420.0, ans=0.0 2024-08-17 23:42:06,144 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
14 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-17 23:42:23,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3577520.0, ans=0.125 2024-08-17 23:42:29,539 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-17 23:42:36,362 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 23:43:04,531 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 150, loss[loss=0.1042, beats_loss=0.01089, ecapa_loss=0.0001622, whisper_loss=0.09173, over 22176.00 frames. ], tot_loss[loss=0.0993, beats_loss=0.009637, ecapa_loss=0.0001492, whisper_loss=0.08817, over 2066706.01 frames. ], batch size: 89, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:43:06,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3577820.0, ans=0.125 2024-08-17 23:43:11,460 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.761e+00 2024-08-17 23:43:11,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3577820.0, ans=0.1 2024-08-17 23:43:15,620 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.18 vs. limit=10.0 2024-08-17 23:43:16,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3577820.0, ans=0.125 2024-08-17 23:43:27,248 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.543e+01 2.785e+01 3.052e+01 4.688e+01, threshold=5.571e+01, percent-clipped=0.0 2024-08-17 23:44:15,770 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
26 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-17 23:44:18,897 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-17 23:44:22,633 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-08-17 23:44:23,170 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 200, loss[loss=0.109, beats_loss=0.008938, ecapa_loss=0.0001439, whisper_loss=0.09859, over 21728.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.009751, ecapa_loss=0.0001503, whisper_loss=0.08981, over 2488408.74 frames. ], batch size: 84, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:44:37,530 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 30 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-17 23:44:39,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3578420.0, ans=0.1 2024-08-17 23:44:40,655 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.78 vs. limit=15.0 2024-08-17 23:44:45,499 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 10 from Vox, 35 fro AS 2024-08-17 23:44:45,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3578420.0, ans=0.2 2024-08-17 23:44:46,830 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
34 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-17 23:44:46,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3578420.0, ans=0.2 2024-08-17 23:44:50,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3578520.0, ans=15.0 2024-08-17 23:44:55,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3578520.0, ans=0.125 2024-08-17 23:44:55,834 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.83 vs. limit=15.0 2024-08-17 23:45:05,465 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-17 23:45:05,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.49 vs. limit=10.0 2024-08-17 23:45:11,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3578620.0, ans=0.04949747468305833 2024-08-17 23:45:33,524 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 250, loss[loss=0.09613, beats_loss=0.01043, ecapa_loss=0.0001315, whisper_loss=0.08439, over 13988.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.009848, ecapa_loss=0.000149, whisper_loss=0.09082, over 2780813.39 frames. ], batch size: 54, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:45:34,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3578820.0, ans=0.07 2024-08-17 23:45:38,272 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
25 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-17 23:45:40,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3578820.0, ans=0.1 2024-08-17 23:45:44,522 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.17 vs. limit=12.0 2024-08-17 23:45:49,703 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-17 23:45:53,631 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.355e+01 2.707e+01 2.999e+01 3.108e+02, threshold=5.414e+01, percent-clipped=1.0 2024-08-17 23:46:01,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3579020.0, ans=0.1 2024-08-17 23:46:38,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3579220.0, ans=0.125 2024-08-17 23:46:41,977 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 300, loss[loss=0.08336, beats_loss=0.01121, ecapa_loss=0.0001526, whisper_loss=0.07062, over 16157.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01002, ecapa_loss=0.0001488, whisper_loss=0.09022, over 3008208.95 frames. 
], batch size: 66, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:47:07,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3579520.0, ans=0.05 2024-08-17 23:47:16,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3579520.0, ans=0.125 2024-08-17 23:47:33,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3579620.0, ans=0.0 2024-08-17 23:47:48,833 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 350, loss[loss=0.1033, beats_loss=0.01073, ecapa_loss=0.0001694, whisper_loss=0.09086, over 22467.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01024, ecapa_loss=0.0001479, whisper_loss=0.08935, over 3187229.92 frames. ], batch size: 91, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:48:07,195 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.271e+01 2.566e+01 2.908e+01 1.421e+02, threshold=5.133e+01, percent-clipped=1.0 2024-08-17 23:48:18,162 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-17 23:48:18,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3580020.0, ans=0.0 2024-08-17 23:48:29,001 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-17 23:48:37,248 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 27 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-17 23:48:56,049 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 400, loss[loss=0.09035, beats_loss=0.01087, ecapa_loss=0.0001301, whisper_loss=0.07818, over 15964.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01025, ecapa_loss=0.0001481, whisper_loss=0.08921, over 3323142.63 frames. 
], batch size: 59, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:49:01,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3580320.0, ans=0.2 2024-08-17 23:49:15,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3580420.0, ans=0.125 2024-08-17 23:49:21,909 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 37 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-17 23:49:26,853 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-17 23:49:32,896 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.96 vs. limit=22.5 2024-08-17 23:49:38,551 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-17 23:49:51,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3580720.0, ans=0.05 2024-08-17 23:49:55,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3580720.0, ans=0.2 2024-08-17 23:49:55,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3580720.0, ans=0.0 2024-08-17 23:49:59,160 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-17 23:50:04,333 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 450, loss[loss=0.1105, beats_loss=0.008916, ecapa_loss=0.0001737, whisper_loss=0.09986, over 20196.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01024, ecapa_loss=0.0001469, whisper_loss=0.0893, over 3421131.55 frames. 
], batch size: 84, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:50:08,923 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.78 vs. limit=15.0 2024-08-17 23:50:14,802 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 22 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-17 23:50:22,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3580920.0, ans=0.0 2024-08-17 23:50:23,316 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.243e+01 2.550e+01 2.884e+01 5.686e+01, threshold=5.100e+01, percent-clipped=1.0 2024-08-17 23:50:26,325 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-17 23:50:29,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3580920.0, ans=0.1 2024-08-17 23:50:35,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3581020.0, ans=0.035 2024-08-17 23:50:45,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.95 vs. limit=10.0 2024-08-17 23:50:48,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3581120.0, ans=0.0 2024-08-17 23:51:11,938 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 500, loss[loss=0.086, beats_loss=0.01319, ecapa_loss=0.0001403, whisper_loss=0.07141, over 22142.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0103, ecapa_loss=0.0001462, whisper_loss=0.08934, over 3511980.93 frames. ], batch size: 93, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:51:28,423 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
19 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-17 23:51:47,565 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 23 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-17 23:51:50,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=3581520.0, ans=0.02 2024-08-17 23:51:50,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0 2024-08-17 23:52:10,632 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-17 23:52:17,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3581720.0, ans=0.2 2024-08-17 23:52:18,584 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.17 vs. limit=15.0 2024-08-17 23:52:19,255 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 550, loss[loss=0.09452, beats_loss=0.01226, ecapa_loss=0.0001475, whisper_loss=0.08078, over 22323.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01035, ecapa_loss=0.0001457, whisper_loss=0.08939, over 3583008.00 frames. ], batch size: 93, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:52:30,711 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.47 vs. limit=22.5 2024-08-17 23:52:30,752 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.77 vs. 
limit=12.0 2024-08-17 23:52:38,330 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.286e+01 2.510e+01 2.772e+01 4.019e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-17 23:52:45,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3582020.0, ans=0.0 2024-08-17 23:52:52,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3582020.0, ans=0.1 2024-08-17 23:53:00,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3582120.0, ans=10.0 2024-08-17 23:53:14,167 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-17 23:53:15,648 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-17 23:53:16,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3582220.0, ans=22.5 2024-08-17 23:53:23,095 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.65 vs. limit=22.5 2024-08-17 23:53:23,193 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.74 vs. limit=10.0 2024-08-17 23:53:27,376 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 600, loss[loss=0.1221, beats_loss=0.009145, ecapa_loss=0.0001554, whisper_loss=0.1114, over 22674.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01035, ecapa_loss=0.0001461, whisper_loss=0.0898, over 3651108.22 frames. 
], batch size: 91, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:53:29,873 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.46 vs. limit=15.0 2024-08-17 23:53:32,303 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.91 vs. limit=22.5 2024-08-17 23:53:36,108 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-17 23:53:40,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3582420.0, ans=0.0 2024-08-17 23:53:54,992 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 20 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-17 23:54:03,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3582520.0, ans=0.2 2024-08-17 23:54:08,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=3582620.0, ans=0.5 2024-08-17 23:54:10,926 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-17 23:54:26,583 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-17 23:54:34,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3582820.0, ans=0.1 2024-08-17 23:54:35,278 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 650, loss[loss=0.1074, beats_loss=0.009257, ecapa_loss=0.0001426, whisper_loss=0.09675, over 16196.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0104, ecapa_loss=0.0001453, whisper_loss=0.08955, over 3689743.28 frames. 
], batch size: 63, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:54:53,628 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.263e+01 2.531e+01 2.852e+01 5.403e+01, threshold=5.063e+01, percent-clipped=1.0 2024-08-17 23:54:58,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3582920.0, ans=0.1 2024-08-17 23:55:03,055 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 29 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-17 23:55:25,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3583120.0, ans=0.2 2024-08-17 23:55:34,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3583220.0, ans=0.125 2024-08-17 23:55:42,598 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 700, loss[loss=0.09819, beats_loss=0.01146, ecapa_loss=0.0001641, whisper_loss=0.08509, over 15638.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01042, ecapa_loss=0.0001454, whisper_loss=0.08966, over 3701515.94 frames. ], batch size: 64, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:55:48,151 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-17 23:55:51,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3583320.0, ans=0.0 2024-08-17 23:56:00,349 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.735e-02 2024-08-17 23:56:04,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3583420.0, ans=0.0 2024-08-17 23:56:12,628 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
25 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-17 23:56:46,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3583720.0, ans=0.0 2024-08-17 23:56:50,388 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 750, loss[loss=0.1004, beats_loss=0.00989, ecapa_loss=0.00016, whisper_loss=0.08891, over 19251.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01039, ecapa_loss=0.0001449, whisper_loss=0.0894, over 3700430.78 frames. ], batch size: 78, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:56:52,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3583820.0, ans=0.125 2024-08-17 23:56:53,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3583820.0, ans=0.125 2024-08-17 23:56:57,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3583820.0, ans=0.0 2024-08-17 23:57:09,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3583920.0, ans=0.0 2024-08-17 23:57:10,154 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.300e+01 2.481e+01 2.765e+01 4.539e+01, threshold=4.963e+01, percent-clipped=0.0 2024-08-17 23:57:15,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3583920.0, ans=0.125 2024-08-17 23:57:23,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3584020.0, ans=0.0 2024-08-17 23:57:31,322 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
22 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-17 23:57:31,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3584120.0, ans=0.125 2024-08-17 23:57:39,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3584120.0, ans=0.2 2024-08-17 23:57:41,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3584120.0, ans=0.0 2024-08-17 23:57:46,903 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 27 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-17 23:57:58,500 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 800, loss[loss=0.1329, beats_loss=0.008503, ecapa_loss=0.0001105, whisper_loss=0.1233, over 23646.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01039, ecapa_loss=0.0001452, whisper_loss=0.08959, over 3777969.24 frames. ], batch size: 86, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:58:13,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3584420.0, ans=0.125 2024-08-17 23:58:24,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3584520.0, ans=0.0 2024-08-17 23:58:26,055 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 23:58:31,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3584520.0, ans=0.125 2024-08-17 23:58:32,216 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 14 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-17 23:58:34,767 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-17 23:58:38,477 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
13 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-17 23:58:50,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3584720.0, ans=0.125 2024-08-17 23:58:50,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3584720.0, ans=0.125 2024-08-17 23:59:04,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3584820.0, ans=0.0 2024-08-17 23:59:04,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3584820.0, ans=0.0 2024-08-17 23:59:04,943 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 850, loss[loss=0.1009, beats_loss=0.0108, ecapa_loss=0.0001215, whisper_loss=0.08884, over 18389.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01036, ecapa_loss=0.0001448, whisper_loss=0.08935, over 3812189.30 frames. ], batch size: 69, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:59:05,470 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.37 vs. limit=22.5 2024-08-17 23:59:16,093 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-17 23:59:17,569 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
21 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-17 23:59:24,148 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.705e+01 2.267e+01 2.454e+01 2.766e+01 5.930e+01, threshold=4.908e+01, percent-clipped=1.0 2024-08-17 23:59:26,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3584920.0, ans=0.2 2024-08-17 23:59:27,414 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.51 vs. limit=6.0 2024-08-17 23:59:31,035 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 10 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-17 23:59:53,161 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2024-08-17 23:59:54,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3585120.0, ans=0.125 2024-08-18 00:00:02,681 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 33 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-18 00:00:13,189 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 900, loss[loss=0.1077, beats_loss=0.009486, ecapa_loss=0.0001508, whisper_loss=0.0967, over 16356.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01039, ecapa_loss=0.0001448, whisper_loss=0.08918, over 3799363.34 frames. ], batch size: 65, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-18 00:00:15,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3585320.0, ans=0.07 2024-08-18 00:00:20,850 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.93 vs. 
limit=12.0 2024-08-18 00:00:24,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3585320.0, ans=0.125 2024-08-18 00:00:33,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3585420.0, ans=0.0 2024-08-18 00:00:38,328 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-18 00:00:45,081 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 15 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-18 00:00:59,252 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 28 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-18 00:01:03,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3585620.0, ans=0.1 2024-08-18 00:01:03,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3585620.0, ans=0.125 2024-08-18 00:01:06,826 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.31 vs. limit=15.0 2024-08-18 00:01:14,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.22 vs. limit=15.0 2024-08-18 00:01:20,412 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 950, loss[loss=0.07961, beats_loss=0.01391, ecapa_loss=0.0001065, whisper_loss=0.06464, over 23106.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01041, ecapa_loss=0.000144, whisper_loss=0.08887, over 3805345.30 frames. ], batch size: 95, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-18 00:01:27,491 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
18 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-18 00:01:33,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3585920.0, ans=0.2 2024-08-18 00:01:41,290 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.321e+01 2.549e+01 2.773e+01 6.184e+01, threshold=5.098e+01, percent-clipped=1.0 2024-08-18 00:01:41,444 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-18 00:01:42,913 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 36 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-18 00:01:43,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3585920.0, ans=0.125 2024-08-18 00:01:46,904 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 00:02:00,864 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2024-08-18 00:02:03,775 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2024-08-18 00:02:14,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3586220.0, ans=0.0 2024-08-18 00:02:20,422 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
32 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-18 00:02:20,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3586220.0, ans=0.125 2024-08-18 00:02:24,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3586220.0, ans=0.2 2024-08-18 00:02:28,152 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 1000, loss[loss=0.117, beats_loss=0.00999, ecapa_loss=0.0001363, whisper_loss=0.1056, over 23120.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01044, ecapa_loss=0.0001432, whisper_loss=0.08898, over 3839505.43 frames. ], batch size: 88, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:02:30,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3586320.0, ans=0.2 2024-08-18 00:02:47,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3586420.0, ans=0.0 2024-08-18 00:02:55,238 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 00:03:07,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3586520.0, ans=0.0 2024-08-18 00:03:13,094 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.96 vs. limit=15.0 2024-08-18 00:03:14,412 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 00:03:30,724 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.69 vs. 
limit=15.0 2024-08-18 00:03:36,345 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 1050, loss[loss=0.07248, beats_loss=0.01217, ecapa_loss=0.0001518, whisper_loss=0.05878, over 17316.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01048, ecapa_loss=0.000143, whisper_loss=0.08878, over 3851029.69 frames. ], batch size: 71, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:03:44,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3586820.0, ans=10.0 2024-08-18 00:03:57,046 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.345e+01 2.573e+01 2.782e+01 6.018e+01, threshold=5.145e+01, percent-clipped=1.0 2024-08-18 00:03:57,790 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.49 vs. limit=15.0 2024-08-18 00:04:02,076 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 00:04:25,827 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 16 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-18 00:04:38,564 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.71 vs. limit=6.0 2024-08-18 00:04:43,205 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 1100, loss[loss=0.09836, beats_loss=0.01322, ecapa_loss=0.0001233, whisper_loss=0.08391, over 23705.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01052, ecapa_loss=0.000144, whisper_loss=0.08851, over 3848161.67 frames. 
], batch size: 90, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:04:46,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3587320.0, ans=0.125 2024-08-18 00:05:05,585 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.27 vs. limit=15.0 2024-08-18 00:05:25,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3587620.0, ans=0.0 2024-08-18 00:05:29,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3587620.0, ans=0.125 2024-08-18 00:05:44,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3587720.0, ans=22.5 2024-08-18 00:05:51,947 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 1150, loss[loss=0.1161, beats_loss=0.00724, ecapa_loss=0.0001274, whisper_loss=0.1076, over 15912.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01045, ecapa_loss=0.0001437, whisper_loss=0.08917, over 3821354.34 frames. ], batch size: 59, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:05:58,910 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-18 00:06:00,143 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-18 00:06:05,700 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.45 vs. limit=12.0 2024-08-18 00:06:08,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3587920.0, ans=0.0 2024-08-18 00:06:09,708 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
36 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-18 00:06:12,109 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.421e+01 2.640e+01 3.029e+01 4.672e+01, threshold=5.280e+01, percent-clipped=0.0 2024-08-18 00:06:24,045 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 38 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-18 00:06:32,350 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 28 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-18 00:06:35,724 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.80 vs. limit=22.5 2024-08-18 00:06:47,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3588220.0, ans=0.125 2024-08-18 00:06:56,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3588220.0, ans=0.125 2024-08-18 00:07:00,617 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 1200, loss[loss=0.1003, beats_loss=0.01028, ecapa_loss=0.0001224, whisper_loss=0.08882, over 20789.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01044, ecapa_loss=0.0001434, whisper_loss=0.09001, over 3854021.58 frames. 
], batch size: 78, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:07:02,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3588320.0, ans=0.125 2024-08-18 00:07:08,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3588320.0, ans=0.125 2024-08-18 00:07:11,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3588320.0, ans=0.125 2024-08-18 00:07:26,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3588520.0, ans=0.05 2024-08-18 00:07:49,580 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-18 00:07:53,860 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 00:07:58,814 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-18 00:08:04,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3588720.0, ans=0.2 2024-08-18 00:08:10,063 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 1250, loss[loss=0.1126, beats_loss=0.01212, ecapa_loss=0.0001211, whisper_loss=0.09923, over 19686.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.0001419, whisper_loss=0.09005, over 3853503.52 frames. 
], batch size: 78, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:08:30,306 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.265e+01 2.496e+01 2.765e+01 1.417e+02, threshold=4.991e+01, percent-clipped=1.0 2024-08-18 00:08:45,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3589020.0, ans=0.125 2024-08-18 00:08:55,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3589120.0, ans=0.0 2024-08-18 00:08:56,380 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-18 00:08:58,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3589120.0, ans=0.015 2024-08-18 00:09:06,293 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 00:09:08,648 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-18 00:09:17,043 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 1300, loss[loss=0.09301, beats_loss=0.009597, ecapa_loss=0.0001438, whisper_loss=0.08197, over 16369.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001431, whisper_loss=0.08959, over 3857088.82 frames. ], batch size: 61, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:09:41,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3589420.0, ans=0.07 2024-08-18 00:09:56,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3589620.0, ans=0.125 2024-08-18 00:10:09,426 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
20 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-18 00:10:16,444 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.13 vs. limit=10.0 2024-08-18 00:10:24,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3589720.0, ans=0.09899494936611666 2024-08-18 00:10:26,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3589720.0, ans=0.125 2024-08-18 00:10:30,872 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 1350, loss[loss=0.09578, beats_loss=0.01052, ecapa_loss=0.0001321, whisper_loss=0.08394, over 16326.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01054, ecapa_loss=0.0001426, whisper_loss=0.08949, over 3834593.27 frames. ], batch size: 62, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:10:37,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3589820.0, ans=0.1 2024-08-18 00:10:40,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3589820.0, ans=0.1 2024-08-18 00:10:43,508 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.17 vs. 
limit=15.0 2024-08-18 00:10:50,433 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.253e+01 2.540e+01 2.866e+01 1.653e+02, threshold=5.079e+01, percent-clipped=1.0 2024-08-18 00:10:59,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3590020.0, ans=0.125 2024-08-18 00:11:02,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3590020.0, ans=0.125 2024-08-18 00:11:05,844 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 18 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 00:11:17,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3590120.0, ans=0.125 2024-08-18 00:11:21,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3590120.0, ans=0.2 2024-08-18 00:11:40,126 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 1400, loss[loss=0.09213, beats_loss=0.009826, ecapa_loss=0.0001152, whisper_loss=0.08115, over 19567.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01054, ecapa_loss=0.0001427, whisper_loss=0.08882, over 3816484.26 frames. ], batch size: 74, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:11:50,972 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.75 vs. limit=12.0 2024-08-18 00:11:52,260 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. limit=6.0 2024-08-18 00:11:54,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3590420.0, ans=0.2 2024-08-18 00:11:57,359 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
23 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 00:11:57,648 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 00:11:59,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3590420.0, ans=0.1 2024-08-18 00:12:03,346 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 00:12:13,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3590520.0, ans=0.0 2024-08-18 00:12:17,597 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-18 00:12:25,339 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 00:12:26,514 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.173e-03 2024-08-18 00:12:51,445 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 1450, loss[loss=0.09283, beats_loss=0.01034, ecapa_loss=0.0001531, whisper_loss=0.08096, over 19013.00 frames. ], tot_loss[loss=0.09999, beats_loss=0.01049, ecapa_loss=0.0001433, whisper_loss=0.08807, over 3793389.11 frames. 
], batch size: 75, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:13:11,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3590820.0, ans=0.125 2024-08-18 00:13:23,796 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.734e+01 2.250e+01 2.508e+01 2.687e+01 4.238e+01, threshold=5.016e+01, percent-clipped=0.0 2024-08-18 00:13:27,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3590920.0, ans=0.125 2024-08-18 00:14:03,066 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 15 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-18 00:14:12,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3591220.0, ans=0.2 2024-08-18 00:14:35,486 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 1500, loss[loss=0.08299, beats_loss=0.012, ecapa_loss=0.0001462, whisper_loss=0.06953, over 14780.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01053, ecapa_loss=0.0001418, whisper_loss=0.08816, over 3772013.10 frames. ], batch size: 60, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:14:43,790 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
20 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-18 00:14:58,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3591420.0, ans=0.0 2024-08-18 00:15:03,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3591420.0, ans=0.125 2024-08-18 00:15:14,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3591420.0, ans=0.125 2024-08-18 00:15:17,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3591420.0, ans=0.1 2024-08-18 00:15:45,264 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-18 00:16:17,628 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.18 vs. limit=22.5 2024-08-18 00:16:19,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3591720.0, ans=0.125 2024-08-18 00:16:27,734 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 1550, loss[loss=0.1068, beats_loss=0.009375, ecapa_loss=0.0001604, whisper_loss=0.09578, over 14461.00 frames. ], tot_loss[loss=0.09989, beats_loss=0.01058, ecapa_loss=0.0001416, whisper_loss=0.08789, over 3779530.39 frames. 
], batch size: 57, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:16:55,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3591920.0, ans=0.125 2024-08-18 00:17:02,555 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.260e+01 2.619e+01 2.888e+01 4.592e+01, threshold=5.239e+01, percent-clipped=0.0 2024-08-18 00:17:07,136 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.012e+00 2024-08-18 00:17:10,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3592020.0, ans=0.04949747468305833 2024-08-18 00:17:24,095 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 30 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-18 00:17:57,139 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-18 00:18:10,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3592220.0, ans=0.0 2024-08-18 00:18:16,539 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 1600, loss[loss=0.09337, beats_loss=0.0128, ecapa_loss=9.51e-05, whisper_loss=0.07962, over 18765.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01052, ecapa_loss=0.0001418, whisper_loss=0.08852, over 3795052.10 frames. ], batch size: 73, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:18:33,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3592320.0, ans=0.035 2024-08-18 00:18:45,561 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.18 vs. 
limit=12.0 2024-08-18 00:18:50,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3592420.0, ans=0.125 2024-08-18 00:18:55,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3592520.0, ans=0.0 2024-08-18 00:19:30,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3592720.0, ans=0.125 2024-08-18 00:19:38,056 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.57 vs. limit=10.0 2024-08-18 00:19:38,403 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 1650, loss[loss=0.1016, beats_loss=0.0088, ecapa_loss=0.0001469, whisper_loss=0.09132, over 18912.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01044, ecapa_loss=0.0001429, whisper_loss=0.08876, over 3794042.54 frames. ], batch size: 74, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:19:45,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3592820.0, ans=0.125 2024-08-18 00:19:58,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3592920.0, ans=0.2 2024-08-18 00:19:59,577 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.229e+01 2.477e+01 2.865e+01 9.039e+01, threshold=4.953e+01, percent-clipped=1.0 2024-08-18 00:20:00,482 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.79 vs. 
limit=12.0 2024-08-18 00:20:02,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3592920.0, ans=0.1 2024-08-18 00:20:10,625 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-18 00:20:20,683 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-18 00:20:31,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3593120.0, ans=0.1 2024-08-18 00:20:34,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3593220.0, ans=0.04949747468305833 2024-08-18 00:20:37,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3593220.0, ans=0.125 2024-08-18 00:20:37,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3593220.0, ans=0.1 2024-08-18 00:20:42,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3593220.0, ans=0.125 2024-08-18 00:20:43,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3593220.0, ans=0.125 2024-08-18 00:20:47,737 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 1700, loss[loss=0.1224, beats_loss=0.009817, ecapa_loss=0.0001238, whisper_loss=0.1113, over 23749.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01049, ecapa_loss=0.0001428, whisper_loss=0.08875, over 3790959.98 frames. 
], batch size: 90, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:20:49,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3593320.0, ans=0.95 2024-08-18 00:20:53,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3593320.0, ans=0.125 2024-08-18 00:20:55,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3593320.0, ans=0.0 2024-08-18 00:20:57,532 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 32 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 00:20:59,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3593320.0, ans=0.0 2024-08-18 00:21:03,180 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0 2024-08-18 00:21:21,833 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.74 vs. limit=10.0 2024-08-18 00:21:30,415 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-18 00:21:30,900 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.63 vs. limit=15.0 2024-08-18 00:21:34,536 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-18 00:21:51,076 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-18 00:21:54,626 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 1750, loss[loss=0.1321, beats_loss=0.006905, ecapa_loss=0.0001683, whisper_loss=0.1235, over 23217.00 frames. 
], tot_loss[loss=0.1008, beats_loss=0.01047, ecapa_loss=0.0001435, whisper_loss=0.08885, over 3807466.48 frames. ], batch size: 91, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:21:56,225 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-18 00:22:03,264 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 22 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-18 00:22:15,133 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.504e+01 2.766e+01 3.067e+01 1.559e+02, threshold=5.531e+01, percent-clipped=2.0 2024-08-18 00:22:20,999 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 16 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-18 00:22:32,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3594020.0, ans=0.125 2024-08-18 00:22:36,291 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-18 00:22:53,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3594220.0, ans=0.1 2024-08-18 00:23:02,372 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 1800, loss[loss=0.1021, beats_loss=0.009845, ecapa_loss=0.0001449, whisper_loss=0.09082, over 23323.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.0105, ecapa_loss=0.0001422, whisper_loss=0.08893, over 3848832.29 frames. ], batch size: 91, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:23:12,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3594320.0, ans=0.2 2024-08-18 00:23:12,427 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.21 vs. 
limit=12.0 2024-08-18 00:23:16,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3594420.0, ans=0.125 2024-08-18 00:23:24,207 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 00:23:27,176 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-18 00:23:29,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3594520.0, ans=0.1 2024-08-18 00:23:31,797 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 00:23:39,766 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 00:23:45,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3594620.0, ans=0.1 2024-08-18 00:23:56,192 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 19 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-18 00:24:01,317 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 27 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-18 00:24:09,353 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 1850, loss[loss=0.07507, beats_loss=0.01463, ecapa_loss=9.822e-05, whisper_loss=0.05946, over 18627.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01049, ecapa_loss=0.0001428, whisper_loss=0.0886, over 3830034.42 frames. 
], batch size: 73, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:24:14,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3594820.0, ans=10.0 2024-08-18 00:24:29,356 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.383e+01 2.607e+01 2.993e+01 6.397e+01, threshold=5.213e+01, percent-clipped=1.0 2024-08-18 00:24:36,102 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 00:24:37,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3595020.0, ans=0.125 2024-08-18 00:24:55,849 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-18 00:25:02,710 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-18 00:25:03,392 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.74 vs. limit=12.0 2024-08-18 00:25:08,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3595220.0, ans=0.0 2024-08-18 00:25:12,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3595220.0, ans=0.0 2024-08-18 00:25:14,956 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 00:25:17,660 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 1900, loss[loss=0.1218, beats_loss=0.00891, ecapa_loss=0.0001731, whisper_loss=0.1111, over 19187.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01048, ecapa_loss=0.0001441, whisper_loss=0.08852, over 3830905.58 frames. 
], batch size: 78, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:25:17,821 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 17 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 00:25:37,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3595420.0, ans=0.125 2024-08-18 00:25:38,522 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0 2024-08-18 00:25:49,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3595520.0, ans=0.0 2024-08-18 00:25:55,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3595520.0, ans=0.125 2024-08-18 00:25:59,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3595620.0, ans=0.2 2024-08-18 00:26:03,070 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=15.0 2024-08-18 00:26:03,982 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-18 00:26:04,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3595620.0, ans=0.2 2024-08-18 00:26:22,673 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-18 00:26:23,766 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 1950, loss[loss=0.105, beats_loss=0.01089, ecapa_loss=0.0001047, whisper_loss=0.09302, over 19082.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01052, ecapa_loss=0.0001436, whisper_loss=0.08822, over 3811032.20 frames. 
], batch size: 72, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:26:36,067 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 from AS 2024-08-18 00:26:44,849 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.310e+01 2.471e+01 2.786e+01 3.161e+02, threshold=4.942e+01, percent-clipped=2.0 2024-08-18 00:26:47,557 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 27 from Vox, 38 from AS 2024-08-18 00:26:58,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3596020.0, ans=0.04949747468305833 2024-08-18 00:27:24,500 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2024-08-18 00:27:27,981 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 16 from Vox, 23 from AS 2024-08-18 00:27:32,030 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 2000, loss[loss=0.09191, beats_loss=0.01266, ecapa_loss=0.0001251, whisper_loss=0.078, over 17420.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01051, ecapa_loss=0.0001437, whisper_loss=0.08876, over 3830097.66 frames. ], batch size: 70, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:27:42,722 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 18 from Vox, 31 from AS 2024-08-18 00:28:00,760 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.58 vs. limit=15.0 2024-08-18 00:28:08,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3596520.0, ans=0.125 2024-08-18 00:28:20,235 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
21 from LS+wenet, 23 from Vox, 33 from AS 2024-08-18 00:28:21,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3596620.0, ans=0.0 2024-08-18 00:28:24,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3596620.0, ans=0.125 2024-08-18 00:28:32,772 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 20 from LS+wenet, 22 from Vox, 43 from AS 2024-08-18 00:28:33,475 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0 2024-08-18 00:28:34,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3596720.0, ans=0.125 2024-08-18 00:28:40,591 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 2050, loss[loss=0.1004, beats_loss=0.009902, ecapa_loss=0.0001833, whisper_loss=0.08865, over 20172.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01052, ecapa_loss=0.0001436, whisper_loss=0.0888, over 3836873.25 frames. ], batch size: 84, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:28:45,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3596820.0, ans=0.125 2024-08-18 00:29:00,281 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.408e+01 2.657e+01 2.912e+01 3.324e+02, threshold=5.315e+01, percent-clipped=5.0 2024-08-18 00:29:10,237 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.28 vs. limit=6.0 2024-08-18 00:29:20,021 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
25 from LS+wenet, 26 from Vox, 42 from AS 2024-08-18 00:29:40,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3597220.0, ans=0.1 2024-08-18 00:29:41,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3597220.0, ans=0.2 2024-08-18 00:29:44,263 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 16 from Vox, 31 from AS 2024-08-18 00:29:46,863 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 2100, loss[loss=0.092, beats_loss=0.009349, ecapa_loss=0.0001476, whisper_loss=0.08117, over 22211.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01064, ecapa_loss=0.0001426, whisper_loss=0.0884, over 3854987.49 frames. ], batch size: 91, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:29:52,795 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.57 vs. limit=12.0 2024-08-18 00:30:00,015 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0 2024-08-18 00:30:08,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3597420.0, ans=0.1 2024-08-18 00:30:11,669 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 21 from LS+wenet, 21 from Vox, 39 from AS 2024-08-18 00:30:14,846 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=15.0 2024-08-18 00:30:24,792 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.04 vs. 
limit=15.0 2024-08-18 00:30:50,988 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 2150, loss[loss=0.09607, beats_loss=0.01145, ecapa_loss=0.0001442, whisper_loss=0.08318, over 21723.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0106, ecapa_loss=0.0001423, whisper_loss=0.08897, over 3860726.84 frames. ], batch size: 91, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:30:51,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3597820.0, ans=0.0 2024-08-18 00:30:52,385 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 11 from LS+wenet, 14 from Vox, 35 from AS 2024-08-18 00:30:53,592 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 34 from LS+wenet, 17 from Vox, 31 from AS 2024-08-18 00:30:57,314 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 from AS 2024-08-18 00:30:58,336 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 27 from LS+wenet, 19 from Vox, 18 from AS 2024-08-18 00:31:03,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3597920.0, ans=0.125 2024-08-18 00:31:09,705 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.640e+01 2.312e+01 2.569e+01 2.886e+01 3.776e+01, threshold=5.138e+01, percent-clipped=0.0 2024-08-18 00:31:09,946 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 20 from Vox, 30 from AS 2024-08-18 00:31:15,152 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 21 from Vox, 30 from AS 2024-08-18 00:31:15,672 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.16 vs. 
limit=15.0 2024-08-18 00:31:20,218 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 00:31:27,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3598120.0, ans=0.125 2024-08-18 00:31:30,234 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 23 from Vox, 39 from AS 2024-08-18 00:31:33,867 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 28 from Vox, 30 from AS 2024-08-18 00:31:40,053 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 from AS 2024-08-18 00:31:46,380 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 11 from Vox, 32 from AS 2024-08-18 00:31:53,920 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 2200, loss[loss=0.1129, beats_loss=0.01052, ecapa_loss=0.0001399, whisper_loss=0.1009, over 21988.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001423, whisper_loss=0.09009, over 3847344.56 frames. ], batch size: 85, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:32:01,433 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 26 from Vox, 27 from AS 2024-08-18 00:32:30,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3598620.0, ans=0.1 2024-08-18 00:32:37,689 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 from AS 2024-08-18 00:32:46,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3598720.0, ans=0.0 2024-08-18 00:32:56,459 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 2250, loss[loss=0.08712, beats_loss=0.01233, ecapa_loss=0.0001495, whisper_loss=0.07329, over 22724.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01066, ecapa_loss=0.0001428, whisper_loss=0.09017, over 3876428.02 frames. ], batch size: 93, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:33:05,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3598820.0, ans=0.125 2024-08-18 00:33:14,783 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.985e+01 2.384e+01 2.636e+01 2.985e+01 4.342e+01, threshold=5.271e+01, percent-clipped=0.0 2024-08-18 00:33:58,316 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 2300, loss[loss=0.09266, beats_loss=0.01091, ecapa_loss=0.0001484, whisper_loss=0.08026, over 18148.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01067, ecapa_loss=0.0001432, whisper_loss=0.09013, over 3871713.92 frames. ], batch size: 73, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:33:58,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3599320.0, ans=0.2 2024-08-18 00:34:20,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3599420.0, ans=0.125 2024-08-18 00:34:21,676 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 33 from LS+wenet, 15 from Vox, 35 from AS 2024-08-18 00:34:33,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3599520.0, ans=0.07 2024-08-18 00:34:36,357 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 20 from Vox, 33 from AS 2024-08-18 00:34:42,229 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. 
limit=6.0 2024-08-18 00:34:43,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=3599620.0, ans=0.1 2024-08-18 00:34:44,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3599620.0, ans=0.125 2024-08-18 00:34:46,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3599620.0, ans=0.1 2024-08-18 00:34:47,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3599620.0, ans=0.0 2024-08-18 00:34:49,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3599720.0, ans=0.125 2024-08-18 00:34:51,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3599720.0, ans=0.125 2024-08-18 00:35:00,030 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 19 from Vox, 27 from AS 2024-08-18 00:35:02,201 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 2350, loss[loss=0.0983, beats_loss=0.01032, ecapa_loss=0.0001274, whisper_loss=0.08671, over 18268.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01056, ecapa_loss=0.0001445, whisper_loss=0.09033, over 3859526.20 frames. 
], batch size: 72, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:35:11,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3599820.0, ans=0.2 2024-08-18 00:35:21,239 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.661e+01 2.275e+01 2.503e+01 2.910e+01 3.749e+01, threshold=5.007e+01, percent-clipped=0.0 2024-08-18 00:35:23,902 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-360000.pt 2024-08-18 00:35:27,594 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 30 from LS+wenet, 20 from Vox, 36 from AS 2024-08-18 00:35:53,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3600120.0, ans=0.125 2024-08-18 00:35:53,721 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=22.5 2024-08-18 00:35:54,202 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 13 from Vox, 25 from AS 2024-08-18 00:36:07,434 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.04 vs. limit=15.0 2024-08-18 00:36:07,929 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 2400, loss[loss=0.0861, beats_loss=0.01048, ecapa_loss=0.000151, whisper_loss=0.07411, over 17580.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.0001439, whisper_loss=0.09043, over 3839101.44 frames. 
], batch size: 71, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:36:21,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3600420.0, ans=0.1 2024-08-18 00:36:26,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3600420.0, ans=0.125 2024-08-18 00:36:39,495 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 31 from LS+wenet, 19 from Vox, 35 from AS 2024-08-18 00:36:42,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3600520.0, ans=0.1 2024-08-18 00:36:50,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3600620.0, ans=0.125 2024-08-18 00:36:53,061 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 24 from LS+wenet, 23 from Vox, 47 from AS 2024-08-18 00:37:03,227 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 15 from LS+wenet, 19 from Vox, 31 from AS 2024-08-18 00:37:10,452 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 2450, loss[loss=0.09704, beats_loss=0.01114, ecapa_loss=0.0001515, whisper_loss=0.08438, over 22180.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01063, ecapa_loss=0.0001438, whisper_loss=0.08955, over 3839163.83 frames. ], batch size: 93, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:37:28,846 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.316e+01 2.565e+01 2.962e+01 6.294e+01, threshold=5.130e+01, percent-clipped=1.0 2024-08-18 00:37:31,529 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 25 from LS+wenet, 8 from Vox, 26 from AS 2024-08-18 00:37:40,536 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.93 vs. 
limit=15.0 2024-08-18 00:37:50,818 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 16 from LS+wenet, 20 from Vox, 34 from AS 2024-08-18 00:37:51,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3601120.0, ans=0.125 2024-08-18 00:38:04,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3601220.0, ans=0.1 2024-08-18 00:38:09,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3601220.0, ans=0.2 2024-08-18 00:38:13,059 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 2500, loss[loss=0.09635, beats_loss=0.01104, ecapa_loss=0.0001421, whisper_loss=0.08388, over 15416.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01057, ecapa_loss=0.0001441, whisper_loss=0.08995, over 3826445.42 frames. ], batch size: 64, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:38:14,637 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 23 from Vox, 28 from AS 2024-08-18 00:38:28,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3601420.0, ans=0.125 2024-08-18 00:38:33,148 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 from AS 2024-08-18 00:38:33,820 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.55 vs. limit=10.0 2024-08-18 00:38:57,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3601620.0, ans=0.0 2024-08-18 00:39:03,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3601720.0, ans=0.0 2024-08-18 00:39:04,407 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
29 from LS+wenet, 24 from Vox, 37 from AS 2024-08-18 00:39:04,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3601720.0, ans=0.125 2024-08-18 00:39:06,793 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 24 from Vox, 35 from AS 2024-08-18 00:39:07,318 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.97 vs. limit=15.0 2024-08-18 00:39:09,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3601720.0, ans=0.0 2024-08-18 00:39:09,644 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=15.0 2024-08-18 00:39:15,431 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 2550, loss[loss=0.1282, beats_loss=0.008787, ecapa_loss=0.000163, whisper_loss=0.1178, over 17322.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01058, ecapa_loss=0.0001443, whisper_loss=0.09002, over 3853351.36 frames. ], batch size: 67, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:39:19,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3601820.0, ans=0.125 2024-08-18 00:39:19,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3601820.0, ans=0.0 2024-08-18 00:39:23,094 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
30 from LS+wenet, 19 from Vox, 28 from AS 2024-08-18 00:39:30,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3601920.0, ans=0.0 2024-08-18 00:39:34,103 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+01 2.261e+01 2.549e+01 2.807e+01 3.668e+01, threshold=5.099e+01, percent-clipped=0.0 2024-08-18 00:39:49,465 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 19 from Vox, 42 from AS 2024-08-18 00:39:52,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3602120.0, ans=0.0 2024-08-18 00:40:01,059 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 23 from Vox, 34 from AS 2024-08-18 00:40:04,649 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 20 from Vox, 30 from AS 2024-08-18 00:40:05,889 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 23 from Vox, 39 from AS 2024-08-18 00:40:06,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3602220.0, ans=0.125 2024-08-18 00:40:18,409 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 2600, loss[loss=0.1253, beats_loss=0.008106, ecapa_loss=0.0001487, whisper_loss=0.1157, over 23629.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01052, ecapa_loss=0.0001453, whisper_loss=0.09075, over 3879951.05 frames. ], batch size: 90, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:40:22,353 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 36 from LS+wenet, 20 from Vox, 25 from AS 2024-08-18 00:40:27,356 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 8 from Vox, 32 from AS 2024-08-18 00:40:38,351 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
27 from LS+wenet, 20 from Vox, 20 from AS 2024-08-18 00:40:50,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3602520.0, ans=0.0 2024-08-18 00:41:15,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3602720.0, ans=0.0 2024-08-18 00:41:16,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3602720.0, ans=0.1 2024-08-18 00:41:21,184 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 2650, loss[loss=0.1051, beats_loss=0.00944, ecapa_loss=0.0001277, whisper_loss=0.0944, over 15304.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01046, ecapa_loss=0.0001452, whisper_loss=0.09078, over 3855585.68 frames. ], batch size: 56, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:41:27,526 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 from AS 2024-08-18 00:41:39,777 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.278e+01 2.614e+01 3.095e+01 1.399e+02, threshold=5.228e+01, percent-clipped=3.0 2024-08-18 00:41:50,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3603020.0, ans=0.0 2024-08-18 00:42:05,111 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 13 from Vox, 33 from AS 2024-08-18 00:42:15,162 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 00:42:20,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3603220.0, ans=0.0 2024-08-18 00:42:23,492 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 2700, loss[loss=0.1146, beats_loss=0.0116, ecapa_loss=0.000123, whisper_loss=0.1018, over 23051.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.0104, ecapa_loss=0.0001458, whisper_loss=0.0918, over 3882584.68 frames. ], batch size: 88, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:42:30,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3603320.0, ans=0.125 2024-08-18 00:42:34,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3603320.0, ans=0.125 2024-08-18 00:43:05,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3603620.0, ans=0.0 2024-08-18 00:43:07,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3603620.0, ans=0.125 2024-08-18 00:43:25,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3603820.0, ans=0.0 2024-08-18 00:43:26,132 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 2750, loss[loss=0.108, beats_loss=0.0106, ecapa_loss=0.0001735, whisper_loss=0.09566, over 12714.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0105, ecapa_loss=0.000145, whisper_loss=0.0909, over 3850961.22 frames. ], batch size: 53, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:43:30,034 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
24 from LS+wenet, 26 from Vox, 41 from AS 2024-08-18 00:43:30,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3603820.0, ans=0.125 2024-08-18 00:43:44,890 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.850e+01 2.315e+01 2.507e+01 2.815e+01 4.002e+01, threshold=5.013e+01, percent-clipped=0.0 2024-08-18 00:43:47,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3603920.0, ans=0.125 2024-08-18 00:44:04,207 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 30 from LS+wenet, 19 from Vox, 32 from AS 2024-08-18 00:44:05,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3604120.0, ans=0.1 2024-08-18 00:44:13,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3604120.0, ans=0.07 2024-08-18 00:44:14,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3604120.0, ans=0.2 2024-08-18 00:44:15,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3604220.0, ans=0.125 2024-08-18 00:44:18,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3604220.0, ans=0.0 2024-08-18 00:44:19,769 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 16 from Vox, 34 from AS 2024-08-18 00:44:28,773 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 15 from LS+wenet, 14 from Vox, 24 from AS 2024-08-18 00:44:29,959 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 2800, loss[loss=0.09714, beats_loss=0.01169, ecapa_loss=0.0001322, whisper_loss=0.08413, over 13491.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01049, ecapa_loss=0.0001446, whisper_loss=0.09126, over 3861329.54 frames. ], batch size: 53, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:44:40,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3604320.0, ans=0.125 2024-08-18 00:44:50,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3604420.0, ans=0.125 2024-08-18 00:44:53,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3604420.0, ans=0.1 2024-08-18 00:44:57,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3604520.0, ans=0.0 2024-08-18 00:45:01,799 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 37 from LS+wenet, 21 from Vox, 32 from AS 2024-08-18 00:45:04,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3604520.0, ans=0.125 2024-08-18 00:45:08,591 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 16 from Vox, 24 from AS 2024-08-18 00:45:14,678 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 14 from Vox, 43 from AS 2024-08-18 00:45:34,909 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 2850, loss[loss=0.09258, beats_loss=0.01136, ecapa_loss=0.0001145, whisper_loss=0.08008, over 16123.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01058, ecapa_loss=0.0001447, whisper_loss=0.09087, over 3832627.59 frames. ], batch size: 62, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:45:38,736 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 27 from Vox, 34 from AS 2024-08-18 00:45:53,444 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 26 from Vox, 34 from AS 2024-08-18 00:45:54,727 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.370e+01 2.583e+01 2.901e+01 2.742e+02, threshold=5.166e+01, percent-clipped=3.0 2024-08-18 00:45:58,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3604920.0, ans=0.125 2024-08-18 00:46:06,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3605020.0, ans=0.5 2024-08-18 00:46:31,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3605220.0, ans=0.0 2024-08-18 00:46:32,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3605220.0, ans=0.1 2024-08-18 00:46:34,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3605220.0, ans=0.125 2024-08-18 00:46:40,325 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 2900, loss[loss=0.1193, beats_loss=0.007506, ecapa_loss=0.0001803, whisper_loss=0.11, over 20936.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01048, ecapa_loss=0.000147, whisper_loss=0.09151, over 3831465.09 frames. ], batch size: 86, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:46:49,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3605320.0, ans=0.125 2024-08-18 00:46:49,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3605320.0, ans=0.125 2024-08-18 00:46:54,807 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 25 from Vox, 28 from AS 2024-08-18 00:46:57,469 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
30 from LS+wenet, 17 from Vox, 32 from AS 2024-08-18 00:46:58,699 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 13 from LS+wenet, 16 from Vox, 26 from AS 2024-08-18 00:47:17,505 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 from AS 2024-08-18 00:47:20,940 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 33 from LS+wenet, 25 from Vox, 19 from AS 2024-08-18 00:47:22,665 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 from AS 2024-08-18 00:47:37,816 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 from AS 2024-08-18 00:47:43,586 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2024-08-18 00:47:45,557 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 2950, loss[loss=0.08763, beats_loss=0.01072, ecapa_loss=0.0001271, whisper_loss=0.07564, over 21649.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01034, ecapa_loss=0.0001491, whisper_loss=0.09183, over 3854234.14 frames. ], batch size: 86, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:47:51,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3605820.0, ans=0.0 2024-08-18 00:47:54,332 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 30 from LS+wenet, 20 from Vox, 26 from AS 2024-08-18 00:47:58,633 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.66 vs. 
limit=15.0 2024-08-18 00:48:04,354 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.406e+01 2.598e+01 2.862e+01 3.819e+01, threshold=5.197e+01, percent-clipped=0.0 2024-08-18 00:48:09,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3606020.0, ans=0.0 2024-08-18 00:48:31,596 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-18 00:48:33,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3606120.0, ans=0.0 2024-08-18 00:48:39,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3606220.0, ans=0.1 2024-08-18 00:48:47,608 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 3000, loss[loss=0.1029, beats_loss=0.009638, ecapa_loss=0.0001418, whisper_loss=0.09184, over 19637.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01045, ecapa_loss=0.0001479, whisper_loss=0.09117, over 3871110.86 frames. ], batch size: 77, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 00:48:47,610 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-18 00:49:20,068 INFO [train_multi_KD3.py:1149] (0/4) Epoch 25, validation on ASR_libri: loss=0.2529, beats_loss=0, ecapa_loss=0.0005235, whisper_loss=0.2477, over 922467.00 frames. 2024-08-18 00:49:35,446 INFO [train_multi_KD3.py:1149] (0/4) Epoch 25, validation on SV_voxceleb1: loss=0.004164, beats_loss=0, ecapa_loss=0.0004164, whisper_loss=0, over 939242.00 frames. 2024-08-18 00:51:10,578 INFO [train_multi_KD3.py:1149] (0/4) Epoch 25, validation on AT_audioset: loss=0.02327, beats_loss=0.02327, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-18 00:51:10,582 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-18 00:51:20,024 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.05 vs. limit=22.5 2024-08-18 00:51:20,609 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 38 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-18 00:51:26,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3606420.0, ans=0.1 2024-08-18 00:51:29,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3606420.0, ans=0.0 2024-08-18 00:51:41,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3606520.0, ans=0.125 2024-08-18 00:51:47,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3606620.0, ans=0.0 2024-08-18 00:51:50,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3606620.0, ans=0.125 2024-08-18 00:51:55,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3606620.0, ans=0.0 2024-08-18 00:52:14,897 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 3050, loss[loss=0.09017, beats_loss=0.009969, ecapa_loss=0.0001504, whisper_loss=0.0787, over 20697.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01046, ecapa_loss=0.0001478, whisper_loss=0.09127, over 3879171.61 frames. 
], batch size: 86, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 00:52:15,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3606820.0, ans=0.125 2024-08-18 00:52:16,096 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-18 00:52:17,378 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 00:52:33,630 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.357e+01 2.578e+01 2.867e+01 5.974e+01, threshold=5.156e+01, percent-clipped=1.0 2024-08-18 00:52:51,398 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 30 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 00:52:56,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3607120.0, ans=0.1 2024-08-18 00:53:06,696 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-18 00:53:16,319 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.75 vs. limit=15.0 2024-08-18 00:53:18,088 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 3100, loss[loss=0.1124, beats_loss=0.009953, ecapa_loss=0.0001117, whisper_loss=0.1013, over 18207.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01048, ecapa_loss=0.000148, whisper_loss=0.09151, over 3923577.97 frames. ], batch size: 67, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:53:28,133 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
30 from LS+wenet, 32 from Vox, 27 fro AS 2024-08-18 00:53:45,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3607520.0, ans=0.0 2024-08-18 00:53:54,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3607620.0, ans=0.125 2024-08-18 00:54:06,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3607620.0, ans=0.2 2024-08-18 00:54:08,197 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-18 00:54:19,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3607820.0, ans=0.125 2024-08-18 00:54:20,764 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 3150, loss[loss=0.0853, beats_loss=0.01388, ecapa_loss=0.0001173, whisper_loss=0.07025, over 22720.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01057, ecapa_loss=0.0001475, whisper_loss=0.09135, over 3914664.56 frames. ], batch size: 93, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:54:27,430 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 36 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 00:54:41,088 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.301e+01 2.605e+01 2.918e+01 7.177e+01, threshold=5.210e+01, percent-clipped=2.0 2024-08-18 00:54:41,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3607920.0, ans=0.0 2024-08-18 00:54:54,132 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-18 00:54:56,954 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 32 from LS+wenet, 11 from Vox, 43 fro AS 2024-08-18 00:55:05,885 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
16 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-18 00:55:17,691 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0 2024-08-18 00:55:19,729 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-18 00:55:24,618 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 3200, loss[loss=0.1009, beats_loss=0.01038, ecapa_loss=0.0001402, whisper_loss=0.08912, over 22469.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.0001474, whisper_loss=0.09104, over 3917549.80 frames. ], batch size: 90, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:55:32,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3608320.0, ans=0.07 2024-08-18 00:55:42,601 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-18 00:56:05,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3608620.0, ans=0.125 2024-08-18 00:56:22,378 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.666e+00 2024-08-18 00:56:28,448 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 3250, loss[loss=0.1061, beats_loss=0.01115, ecapa_loss=0.0001478, whisper_loss=0.09343, over 19935.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.0001474, whisper_loss=0.09106, over 3911619.26 frames. 
], batch size: 78, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:56:38,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3608820.0, ans=0.0 2024-08-18 00:56:38,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3608820.0, ans=0.125 2024-08-18 00:56:41,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3608920.0, ans=0.2 2024-08-18 00:56:48,231 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.222e+01 2.522e+01 2.821e+01 3.582e+01, threshold=5.044e+01, percent-clipped=0.0 2024-08-18 00:57:01,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3609020.0, ans=0.0 2024-08-18 00:57:01,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3609020.0, ans=0.5 2024-08-18 00:57:20,752 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-08-18 00:57:27,225 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.98 vs. limit=15.0 2024-08-18 00:57:30,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3609320.0, ans=0.0 2024-08-18 00:57:30,630 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.82 vs. limit=10.0 2024-08-18 00:57:31,057 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 3300, loss[loss=0.09929, beats_loss=0.01077, ecapa_loss=0.0001397, whisper_loss=0.08713, over 22567.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01064, ecapa_loss=0.0001469, whisper_loss=0.0905, over 3894291.93 frames. ], batch size: 90, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:57:31,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3609320.0, ans=0.035 2024-08-18 00:57:34,296 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.61 vs. limit=12.0 2024-08-18 00:57:41,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3609320.0, ans=0.1 2024-08-18 00:57:46,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3609420.0, ans=0.1 2024-08-18 00:58:33,277 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 3350, loss[loss=0.07052, beats_loss=0.01328, ecapa_loss=0.0001187, whisper_loss=0.05606, over 17850.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01063, ecapa_loss=0.0001464, whisper_loss=0.09052, over 3905681.62 frames. ], batch size: 73, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:58:33,476 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 26 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-18 00:58:35,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3609820.0, ans=0.125 2024-08-18 00:58:38,411 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 00:58:53,215 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.316e+01 2.495e+01 2.843e+01 4.202e+01, threshold=4.990e+01, percent-clipped=0.0 2024-08-18 00:58:53,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3609920.0, ans=0.1 2024-08-18 00:59:04,697 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.97 vs. limit=15.0 2024-08-18 00:59:10,132 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-18 00:59:16,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3610120.0, ans=0.0 2024-08-18 00:59:36,520 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 3400, loss[loss=0.08618, beats_loss=0.01328, ecapa_loss=0.0001312, whisper_loss=0.07159, over 15194.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01061, ecapa_loss=0.0001458, whisper_loss=0.09041, over 3920687.48 frames. ], batch size: 63, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:59:41,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3610320.0, ans=0.125 2024-08-18 00:59:44,975 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.32 vs. 
limit=6.0 2024-08-18 00:59:52,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3610420.0, ans=10.0 2024-08-18 00:59:59,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3610420.0, ans=0.125 2024-08-18 01:00:17,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3610620.0, ans=0.05 2024-08-18 01:00:28,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3610720.0, ans=0.1 2024-08-18 01:00:31,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3610720.0, ans=0.05 2024-08-18 01:00:39,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3610820.0, ans=15.0 2024-08-18 01:00:39,938 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 3450, loss[loss=0.1092, beats_loss=0.0101, ecapa_loss=0.0001519, whisper_loss=0.09755, over 15902.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001458, whisper_loss=0.09051, over 3912941.46 frames. ], batch size: 62, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:00:44,138 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.392e-01 2024-08-18 01:00:50,059 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
28 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-18 01:01:00,144 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.316e+01 2.519e+01 2.832e+01 7.230e+01, threshold=5.039e+01, percent-clipped=2.0 2024-08-18 01:01:03,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3610920.0, ans=0.125 2024-08-18 01:01:07,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3611020.0, ans=0.0 2024-08-18 01:01:12,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3611020.0, ans=0.125 2024-08-18 01:01:21,722 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 01:01:33,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3611220.0, ans=0.1 2024-08-18 01:01:34,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3611220.0, ans=0.0 2024-08-18 01:01:37,548 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.352e+01 2024-08-18 01:01:41,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0 2024-08-18 01:01:43,650 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 3500, loss[loss=0.1016, beats_loss=0.01346, ecapa_loss=0.0001381, whisper_loss=0.08678, over 15655.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01052, ecapa_loss=0.0001463, whisper_loss=0.09064, over 3900934.70 frames. 
], batch size: 64, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:01:53,134 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.00 vs. limit=15.0 2024-08-18 01:02:00,528 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 01:02:04,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3611420.0, ans=0.0 2024-08-18 01:02:19,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.02 vs. limit=5.0 2024-08-18 01:02:25,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3611620.0, ans=0.1 2024-08-18 01:02:36,651 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-18 01:02:41,507 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 23 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-18 01:02:48,014 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 3550, loss[loss=0.1042, beats_loss=0.01058, ecapa_loss=0.0001433, whisper_loss=0.09218, over 14152.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01057, ecapa_loss=0.0001472, whisper_loss=0.08965, over 3885119.74 frames. ], batch size: 55, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:02:48,268 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-18 01:02:53,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3611820.0, ans=0.0 2024-08-18 01:02:56,556 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
34 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-18 01:03:00,773 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.36 vs. limit=15.0 2024-08-18 01:03:01,467 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-18 01:03:09,428 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.315e+01 2.541e+01 2.759e+01 3.730e+01, threshold=5.082e+01, percent-clipped=0.0 2024-08-18 01:03:19,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3612020.0, ans=0.1 2024-08-18 01:03:24,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3612020.0, ans=0.125 2024-08-18 01:03:27,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3612120.0, ans=0.125 2024-08-18 01:03:29,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3612120.0, ans=0.2 2024-08-18 01:03:32,342 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.42 vs. limit=15.0 2024-08-18 01:03:34,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3612120.0, ans=0.0 2024-08-18 01:03:43,631 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
17 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-18 01:03:43,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3612220.0, ans=0.1 2024-08-18 01:03:50,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3612220.0, ans=0.125 2024-08-18 01:03:54,333 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 3600, loss[loss=0.1057, beats_loss=0.009022, ecapa_loss=0.0001173, whisper_loss=0.09547, over 15764.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01047, ecapa_loss=0.0001477, whisper_loss=0.09023, over 3864270.98 frames. ], batch size: 58, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:03:57,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3612320.0, ans=0.05 2024-08-18 01:04:12,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3612420.0, ans=0.2 2024-08-18 01:04:26,541 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-18 01:04:26,806 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.712e+00 2024-08-18 01:04:37,462 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-18 01:05:02,471 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 3650, loss[loss=0.08989, beats_loss=0.01324, ecapa_loss=0.0001529, whisper_loss=0.07512, over 22103.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.000147, whisper_loss=0.08956, over 3835850.66 frames. 
], batch size: 91, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:05:17,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3612920.0, ans=0.1 2024-08-18 01:05:24,126 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.387e+01 2.671e+01 3.011e+01 4.673e+01, threshold=5.342e+01, percent-clipped=0.0 2024-08-18 01:05:57,503 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-18 01:06:01,426 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-18 01:06:05,385 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 01:06:08,649 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.81 vs. limit=15.0 2024-08-18 01:06:10,508 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 3700, loss[loss=0.0922, beats_loss=0.01045, ecapa_loss=0.0001369, whisper_loss=0.08038, over 14909.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01059, ecapa_loss=0.0001459, whisper_loss=0.08918, over 3816008.84 frames. ], batch size: 57, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:06:17,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3613320.0, ans=0.0 2024-08-18 01:06:22,913 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-18 01:06:48,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. 
limit=6.0 2024-08-18 01:06:50,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3613620.0, ans=0.2 2024-08-18 01:06:54,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3613620.0, ans=0.125 2024-08-18 01:06:57,260 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 31 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 01:06:58,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3613620.0, ans=0.125 2024-08-18 01:07:13,256 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 01:07:14,659 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-18 01:07:18,717 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 3750, loss[loss=0.0976, beats_loss=0.009952, ecapa_loss=0.0002001, whisper_loss=0.08565, over 16613.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01063, ecapa_loss=0.0001457, whisper_loss=0.08924, over 3828201.88 frames. ], batch size: 71, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:07:40,268 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.244e+01 2.423e+01 2.765e+01 4.056e+01, threshold=4.845e+01, percent-clipped=0.0 2024-08-18 01:07:42,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3613920.0, ans=0.2 2024-08-18 01:07:45,106 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-18 01:07:51,258 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
23 from LS+wenet, 32 from Vox, 37 fro AS 2024-08-18 01:07:54,181 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 01:07:58,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3614120.0, ans=0.2 2024-08-18 01:08:09,305 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 33 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-18 01:08:11,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3614120.0, ans=0.0 2024-08-18 01:08:23,245 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-18 01:08:27,583 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 3800, loss[loss=0.115, beats_loss=0.01072, ecapa_loss=0.0001159, whisper_loss=0.1031, over 19263.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01068, ecapa_loss=0.0001466, whisper_loss=0.08985, over 3818370.58 frames. ], batch size: 71, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:08:31,372 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09450822323560715, model_norm_threshold=48.454036712646484 2024-08-18 01:08:31,544 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.766e+04, grad_sumsq=3.766e+04, orig_rms_sq=1.000e+00 2024-08-18 01:08:34,425 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-18 01:08:35,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3614320.0, ans=0.1 2024-08-18 01:09:36,976 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 3850, loss[loss=0.1058, beats_loss=0.008922, ecapa_loss=0.0002159, whisper_loss=0.09472, over 17054.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01062, ecapa_loss=0.0001468, whisper_loss=0.09021, over 3819367.53 frames. ], batch size: 73, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:09:59,342 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.260e+01 2.585e+01 2.853e+01 5.127e+02, threshold=5.170e+01, percent-clipped=2.0 2024-08-18 01:10:09,069 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-18 01:10:13,432 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-18 01:10:36,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3615220.0, ans=0.05 2024-08-18 01:10:46,187 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 3900, loss[loss=0.1133, beats_loss=0.008208, ecapa_loss=0.0001538, whisper_loss=0.1036, over 21781.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01057, ecapa_loss=0.0001469, whisper_loss=0.0908, over 3845398.34 frames. ], batch size: 87, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:10:48,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3615320.0, ans=0.0 2024-08-18 01:10:56,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3615320.0, ans=0.2 2024-08-18 01:11:31,512 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-18 01:11:43,547 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.46 vs. limit=15.0 2024-08-18 01:11:55,243 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 3950, loss[loss=0.08894, beats_loss=0.01184, ecapa_loss=0.0001703, whisper_loss=0.0754, over 16021.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.000148, whisper_loss=0.09053, over 3850532.81 frames. ], batch size: 68, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:12:00,644 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.85 vs. limit=15.0 2024-08-18 01:12:00,743 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0 2024-08-18 01:12:01,286 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-18 01:12:03,838 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-18 01:12:15,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3615920.0, ans=0.125 2024-08-18 01:12:17,820 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.347e+01 2.555e+01 2.976e+01 4.736e+01, threshold=5.111e+01, percent-clipped=0.0 2024-08-18 01:12:18,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3615920.0, ans=0.125 2024-08-18 01:12:23,449 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-18 01:12:26,340 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-18 01:12:31,805 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 01:12:33,118 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 29 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-18 01:12:33,624 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.59 vs. 
limit=10.0 2024-08-18 01:13:04,525 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 4000, loss[loss=0.1141, beats_loss=0.009778, ecapa_loss=0.0001499, whisper_loss=0.1028, over 20240.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.0001481, whisper_loss=0.09052, over 3866855.73 frames. ], batch size: 77, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:13:06,537 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2024-08-18 01:13:06,651 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.14 vs. limit=12.0 2024-08-18 01:13:10,485 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.50 vs. limit=15.0 2024-08-18 01:13:19,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3616420.0, ans=0.5 2024-08-18 01:13:23,495 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.09 vs. limit=15.0 2024-08-18 01:13:26,730 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2024-08-18 01:13:30,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3616420.0, ans=0.2 2024-08-18 01:13:31,258 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.82 vs. 
limit=15.0 2024-08-18 01:13:34,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3616520.0, ans=0.2 2024-08-18 01:13:36,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3616520.0, ans=0.1 2024-08-18 01:13:52,157 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 20 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-18 01:13:56,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3616620.0, ans=0.1 2024-08-18 01:14:05,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3616720.0, ans=0.2 2024-08-18 01:14:13,792 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 4050, loss[loss=0.08952, beats_loss=0.01157, ecapa_loss=0.0001349, whisper_loss=0.0766, over 22169.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01041, ecapa_loss=0.000149, whisper_loss=0.09156, over 3872266.21 frames. ], batch size: 88, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:14:36,061 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.283e+01 2.581e+01 2.870e+01 1.380e+02, threshold=5.162e+01, percent-clipped=1.0 2024-08-18 01:14:36,795 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.14 vs. 
limit=22.5 2024-08-18 01:14:39,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3616920.0, ans=0.1 2024-08-18 01:14:43,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3617020.0, ans=0.0 2024-08-18 01:15:03,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3617120.0, ans=0.2 2024-08-18 01:15:09,554 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.94 vs. limit=15.0 2024-08-18 01:15:12,343 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.46 vs. limit=15.0 2024-08-18 01:15:23,100 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 4100, loss[loss=0.09668, beats_loss=0.01084, ecapa_loss=0.0001593, whisper_loss=0.08425, over 22387.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01038, ecapa_loss=0.0001489, whisper_loss=0.09187, over 3873634.12 frames. ], batch size: 93, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:15:26,706 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 01:15:47,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3617420.0, ans=0.125 2024-08-18 01:15:51,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3617520.0, ans=0.125 2024-08-18 01:15:51,734 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.41 vs. 
limit=10.0 2024-08-18 01:15:55,896 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.32 vs. limit=12.0 2024-08-18 01:16:04,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3617620.0, ans=0.2 2024-08-18 01:16:09,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3617620.0, ans=0.125 2024-08-18 01:16:10,373 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-18 01:16:25,120 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 01:16:30,624 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 4150, loss[loss=0.08298, beats_loss=0.01049, ecapa_loss=0.0001302, whisper_loss=0.07118, over 16438.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01048, ecapa_loss=0.0001487, whisper_loss=0.09065, over 3861691.02 frames. ], batch size: 65, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:16:34,396 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-18 01:16:45,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3617920.0, ans=0.2 2024-08-18 01:16:47,967 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-18 01:16:49,437 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 16 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-18 01:16:51,736 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.642e+01 2.314e+01 2.519e+01 2.790e+01 4.414e+01, threshold=5.038e+01, percent-clipped=0.0 2024-08-18 01:17:22,486 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
19 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-18 01:17:23,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3618220.0, ans=0.125 2024-08-18 01:17:25,689 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.90 vs. limit=22.5 2024-08-18 01:17:35,859 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 22 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-18 01:17:37,122 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 4200, loss[loss=0.09044, beats_loss=0.00981, ecapa_loss=0.0001412, whisper_loss=0.07922, over 20273.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01049, ecapa_loss=0.0001479, whisper_loss=0.09044, over 3892327.93 frames. ], batch size: 83, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:17:45,575 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-08-18 01:17:47,788 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 01:17:52,986 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-18 01:17:57,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3618420.0, ans=0.0 2024-08-18 01:18:09,541 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.62 vs. limit=15.0 2024-08-18 01:18:23,654 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
16 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-18 01:18:23,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3618620.0, ans=0.0 2024-08-18 01:18:30,578 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.72 vs. limit=15.0 2024-08-18 01:18:34,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3618720.0, ans=0.0 2024-08-18 01:18:36,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3618720.0, ans=0.0 2024-08-18 01:18:42,599 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 4250, loss[loss=0.08661, beats_loss=0.01273, ecapa_loss=0.000145, whisper_loss=0.07244, over 22625.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01052, ecapa_loss=0.0001474, whisper_loss=0.09013, over 3905317.29 frames. ], batch size: 94, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:18:44,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3618820.0, ans=0.125 2024-08-18 01:18:44,428 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.94 vs. limit=15.0 2024-08-18 01:18:49,319 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-18 01:18:51,842 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
21 from LS+wenet, 26 from Vox, 15 fro AS 2024-08-18 01:19:03,388 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.262e+01 2.529e+01 2.791e+01 4.848e+01, threshold=5.057e+01, percent-clipped=0.0 2024-08-18 01:19:18,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3619020.0, ans=0.1 2024-08-18 01:19:20,411 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 30 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-18 01:19:30,932 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.015e+05 2024-08-18 01:19:40,384 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.49 vs. limit=12.0 2024-08-18 01:19:48,223 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 4300, loss[loss=0.09606, beats_loss=0.01024, ecapa_loss=0.0001531, whisper_loss=0.08429, over 20572.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01051, ecapa_loss=0.0001479, whisper_loss=0.08979, over 3892156.91 frames. ], batch size: 84, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:19:49,798 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-18 01:19:51,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3619320.0, ans=0.1 2024-08-18 01:19:58,954 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-18 01:20:06,790 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.52 vs. limit=12.0 2024-08-18 01:20:12,906 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
21 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-18 01:20:13,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3619520.0, ans=0.125 2024-08-18 01:20:38,163 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-18 01:20:42,043 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 01:20:43,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3619720.0, ans=0.125 2024-08-18 01:20:43,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3619720.0, ans=0.0 2024-08-18 01:20:47,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.87 vs. limit=15.0 2024-08-18 01:20:52,152 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 4350, loss[loss=0.1218, beats_loss=0.008981, ecapa_loss=0.0001433, whisper_loss=0.1114, over 24718.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01041, ecapa_loss=0.0001472, whisper_loss=0.08964, over 3862481.85 frames. ], batch size: 95, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:20:52,902 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0 2024-08-18 01:20:54,157 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.78 vs. limit=12.0 2024-08-18 01:21:10,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3619920.0, ans=0.0 2024-08-18 01:21:11,830 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-18 01:21:12,852 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.260e+01 2.529e+01 2.654e+01 4.063e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-18 01:21:15,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3619920.0, ans=0.1 2024-08-18 01:21:18,223 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-18 01:21:26,059 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 01:21:35,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3620120.0, ans=0.0 2024-08-18 01:21:37,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3620120.0, ans=0.125 2024-08-18 01:21:41,660 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-18 01:21:44,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3620220.0, ans=0.125 2024-08-18 01:21:56,896 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0 2024-08-18 01:21:57,554 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 4400, loss[loss=0.08531, beats_loss=0.01264, ecapa_loss=0.0001411, whisper_loss=0.07126, over 21034.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01048, ecapa_loss=0.0001476, whisper_loss=0.08944, over 3858747.71 frames. ], batch size: 88, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:21:57,671 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
24 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-18 01:22:16,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3620420.0, ans=0.1 2024-08-18 01:22:22,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3620420.0, ans=0.09899494936611666 2024-08-18 01:22:25,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3620520.0, ans=0.0 2024-08-18 01:22:44,468 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.98 vs. limit=10.0 2024-08-18 01:23:08,309 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 4450, loss[loss=0.1258, beats_loss=0.01113, ecapa_loss=0.0001546, whisper_loss=0.1131, over 15210.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01046, ecapa_loss=0.0001488, whisper_loss=0.08941, over 3844539.20 frames. ], batch size: 60, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:23:10,060 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-18 01:23:15,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3620820.0, ans=0.0 2024-08-18 01:23:17,425 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.33 vs. limit=22.5 2024-08-18 01:23:26,274 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
28 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-18 01:23:31,461 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.280e+01 2.529e+01 2.933e+01 4.954e+01, threshold=5.058e+01, percent-clipped=0.0 2024-08-18 01:23:42,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3621020.0, ans=0.0 2024-08-18 01:23:45,679 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 25 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-18 01:23:54,354 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 14 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-18 01:24:15,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3621220.0, ans=0.125 2024-08-18 01:24:20,855 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 4500, loss[loss=0.09824, beats_loss=0.01043, ecapa_loss=0.0001323, whisper_loss=0.08649, over 22505.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01049, ecapa_loss=0.0001476, whisper_loss=0.08881, over 3853406.30 frames. ], batch size: 88, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:24:32,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3621320.0, ans=0.5 2024-08-18 01:24:36,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3621420.0, ans=0.1 2024-08-18 01:24:39,191 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.59 vs. limit=15.0 2024-08-18 01:24:48,900 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
24 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-18 01:24:52,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3621520.0, ans=0.1 2024-08-18 01:25:15,073 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.13 vs. limit=22.5 2024-08-18 01:25:25,032 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.40 vs. limit=15.0 2024-08-18 01:25:32,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3621720.0, ans=0.125 2024-08-18 01:25:36,859 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 4550, loss[loss=0.1036, beats_loss=0.008343, ecapa_loss=0.0001837, whisper_loss=0.09344, over 17737.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0104, ecapa_loss=0.0001495, whisper_loss=0.08904, over 3852022.95 frames. ], batch size: 71, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:25:38,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3621820.0, ans=0.0 2024-08-18 01:25:44,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3621820.0, ans=0.05 2024-08-18 01:25:45,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=15.0 2024-08-18 01:25:49,126 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
27 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 01:26:02,121 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.248e+01 2.517e+01 2.867e+01 6.444e+01, threshold=5.035e+01, percent-clipped=1.0 2024-08-18 01:26:10,497 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.31 vs. limit=15.0 2024-08-18 01:26:40,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3622120.0, ans=0.125 2024-08-18 01:26:44,288 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 19 from LS+wenet, 25 from Vox, 49 fro AS 2024-08-18 01:26:44,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3622120.0, ans=0.0 2024-08-18 01:27:02,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3622220.0, ans=0.2 2024-08-18 01:27:05,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3622220.0, ans=0.125 2024-08-18 01:27:11,263 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 4600, loss[loss=0.09542, beats_loss=0.008545, ecapa_loss=0.0001423, whisper_loss=0.08545, over 14738.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01046, ecapa_loss=0.0001488, whisper_loss=0.0887, over 3849126.70 frames. 
], batch size: 56, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:27:26,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3622320.0, ans=0.2 2024-08-18 01:27:49,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3622420.0, ans=0.0 2024-08-18 01:28:34,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3622720.0, ans=0.125 2024-08-18 01:28:42,528 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 4650, loss[loss=0.1037, beats_loss=0.01152, ecapa_loss=0.0001592, whisper_loss=0.09058, over 22395.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01047, ecapa_loss=0.0001492, whisper_loss=0.08896, over 3892349.00 frames. ], batch size: 95, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:28:42,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3622820.0, ans=0.125 2024-08-18 01:28:46,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3622820.0, ans=0.0 2024-08-18 01:28:58,210 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.63 vs. limit=10.0 2024-08-18 01:29:01,952 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-18 01:29:05,784 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.266e+01 2.454e+01 2.690e+01 4.860e+01, threshold=4.909e+01, percent-clipped=0.0 2024-08-18 01:29:12,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3623020.0, ans=0.2 2024-08-18 01:29:16,309 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
25 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-18 01:29:16,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3623020.0, ans=0.125 2024-08-18 01:29:23,721 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-18 01:29:35,929 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 21 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-18 01:29:38,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3623220.0, ans=0.0 2024-08-18 01:29:52,353 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-18 01:29:55,231 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 4700, loss[loss=0.1135, beats_loss=0.01122, ecapa_loss=0.0001092, whisper_loss=0.1012, over 22829.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01058, ecapa_loss=0.0001487, whisper_loss=0.0895, over 3913975.19 frames. ], batch size: 86, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:29:55,394 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-18 01:29:58,285 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-18 01:29:59,750 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 01:30:04,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.20 vs. limit=15.0 2024-08-18 01:30:05,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3623320.0, ans=0.0 2024-08-18 01:30:09,139 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.42 vs. 
limit=15.0 2024-08-18 01:30:11,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3623420.0, ans=0.125 2024-08-18 01:30:38,007 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 01:31:06,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3623720.0, ans=0.125 2024-08-18 01:31:08,277 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 4750, loss[loss=0.08384, beats_loss=0.006976, ecapa_loss=0.0001965, whisper_loss=0.0749, over 12731.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01052, ecapa_loss=0.0001487, whisper_loss=0.08987, over 3901548.67 frames. ], batch size: 53, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:31:20,681 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-18 01:31:35,827 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.329e+01 2.503e+01 2.951e+01 3.896e+01, threshold=5.007e+01, percent-clipped=0.0 2024-08-18 01:31:39,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3623920.0, ans=0.0 2024-08-18 01:31:53,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3624020.0, ans=0.125 2024-08-18 01:32:05,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3624120.0, ans=0.1 2024-08-18 01:32:28,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3624220.0, ans=0.125 2024-08-18 01:32:37,359 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 4800, loss[loss=0.09221, beats_loss=0.0114, ecapa_loss=0.0001343, whisper_loss=0.07947, over 19971.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01057, ecapa_loss=0.0001475, whisper_loss=0.08946, over 3912580.74 frames. ], batch size: 84, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:32:47,581 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 01:32:51,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3624320.0, ans=0.1 2024-08-18 01:32:54,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.73 vs. limit=15.0 2024-08-18 01:32:58,088 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-18 01:33:05,951 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.10 vs. limit=15.0 2024-08-18 01:33:43,860 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-18 01:33:49,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3624620.0, ans=0.09899494936611666 2024-08-18 01:34:10,451 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 4850, loss[loss=0.1135, beats_loss=0.009328, ecapa_loss=0.0001535, whisper_loss=0.1026, over 21570.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01055, ecapa_loss=0.0001476, whisper_loss=0.09026, over 3932540.15 frames. ], batch size: 84, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:34:12,382 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-18 01:34:30,588 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 01:34:33,210 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
27 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 01:34:33,631 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.05 vs. limit=15.0 2024-08-18 01:34:34,384 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+01 2.307e+01 2.618e+01 2.911e+01 3.524e+01, threshold=5.235e+01, percent-clipped=0.0 2024-08-18 01:34:34,950 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.23 vs. limit=15.0 2024-08-18 01:34:42,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3625020.0, ans=0.1 2024-08-18 01:34:52,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3625020.0, ans=0.125 2024-08-18 01:34:56,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3625120.0, ans=22.5 2024-08-18 01:34:58,601 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 01:34:58,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3625120.0, ans=0.1 2024-08-18 01:35:03,445 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 01:35:22,256 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
17 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-18 01:35:23,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3625320.0, ans=0.125 2024-08-18 01:35:24,641 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 4900, loss[loss=0.1006, beats_loss=0.01214, ecapa_loss=0.0001239, whisper_loss=0.08724, over 19980.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01053, ecapa_loss=0.0001472, whisper_loss=0.09076, over 3938035.63 frames. ], batch size: 77, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:35:41,249 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 13 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-18 01:35:53,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3625420.0, ans=0.0 2024-08-18 01:35:59,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3625520.0, ans=0.0 2024-08-18 01:36:38,450 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.87 vs. limit=22.5 2024-08-18 01:36:43,827 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.18 vs. limit=15.0 2024-08-18 01:36:46,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3625720.0, ans=0.0 2024-08-18 01:36:51,413 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 4950, loss[loss=0.09962, beats_loss=0.009442, ecapa_loss=0.0001428, whisper_loss=0.08875, over 16414.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0106, ecapa_loss=0.0001472, whisper_loss=0.0904, over 3908670.14 frames. ], batch size: 62, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:36:57,492 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
20 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 01:36:59,393 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 01:37:03,838 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 9 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-18 01:37:19,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3625920.0, ans=0.1 2024-08-18 01:37:20,308 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.328e+01 2.566e+01 3.021e+01 2.036e+02, threshold=5.131e+01, percent-clipped=1.0 2024-08-18 01:37:29,793 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-18 01:37:30,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3626020.0, ans=0.0 2024-08-18 01:37:31,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3626020.0, ans=0.125 2024-08-18 01:37:42,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3626020.0, ans=0.0 2024-08-18 01:38:24,938 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 5000, loss[loss=0.09831, beats_loss=0.009145, ecapa_loss=0.0001644, whisper_loss=0.08752, over 13903.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01058, ecapa_loss=0.0001479, whisper_loss=0.0899, over 3887106.15 frames. 
], batch size: 55, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:38:25,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3626320.0, ans=0.125 2024-08-18 01:38:25,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3626320.0, ans=0.1 2024-08-18 01:38:32,454 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.91 vs. limit=22.5 2024-08-18 01:38:46,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3626420.0, ans=0.0 2024-08-18 01:38:47,693 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 16 from LS+wenet, 33 from Vox, 35 fro AS 2024-08-18 01:38:47,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3626420.0, ans=0.0 2024-08-18 01:38:53,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3626420.0, ans=0.1 2024-08-18 01:38:54,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3626420.0, ans=0.125 2024-08-18 01:38:55,144 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.36 vs. 
limit=15.0 2024-08-18 01:38:57,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3626520.0, ans=0.07 2024-08-18 01:39:03,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3626520.0, ans=0.0 2024-08-18 01:39:08,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3626520.0, ans=0.125 2024-08-18 01:39:08,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3626520.0, ans=0.1 2024-08-18 01:39:10,108 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 01:39:10,367 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2024-08-18 01:39:38,863 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 5050, loss[loss=0.09386, beats_loss=0.01105, ecapa_loss=0.000135, whisper_loss=0.08145, over 20285.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01063, ecapa_loss=0.0001471, whisper_loss=0.08982, over 3878197.25 frames. ], batch size: 82, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:39:51,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3626920.0, ans=0.0 2024-08-18 01:40:01,602 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.612e+01 2.360e+01 2.516e+01 2.885e+01 7.843e+01, threshold=5.031e+01, percent-clipped=2.0 2024-08-18 01:40:32,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3627120.0, ans=0.0 2024-08-18 01:40:35,115 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
32 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-18 01:40:41,393 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 39 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-18 01:40:43,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3627220.0, ans=0.05 2024-08-18 01:40:45,120 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-18 01:40:54,604 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 5100, loss[loss=0.1069, beats_loss=0.009945, ecapa_loss=0.0001498, whisper_loss=0.09548, over 22054.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01062, ecapa_loss=0.0001472, whisper_loss=0.09063, over 3916085.14 frames. ], batch size: 88, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:41:15,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3627420.0, ans=0.125 2024-08-18 01:41:24,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3627420.0, ans=0.125 2024-08-18 01:41:47,071 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-18 01:41:51,565 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.59 vs. 
limit=15.0 2024-08-18 01:41:56,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3627620.0, ans=0.0 2024-08-18 01:42:07,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3627720.0, ans=0.1 2024-08-18 01:42:17,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3627720.0, ans=0.125 2024-08-18 01:42:20,120 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 5150, loss[loss=0.1133, beats_loss=0.01155, ecapa_loss=0.0001479, whisper_loss=0.1003, over 23232.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001468, whisper_loss=0.09048, over 3920177.62 frames. ], batch size: 93, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:42:26,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3627820.0, ans=0.1 2024-08-18 01:42:39,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3627920.0, ans=0.125 2024-08-18 01:42:46,840 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.463e+01 2.680e+01 3.103e+01 6.728e+01, threshold=5.360e+01, percent-clipped=1.0 2024-08-18 01:42:53,801 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.15 vs. limit=10.0 2024-08-18 01:42:58,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3628020.0, ans=0.0 2024-08-18 01:43:13,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3628120.0, ans=0.125 2024-08-18 01:43:21,981 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
14 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 01:43:23,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3628220.0, ans=0.05 2024-08-18 01:43:28,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3628220.0, ans=0.125 2024-08-18 01:43:32,057 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 5200, loss[loss=0.1133, beats_loss=0.009747, ecapa_loss=0.000157, whisper_loss=0.102, over 21743.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01068, ecapa_loss=0.0001461, whisper_loss=0.08935, over 3871105.69 frames. ], batch size: 85, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:43:46,109 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 16 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-18 01:43:48,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3628420.0, ans=0.2 2024-08-18 01:44:02,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3628520.0, ans=0.0 2024-08-18 01:44:02,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3628520.0, ans=0.125 2024-08-18 01:44:12,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3628620.0, ans=0.125 2024-08-18 01:44:15,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3628620.0, ans=0.125 2024-08-18 01:44:16,794 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.32 vs. limit=15.0 2024-08-18 01:44:27,000 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
18 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-18 01:44:32,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3628720.0, ans=0.125 2024-08-18 01:44:35,679 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 5250, loss[loss=0.09855, beats_loss=0.01161, ecapa_loss=0.0001165, whisper_loss=0.08577, over 17819.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01057, ecapa_loss=0.0001466, whisper_loss=0.08962, over 3868937.97 frames. ], batch size: 70, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:44:41,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3628820.0, ans=0.0 2024-08-18 01:44:43,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3628820.0, ans=0.125 2024-08-18 01:44:48,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3628920.0, ans=0.2 2024-08-18 01:44:56,616 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.248e+01 2.479e+01 2.831e+01 4.103e+01, threshold=4.957e+01, percent-clipped=0.0 2024-08-18 01:45:16,345 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-18 01:45:28,256 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=16.07 vs. limit=15.0 2024-08-18 01:45:35,539 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-18 01:45:40,404 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 5300, loss[loss=0.09816, beats_loss=0.01126, ecapa_loss=0.0001371, whisper_loss=0.08553, over 21283.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01057, ecapa_loss=0.0001462, whisper_loss=0.08934, over 3855715.94 frames. 
], batch size: 88, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:45:40,493 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 27 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-18 01:45:42,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3629320.0, ans=0.125 2024-08-18 01:45:45,845 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 16 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 01:46:00,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3629420.0, ans=0.125 2024-08-18 01:46:23,960 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-18 01:46:54,269 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 5350, loss[loss=0.1056, beats_loss=0.01127, ecapa_loss=0.0001121, whisper_loss=0.09324, over 23644.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01055, ecapa_loss=0.0001458, whisper_loss=0.08946, over 3864041.34 frames. ], batch size: 91, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:46:55,701 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 31 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-18 01:47:12,393 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
21 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-18 01:47:17,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3629920.0, ans=0.125 2024-08-18 01:47:20,943 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.419e+01 2.661e+01 2.975e+01 6.061e+01, threshold=5.322e+01, percent-clipped=1.0 2024-08-18 01:47:38,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3630020.0, ans=0.125 2024-08-18 01:48:04,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3630220.0, ans=0.2 2024-08-18 01:48:08,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3630220.0, ans=0.125 2024-08-18 01:48:19,659 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 5400, loss[loss=0.09548, beats_loss=0.007352, ecapa_loss=0.0001993, whisper_loss=0.08614, over 13132.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001472, whisper_loss=0.08991, over 3853261.31 frames. ], batch size: 56, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:48:19,805 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-18 01:48:25,067 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.58 vs. 
limit=15.0 2024-08-18 01:48:27,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3630320.0, ans=0.125 2024-08-18 01:49:09,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3630520.0, ans=0.2 2024-08-18 01:49:09,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3630520.0, ans=0.1 2024-08-18 01:49:37,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3630720.0, ans=0.125 2024-08-18 01:49:55,478 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.32 vs. limit=15.0 2024-08-18 01:50:00,819 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 5450, loss[loss=0.1092, beats_loss=0.01038, ecapa_loss=0.0001848, whisper_loss=0.09702, over 18507.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01042, ecapa_loss=0.0001467, whisper_loss=0.09073, over 3861565.59 frames. ], batch size: 78, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:50:07,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3630820.0, ans=0.125 2024-08-18 01:50:15,021 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
29 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-18 01:50:15,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3630820.0, ans=0.0 2024-08-18 01:50:41,906 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.356e+01 2.667e+01 2.958e+01 1.634e+02, threshold=5.334e+01, percent-clipped=1.0 2024-08-18 01:50:52,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3631020.0, ans=0.1 2024-08-18 01:51:10,993 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 01:51:40,833 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.47 vs. limit=10.0 2024-08-18 01:51:42,592 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 15 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-18 01:51:54,995 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 21 from LS+wenet, 31 from Vox, 41 fro AS 2024-08-18 01:52:01,703 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 5500, loss[loss=0.1156, beats_loss=0.01119, ecapa_loss=0.0001625, whisper_loss=0.1028, over 21170.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01046, ecapa_loss=0.0001478, whisper_loss=0.0909, over 3861327.64 frames. ], batch size: 89, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:52:20,598 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-18 01:52:27,226 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 01:52:34,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3631420.0, ans=0.2 2024-08-18 01:52:40,373 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 01:52:43,891 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 01:53:00,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3631620.0, ans=0.125 2024-08-18 01:53:05,598 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=15.0 2024-08-18 01:53:35,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3631720.0, ans=0.125 2024-08-18 01:53:51,783 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 5550, loss[loss=0.08325, beats_loss=0.01253, ecapa_loss=0.0001539, whisper_loss=0.06919, over 13418.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01044, ecapa_loss=0.0001476, whisper_loss=0.09116, over 3874798.10 frames. ], batch size: 56, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:54:06,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3631820.0, ans=0.0 2024-08-18 01:54:14,217 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-18 01:54:18,554 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-18 01:54:33,492 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.313e+01 2.540e+01 2.843e+01 1.440e+02, threshold=5.080e+01, percent-clipped=2.0 2024-08-18 01:54:33,865 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 01:55:01,645 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.62 vs. 
limit=15.0 2024-08-18 01:55:18,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3632120.0, ans=0.1 2024-08-18 01:55:51,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3632320.0, ans=0.07 2024-08-18 01:55:52,277 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 5600, loss[loss=0.09904, beats_loss=0.01156, ecapa_loss=0.0001242, whisper_loss=0.08623, over 22004.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01047, ecapa_loss=0.0001471, whisper_loss=0.09065, over 3886757.44 frames. ], batch size: 87, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:55:54,096 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 01:55:54,768 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.14 vs. limit=15.0 2024-08-18 01:56:03,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3632320.0, ans=0.0 2024-08-18 01:56:26,390 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-18 01:56:46,360 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 36 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-18 01:57:01,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3632720.0, ans=0.0 2024-08-18 01:57:10,356 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 5650, loss[loss=0.08688, beats_loss=0.01587, ecapa_loss=9.212e-05, whisper_loss=0.07009, over 15065.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001465, whisper_loss=0.09007, over 3896457.54 frames. 
], batch size: 57, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:57:20,352 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-18 01:57:23,487 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.49 vs. limit=10.0 2024-08-18 01:57:28,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3632920.0, ans=0.2 2024-08-18 01:57:35,818 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.365e+01 2.651e+01 2.943e+01 2.212e+02, threshold=5.303e+01, percent-clipped=3.0 2024-08-18 01:58:07,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3633120.0, ans=0.0 2024-08-18 01:58:08,914 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 41 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 01:58:12,003 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-18 01:58:17,356 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-18 01:58:23,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3633220.0, ans=0.125 2024-08-18 01:58:27,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3633220.0, ans=0.125 2024-08-18 01:58:27,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3633220.0, ans=0.0 2024-08-18 01:58:30,351 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 5700, loss[loss=0.09198, beats_loss=0.01205, ecapa_loss=0.0001372, whisper_loss=0.07856, over 21671.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01049, ecapa_loss=0.0001475, whisper_loss=0.09093, over 3903488.52 frames. ], batch size: 89, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:58:33,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3633320.0, ans=0.125 2024-08-18 01:58:35,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3633320.0, ans=0.125 2024-08-18 01:58:35,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3633320.0, ans=0.125 2024-08-18 01:58:45,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3633420.0, ans=0.0 2024-08-18 01:58:51,087 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-18 01:59:01,043 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.69 vs. limit=15.0 2024-08-18 01:59:07,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3633520.0, ans=0.2 2024-08-18 01:59:34,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3633720.0, ans=0.125 2024-08-18 01:59:36,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3633720.0, ans=0.125 2024-08-18 01:59:53,172 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 5750, loss[loss=0.08103, beats_loss=0.01366, ecapa_loss=0.0001228, whisper_loss=0.06614, over 16269.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01056, ecapa_loss=0.0001467, whisper_loss=0.09028, over 3880246.35 frames. 
], batch size: 66, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:59:54,824 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 35 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-18 02:00:03,842 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 02:00:05,823 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 24 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-18 02:00:18,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3633920.0, ans=0.07 2024-08-18 02:00:19,679 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.317e+01 2.629e+01 2.885e+01 4.138e+01, threshold=5.257e+01, percent-clipped=0.0 2024-08-18 02:00:20,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3633920.0, ans=0.125 2024-08-18 02:00:28,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3634020.0, ans=0.025 2024-08-18 02:00:39,909 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 02:00:45,356 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-18 02:00:58,084 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-18 02:01:02,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3634220.0, ans=0.0 2024-08-18 02:01:08,524 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 5800, loss[loss=0.1046, beats_loss=0.01176, ecapa_loss=0.0001574, whisper_loss=0.09123, over 21888.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01052, ecapa_loss=0.0001479, whisper_loss=0.09006, over 3885452.27 frames. 
], batch size: 90, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:01:11,610 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-18 02:01:17,484 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=15.0 2024-08-18 02:01:38,535 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.50 vs. limit=15.0 2024-08-18 02:01:49,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3634520.0, ans=0.0 2024-08-18 02:01:52,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3634520.0, ans=0.125 2024-08-18 02:01:54,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3634620.0, ans=0.125 2024-08-18 02:01:57,668 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 16 from LS+wenet, 31 from Vox, 28 fro AS 2024-08-18 02:02:11,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3634720.0, ans=0.0 2024-08-18 02:02:20,472 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-18 02:02:27,861 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 5850, loss[loss=0.1016, beats_loss=0.0114, ecapa_loss=0.0001592, whisper_loss=0.08859, over 22456.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01052, ecapa_loss=0.0001487, whisper_loss=0.08977, over 3893747.28 frames. ], batch size: 92, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:02:28,371 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 02:02:42,687 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 02:02:47,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3634920.0, ans=0.2 2024-08-18 02:02:55,246 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.380e+01 2.623e+01 2.964e+01 4.271e+01, threshold=5.246e+01, percent-clipped=0.0 2024-08-18 02:03:04,306 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 02:03:15,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3635120.0, ans=0.125 2024-08-18 02:03:18,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3635120.0, ans=0.125 2024-08-18 02:03:19,535 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-18 02:03:41,473 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 5900, loss[loss=0.1061, beats_loss=0.01083, ecapa_loss=0.0001332, whisper_loss=0.09395, over 21787.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001484, whisper_loss=0.09032, over 3870389.92 frames. ], batch size: 86, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:04:00,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3635420.0, ans=0.125 2024-08-18 02:04:07,328 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 19 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-18 02:04:19,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3635520.0, ans=0.0 2024-08-18 02:04:21,858 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
25 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-18 02:04:34,245 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 13 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-18 02:04:49,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3635720.0, ans=0.0 2024-08-18 02:04:55,743 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 5950, loss[loss=0.111, beats_loss=0.009807, ecapa_loss=0.0001629, whisper_loss=0.0996, over 19516.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01062, ecapa_loss=0.000147, whisper_loss=0.08948, over 3908229.36 frames. ], batch size: 78, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:05:11,733 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-18 02:05:20,721 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-18 02:05:21,148 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.43 vs. limit=22.5 2024-08-18 02:05:21,851 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.282e+01 2.567e+01 2.844e+01 4.690e+01, threshold=5.134e+01, percent-clipped=0.0 2024-08-18 02:05:29,892 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 15 from LS+wenet, 35 from Vox, 30 fro AS 2024-08-18 02:05:36,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3636020.0, ans=0.0 2024-08-18 02:05:42,494 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 19 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 02:05:51,890 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.35 vs. 
limit=15.0 2024-08-18 02:05:56,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3636220.0, ans=0.09899494936611666 2024-08-18 02:05:59,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3636220.0, ans=0.0 2024-08-18 02:06:13,468 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 6000, loss[loss=0.07333, beats_loss=0.01107, ecapa_loss=0.0002021, whisper_loss=0.06024, over 14889.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01063, ecapa_loss=0.0001476, whisper_loss=0.08956, over 3896156.57 frames. ], batch size: 63, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:06:13,470 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-18 02:06:47,667 INFO [train_multi_KD3.py:1149] (0/4) Epoch 25, validation on ASR_libri: loss=0.2524, beats_loss=0, ecapa_loss=0.0005148, whisper_loss=0.2472, over 922467.00 frames. 2024-08-18 02:07:03,284 INFO [train_multi_KD3.py:1149] (0/4) Epoch 25, validation on SV_voxceleb1: loss=0.004108, beats_loss=0, ecapa_loss=0.0004108, whisper_loss=0, over 939242.00 frames. 2024-08-18 02:08:43,085 INFO [train_multi_KD3.py:1149] (0/4) Epoch 25, validation on AT_audioset: loss=0.02328, beats_loss=0.02328, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 02:08:43,092 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-18 02:08:53,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3636320.0, ans=0.125 2024-08-18 02:09:02,613 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 02:09:07,607 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
27 from LS+wenet, 35 from Vox, 31 fro AS 2024-08-18 02:09:17,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3636520.0, ans=0.125 2024-08-18 02:09:25,804 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-18 02:09:27,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3636620.0, ans=22.5 2024-08-18 02:09:46,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3636820.0, ans=0.125 2024-08-18 02:09:47,647 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 6050, loss[loss=0.1082, beats_loss=0.01059, ecapa_loss=0.0001384, whisper_loss=0.09626, over 22715.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0106, ecapa_loss=0.0001469, whisper_loss=0.08979, over 3880833.82 frames. ], batch size: 89, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:09:47,793 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-18 02:09:50,629 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 02:09:57,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3636820.0, ans=0.125 2024-08-18 02:10:09,397 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.386e+01 2.689e+01 3.046e+01 1.667e+02, threshold=5.379e+01, percent-clipped=1.0 2024-08-18 02:10:09,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3636920.0, ans=10.0 2024-08-18 02:10:24,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3637020.0, ans=0.2 2024-08-18 02:10:25,138 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 16 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-18 02:10:33,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3637120.0, ans=0.125 2024-08-18 02:10:39,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3637220.0, ans=0.125 2024-08-18 02:10:52,444 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 6100, loss[loss=0.09798, beats_loss=0.01057, ecapa_loss=0.0001412, whisper_loss=0.086, over 22377.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01058, ecapa_loss=0.0001471, whisper_loss=0.08983, over 3864176.51 frames. 
], batch size: 88, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:10:58,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3637320.0, ans=0.2 2024-08-18 02:11:24,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3637520.0, ans=0.125 2024-08-18 02:11:28,502 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 02:11:31,965 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 02:11:33,249 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-18 02:11:34,349 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-18 02:11:41,640 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-18 02:11:43,449 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.46 vs. limit=22.5 2024-08-18 02:11:45,292 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-18 02:11:45,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3637720.0, ans=0.2 2024-08-18 02:11:49,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3637720.0, ans=0.125 2024-08-18 02:11:50,848 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-18 02:11:55,594 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 6150, loss[loss=0.1089, beats_loss=0.009728, ecapa_loss=0.0001488, whisper_loss=0.09765, over 17894.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01052, ecapa_loss=0.0001472, whisper_loss=0.09047, over 3891465.52 frames. ], batch size: 72, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:11:57,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3637820.0, ans=0.0 2024-08-18 02:12:02,056 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 35 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 02:12:07,079 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 37 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-18 02:12:11,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3637920.0, ans=0.5 2024-08-18 02:12:16,889 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.229e+01 2.428e+01 2.694e+01 3.816e+01, threshold=4.856e+01, percent-clipped=0.0 2024-08-18 02:12:21,226 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 02:12:30,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3638020.0, ans=0.125 2024-08-18 02:12:31,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3638020.0, ans=0.125 2024-08-18 02:12:36,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3638120.0, ans=0.07 2024-08-18 02:12:45,387 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
28 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-18 02:12:57,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3638220.0, ans=0.0 2024-08-18 02:12:59,356 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 6200, loss[loss=0.1045, beats_loss=0.0105, ecapa_loss=0.000146, whisper_loss=0.09258, over 22546.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01051, ecapa_loss=0.0001474, whisper_loss=0.09068, over 3890818.26 frames. ], batch size: 92, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:13:12,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=3638420.0, ans=0.1 2024-08-18 02:13:42,152 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0 2024-08-18 02:13:45,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3638620.0, ans=0.1 2024-08-18 02:13:51,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3638720.0, ans=0.1 2024-08-18 02:14:02,843 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 6250, loss[loss=0.09444, beats_loss=0.01256, ecapa_loss=0.0001332, whisper_loss=0.08055, over 23145.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01048, ecapa_loss=0.0001477, whisper_loss=0.09141, over 3922412.29 frames. ], batch size: 95, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:14:05,960 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
20 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 02:14:08,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3638820.0, ans=0.125 2024-08-18 02:14:12,164 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-18 02:14:15,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3638920.0, ans=0.0 2024-08-18 02:14:24,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3638920.0, ans=0.1 2024-08-18 02:14:24,934 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.392e+01 2.630e+01 2.962e+01 1.833e+02, threshold=5.259e+01, percent-clipped=3.0 2024-08-18 02:14:32,332 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-18 02:14:43,124 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.96 vs. limit=22.5 2024-08-18 02:14:50,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3639120.0, ans=0.1 2024-08-18 02:15:04,012 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 13 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 02:15:04,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3639220.0, ans=0.125 2024-08-18 02:15:06,353 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 6300, loss[loss=0.1161, beats_loss=0.01014, ecapa_loss=0.000179, whisper_loss=0.1042, over 21393.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01045, ecapa_loss=0.0001474, whisper_loss=0.09201, over 3929276.28 frames. 
], batch size: 88, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:15:13,283 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.79 vs. limit=15.0 2024-08-18 02:15:39,811 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 02:15:44,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3639620.0, ans=0.0 2024-08-18 02:15:52,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3639620.0, ans=0.125 2024-08-18 02:15:53,828 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-18 02:16:06,928 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 02:16:10,547 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 6350, loss[loss=0.09509, beats_loss=0.01147, ecapa_loss=0.0001413, whisper_loss=0.08221, over 19643.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01051, ecapa_loss=0.000147, whisper_loss=0.09146, over 3902620.86 frames. ], batch size: 75, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:16:18,005 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-18 02:16:25,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3639920.0, ans=0.0 2024-08-18 02:16:25,305 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.85 vs. 
limit=15.0 2024-08-18 02:16:32,122 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.636e+01 2.370e+01 2.587e+01 3.086e+01 3.331e+02, threshold=5.174e+01, percent-clipped=2.0 2024-08-18 02:16:32,395 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-364000.pt 2024-08-18 02:16:41,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3640020.0, ans=0.0 2024-08-18 02:16:43,572 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.74 vs. limit=15.0 2024-08-18 02:17:04,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3640220.0, ans=0.0 2024-08-18 02:17:09,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3640220.0, ans=0.125 2024-08-18 02:17:14,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3640220.0, ans=0.025 2024-08-18 02:17:17,030 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 6400, loss[loss=0.09305, beats_loss=0.01134, ecapa_loss=0.0001397, whisper_loss=0.08032, over 17250.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01053, ecapa_loss=0.0001469, whisper_loss=0.09109, over 3904429.89 frames. ], batch size: 68, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:17:17,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3640320.0, ans=0.125 2024-08-18 02:17:28,718 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
25 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-18 02:17:54,593 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-18 02:18:10,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3640720.0, ans=0.125 2024-08-18 02:18:19,972 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 6450, loss[loss=0.103, beats_loss=0.01052, ecapa_loss=0.0001264, whisper_loss=0.09124, over 14640.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01058, ecapa_loss=0.0001471, whisper_loss=0.09065, over 3895616.83 frames. ], batch size: 58, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:18:20,137 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-18 02:18:22,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3640820.0, ans=0.125 2024-08-18 02:18:29,239 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-18 02:18:41,561 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.367e+01 2.593e+01 3.031e+01 7.766e+01, threshold=5.185e+01, percent-clipped=4.0 2024-08-18 02:18:42,208 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.78 vs. limit=12.0 2024-08-18 02:18:48,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3641020.0, ans=0.125 2024-08-18 02:18:54,166 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 27 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-18 02:18:55,665 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.518e+01 2024-08-18 02:18:56,791 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-18 02:19:18,501 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.44 vs. limit=15.0 2024-08-18 02:19:21,063 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=15.0 2024-08-18 02:19:21,831 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-18 02:19:22,879 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 6500, loss[loss=0.08976, beats_loss=0.01394, ecapa_loss=0.0001224, whisper_loss=0.0746, over 20456.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01056, ecapa_loss=0.0001464, whisper_loss=0.09139, over 3936015.14 frames. ], batch size: 81, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:19:34,570 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 13 from Vox, 49 fro AS 2024-08-18 02:19:36,156 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 02:19:36,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3641420.0, ans=0.1 2024-08-18 02:19:44,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3641420.0, ans=10.0 2024-08-18 02:19:50,607 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09095240384340286, model_norm_threshold=51.85380935668945 2024-08-18 02:19:50,779 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.norm.log_scale with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.024e+04, grad_sumsq=4.024e+04, orig_rms_sq=1.000e+00 2024-08-18 02:20:05,966 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
20 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-18 02:20:08,700 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-18 02:20:11,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3641620.0, ans=0.0 2024-08-18 02:20:15,195 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-18 02:20:17,595 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 18 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-18 02:20:20,179 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-18 02:20:26,008 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 6550, loss[loss=0.1051, beats_loss=0.009946, ecapa_loss=0.0001205, whisper_loss=0.09394, over 22128.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01066, ecapa_loss=0.0001457, whisper_loss=0.09176, over 3955852.86 frames. ], batch size: 87, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:20:35,648 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0 2024-08-18 02:20:38,944 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-18 02:20:39,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3641920.0, ans=0.1 2024-08-18 02:20:40,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3641920.0, ans=0.95 2024-08-18 02:20:44,365 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.93 vs. 
limit=6.0 2024-08-18 02:20:47,247 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.617e+01 2.352e+01 2.620e+01 3.012e+01 5.701e+02, threshold=5.240e+01, percent-clipped=4.0 2024-08-18 02:20:52,804 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 02:21:13,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3642120.0, ans=0.0 2024-08-18 02:21:15,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3642220.0, ans=0.125 2024-08-18 02:21:22,944 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-18 02:21:23,351 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.60 vs. limit=15.0 2024-08-18 02:21:29,280 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 6600, loss[loss=0.1229, beats_loss=0.01086, ecapa_loss=0.0001626, whisper_loss=0.1105, over 22772.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01051, ecapa_loss=0.0001473, whisper_loss=0.09243, over 3982779.64 frames. ], batch size: 92, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:21:31,313 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.57 vs. limit=15.0 2024-08-18 02:21:40,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3642420.0, ans=0.2 2024-08-18 02:21:52,820 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 10 from LS+wenet, 9 from Vox, 36 fro AS 2024-08-18 02:21:55,204 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-18 02:22:09,622 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
30 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-18 02:22:24,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3642720.0, ans=0.125 2024-08-18 02:22:24,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3642720.0, ans=0.2 2024-08-18 02:22:32,352 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 6650, loss[loss=0.1067, beats_loss=0.0105, ecapa_loss=0.0001961, whisper_loss=0.09425, over 16193.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01049, ecapa_loss=0.0001476, whisper_loss=0.09203, over 3929512.94 frames. ], batch size: 69, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:22:51,998 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.20 vs. limit=10.0 2024-08-18 02:22:53,773 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.312e+01 2.682e+01 2.929e+01 4.632e+01, threshold=5.364e+01, percent-clipped=0.0 2024-08-18 02:23:03,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3643020.0, ans=0.0 2024-08-18 02:23:09,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3643120.0, ans=0.2 2024-08-18 02:23:23,347 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-18 02:23:24,849 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
31 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-18 02:23:27,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3643220.0, ans=0.05 2024-08-18 02:23:36,041 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 6700, loss[loss=0.09248, beats_loss=0.01168, ecapa_loss=0.0001296, whisper_loss=0.07951, over 20956.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01052, ecapa_loss=0.0001465, whisper_loss=0.09154, over 3935635.76 frames. ], batch size: 84, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:23:46,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3643320.0, ans=0.1 2024-08-18 02:23:49,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3643420.0, ans=0.2 2024-08-18 02:24:00,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3643520.0, ans=0.125 2024-08-18 02:24:23,382 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 15 from Vox, 50 fro AS 2024-08-18 02:24:39,521 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 6750, loss[loss=0.1168, beats_loss=0.009691, ecapa_loss=0.0001654, whisper_loss=0.1055, over 19219.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01041, ecapa_loss=0.0001483, whisper_loss=0.09222, over 3886279.24 frames. ], batch size: 74, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:24:56,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3643920.0, ans=0.125 2024-08-18 02:24:58,913 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
13 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 02:25:01,356 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.309e+01 2.478e+01 2.753e+01 4.386e+01, threshold=4.956e+01, percent-clipped=0.0 2024-08-18 02:25:02,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3643920.0, ans=0.07 2024-08-18 02:25:04,409 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-18 02:25:11,867 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.467e+00 2024-08-18 02:25:31,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3644220.0, ans=0.125 2024-08-18 02:25:43,310 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 6800, loss[loss=0.09421, beats_loss=0.01015, ecapa_loss=0.0001419, whisper_loss=0.08263, over 22723.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01043, ecapa_loss=0.0001482, whisper_loss=0.09119, over 3875826.10 frames. 
], batch size: 91, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:25:43,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3644320.0, ans=0.0 2024-08-18 02:25:44,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3644320.0, ans=0.0 2024-08-18 02:25:49,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3644320.0, ans=0.125 2024-08-18 02:25:52,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3644320.0, ans=0.0 2024-08-18 02:25:53,072 WARNING [optim.py:496] (0/4) Scaling gradients by 0.054543543606996536, model_norm_threshold=49.555973052978516 2024-08-18 02:25:53,249 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.396e+05, grad_sumsq=4.165e+04, orig_rms_sq=3.352e+00 2024-08-18 02:26:03,100 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-18 02:26:03,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3644420.0, ans=0.2 2024-08-18 02:26:25,903 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 29 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-18 02:26:29,590 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
22 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-18 02:26:36,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3644720.0, ans=0.1 2024-08-18 02:26:37,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3644720.0, ans=0.125 2024-08-18 02:26:47,874 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 6850, loss[loss=0.09716, beats_loss=0.01125, ecapa_loss=0.0001243, whisper_loss=0.08466, over 15071.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01047, ecapa_loss=0.0001474, whisper_loss=0.09088, over 3859887.85 frames. ], batch size: 59, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:26:49,363 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-18 02:27:09,745 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.331e+01 2.507e+01 2.800e+01 9.086e+02, threshold=5.014e+01, percent-clipped=3.0 2024-08-18 02:27:13,099 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.21 vs. limit=15.0 2024-08-18 02:27:17,283 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 17 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-18 02:27:34,192 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-18 02:27:40,588 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
38 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-18 02:27:43,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3645220.0, ans=0.125 2024-08-18 02:27:44,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3645220.0, ans=0.0 2024-08-18 02:27:51,595 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 6900, loss[loss=0.09549, beats_loss=0.01099, ecapa_loss=0.0001621, whisper_loss=0.08288, over 19305.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01047, ecapa_loss=0.0001471, whisper_loss=0.09169, over 3874792.56 frames. ], batch size: 82, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:27:53,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3645320.0, ans=0.1 2024-08-18 02:27:56,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3645320.0, ans=0.2 2024-08-18 02:27:57,895 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 26 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 02:28:03,283 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 02:28:14,460 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 19 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-18 02:28:21,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3645520.0, ans=0.0 2024-08-18 02:28:39,626 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-18 02:28:46,311 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
21 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 02:28:48,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3645720.0, ans=0.125 2024-08-18 02:28:50,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3645720.0, ans=0.125 2024-08-18 02:28:50,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3645720.0, ans=0.05 2024-08-18 02:28:54,602 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 6950, loss[loss=0.1048, beats_loss=0.0109, ecapa_loss=0.0001349, whisper_loss=0.09255, over 18741.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01064, ecapa_loss=0.0001459, whisper_loss=0.09098, over 3882848.63 frames. ], batch size: 72, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:29:05,547 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-18 02:29:11,480 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 24 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-18 02:29:16,088 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.274e+01 2.573e+01 2.776e+01 4.485e+01, threshold=5.146e+01, percent-clipped=0.0 2024-08-18 02:29:16,521 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 14 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 02:29:37,717 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-18 02:29:40,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3646120.0, ans=0.0 2024-08-18 02:29:59,585 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 7000, loss[loss=0.09135, beats_loss=0.01492, ecapa_loss=0.0001028, whisper_loss=0.0754, over 22936.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01057, ecapa_loss=0.0001463, whisper_loss=0.09118, over 3884934.63 frames. ], batch size: 93, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:30:04,909 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 02:30:19,505 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.21 vs. limit=8.0 2024-08-18 02:30:20,117 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-18 02:30:49,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3646620.0, ans=0.035 2024-08-18 02:31:03,215 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 22 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-18 02:31:09,211 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 7050, loss[loss=0.0938, beats_loss=0.008808, ecapa_loss=0.0001252, whisper_loss=0.08374, over 16661.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0106, ecapa_loss=0.0001456, whisper_loss=0.09028, over 3849718.86 frames. ], batch size: 63, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:31:18,859 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-18 02:31:19,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3646820.0, ans=0.0 2024-08-18 02:31:23,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3646920.0, ans=0.125 2024-08-18 02:31:32,087 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.85 vs. 
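The per-batch `loss[...]` entries decompose into the three distillation losses weighted by the scales from the run config in the header (`beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`, `whisper_loss_scale=1.0`); e.g. 0.01015 + 10 × 0.0001419 + 0.08263 ≈ 0.09421. A small reconstruction of that combination (an illustrative helper, not the training code itself):

```python
def combine_kd_losses(beats_loss, ecapa_loss, whisper_loss,
                      beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    """Weighted total of the three KD losses, using the scales from this
    run's config. Reconstructed from the logged values, hence approximate
    only up to the log's rounding."""
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)
```

This also explains why `ecapa_loss` looks an order of magnitude smaller than the others in the log: it is printed unscaled, but contributes at 10x weight to the total.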
limit=15.0 2024-08-18 02:31:32,738 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.611e+01 2.265e+01 2.505e+01 2.734e+01 3.913e+01, threshold=5.009e+01, percent-clipped=0.0 2024-08-18 02:31:40,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3647020.0, ans=0.125 2024-08-18 02:31:45,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3647020.0, ans=0.0 2024-08-18 02:31:58,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3647120.0, ans=0.05 2024-08-18 02:32:07,656 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-18 02:32:13,626 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 7100, loss[loss=0.1126, beats_loss=0.01116, ecapa_loss=0.0001317, whisper_loss=0.1001, over 21416.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0106, ecapa_loss=0.0001454, whisper_loss=0.09035, over 3847560.54 frames. ], batch size: 86, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:32:18,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3647320.0, ans=0.125 2024-08-18 02:32:20,067 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 20 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-18 02:32:36,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3647420.0, ans=0.1 2024-08-18 02:32:39,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3647520.0, ans=0.0 2024-08-18 02:33:05,379 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 02:33:10,224 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 02:33:15,158 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 7150, loss[loss=0.09869, beats_loss=0.009873, ecapa_loss=0.0001372, whisper_loss=0.08744, over 23613.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01062, ecapa_loss=0.0001457, whisper_loss=0.09024, over 3861962.65 frames. ], batch size: 91, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:33:17,657 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-18 02:33:20,100 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 02:33:36,648 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.260e+01 2.515e+01 2.745e+01 4.282e+01, threshold=5.030e+01, percent-clipped=0.0 2024-08-18 02:33:37,040 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
23 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-18 02:33:43,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3648020.0, ans=0.1 2024-08-18 02:33:52,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3648120.0, ans=0.0 2024-08-18 02:33:54,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3648120.0, ans=0.125 2024-08-18 02:33:55,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3648120.0, ans=0.0 2024-08-18 02:34:11,288 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.907e+00 2024-08-18 02:34:19,718 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 7200, loss[loss=0.09073, beats_loss=0.01242, ecapa_loss=0.000144, whisper_loss=0.07687, over 20910.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001468, whisper_loss=0.0906, over 3856728.22 frames. ], batch size: 86, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:34:21,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3648320.0, ans=0.125 2024-08-18 02:34:22,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3648320.0, ans=0.025 2024-08-18 02:34:23,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3648320.0, ans=0.0 2024-08-18 02:34:49,003 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
22 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 02:34:51,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3648520.0, ans=0.125 2024-08-18 02:35:08,489 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-18 02:35:16,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3648720.0, ans=0.1 2024-08-18 02:35:21,970 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 7250, loss[loss=0.1349, beats_loss=0.00792, ecapa_loss=0.0001502, whisper_loss=0.1255, over 22873.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01051, ecapa_loss=0.0001477, whisper_loss=0.09113, over 3892673.78 frames. ], batch size: 90, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:35:43,429 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.335e+01 2.544e+01 2.816e+01 3.698e+01, threshold=5.087e+01, percent-clipped=0.0 2024-08-18 02:35:54,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3649020.0, ans=0.125 2024-08-18 02:35:58,561 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 02:35:58,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3649120.0, ans=0.1 2024-08-18 02:36:04,754 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-18 02:36:11,307 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 24 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-18 02:36:17,398 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 24 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-18 02:36:21,058 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
20 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-18 02:36:22,160 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 17 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 02:36:23,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3649320.0, ans=0.07 2024-08-18 02:36:24,295 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 7300, loss[loss=0.0866, beats_loss=0.0132, ecapa_loss=0.0001361, whisper_loss=0.07204, over 21700.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01055, ecapa_loss=0.0001475, whisper_loss=0.09083, over 3880942.47 frames. ], batch size: 94, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:36:24,400 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 02:36:45,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3649420.0, ans=0.2 2024-08-18 02:36:46,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3649420.0, ans=0.125 2024-08-18 02:36:58,539 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 19 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-18 02:37:05,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3649620.0, ans=0.125 2024-08-18 02:37:06,702 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.28 vs. limit=15.0 2024-08-18 02:37:20,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3649720.0, ans=0.0 2024-08-18 02:37:27,378 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 7350, loss[loss=0.106, beats_loss=0.007139, ecapa_loss=0.0001921, whisper_loss=0.09696, over 14919.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01057, ecapa_loss=0.0001473, whisper_loss=0.0902, over 3848312.65 frames. ], batch size: 60, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:37:35,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3649820.0, ans=0.0 2024-08-18 02:37:43,793 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-18 02:37:48,684 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.729e+01 2.224e+01 2.419e+01 2.653e+01 3.642e+01, threshold=4.838e+01, percent-clipped=0.0 2024-08-18 02:37:49,027 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 23 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-18 02:37:52,499 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 23 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-18 02:38:08,306 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=15.0 2024-08-18 02:38:10,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0 2024-08-18 02:38:11,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3650120.0, ans=0.0 2024-08-18 02:38:29,264 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 7400, loss[loss=0.1082, beats_loss=0.009176, ecapa_loss=0.0001762, whisper_loss=0.09721, over 16298.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01052, ecapa_loss=0.0001481, whisper_loss=0.08994, over 3855465.46 frames. 
], batch size: 67, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:38:32,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3650320.0, ans=0.125 2024-08-18 02:38:33,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.14 vs. limit=22.5 2024-08-18 02:38:49,357 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 16 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 02:38:51,675 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-18 02:38:59,824 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.60 vs. limit=22.5 2024-08-18 02:39:05,754 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2024-08-18 02:39:20,290 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 02:39:27,573 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-18 02:39:29,888 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-18 02:39:30,933 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 7450, loss[loss=0.09636, beats_loss=0.009736, ecapa_loss=0.000147, whisper_loss=0.08516, over 21310.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01056, ecapa_loss=0.0001473, whisper_loss=0.08993, over 3876510.00 frames. ], batch size: 89, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:39:43,679 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-18 02:39:47,207 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
26 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 02:39:47,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3650920.0, ans=0.125 2024-08-18 02:39:50,822 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 16 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-18 02:39:52,037 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.351e+01 2.534e+01 2.762e+01 3.772e+01, threshold=5.068e+01, percent-clipped=0.0 2024-08-18 02:39:55,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3651020.0, ans=0.07 2024-08-18 02:39:59,731 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-18 02:40:03,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3651020.0, ans=0.1 2024-08-18 02:40:08,123 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 14 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-18 02:40:11,045 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.331e+05 2024-08-18 02:40:18,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3651120.0, ans=0.125 2024-08-18 02:40:32,766 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 7500, loss[loss=0.09384, beats_loss=0.01146, ecapa_loss=0.0001351, whisper_loss=0.08103, over 15205.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01051, ecapa_loss=0.0001474, whisper_loss=0.09055, over 3874792.11 frames. ], batch size: 60, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:40:32,931 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
24 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-18 02:40:33,369 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=12.0 2024-08-18 02:40:36,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3651320.0, ans=0.125 2024-08-18 02:40:37,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3651320.0, ans=0.125 2024-08-18 02:40:38,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3651320.0, ans=0.1 2024-08-18 02:40:45,554 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-18 02:40:48,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3651420.0, ans=0.0 2024-08-18 02:40:52,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3651420.0, ans=0.0 2024-08-18 02:41:00,686 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 02:41:03,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=3651520.0, ans=0.1 2024-08-18 02:41:35,019 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 7550, loss[loss=0.1053, beats_loss=0.009433, ecapa_loss=0.0001258, whisper_loss=0.09456, over 16077.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01049, ecapa_loss=0.0001487, whisper_loss=0.09039, over 3845102.22 frames. ], batch size: 57, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:41:37,756 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
26 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-18 02:41:37,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3651820.0, ans=0.07 2024-08-18 02:41:44,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3651820.0, ans=0.2 2024-08-18 02:41:46,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3651920.0, ans=0.0 2024-08-18 02:41:47,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3651920.0, ans=0.125 2024-08-18 02:41:51,787 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.55 vs. limit=15.0 2024-08-18 02:41:56,106 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.283e+01 2.523e+01 2.754e+01 3.706e+01, threshold=5.046e+01, percent-clipped=0.0 2024-08-18 02:41:56,505 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-18 02:42:05,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3652020.0, ans=0.0 2024-08-18 02:42:11,120 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 02:42:19,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3652120.0, ans=0.125 2024-08-18 02:42:20,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3652120.0, ans=0.2 2024-08-18 02:42:25,505 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
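The many `ScheduledFloat: name=..., batch_count=..., ans=...` lines print hyperparameters (skip rates, balancer probabilities, dropout) whose value is a schedule over `batch_count` rather than a constant. A hedged sketch of such a piecewise-linear schedule, with illustrative breakpoints not taken from this run:

```python
class ScheduledFloat:
    """Float interpolated piecewise-linearly over training batch count.

    Minimal sketch of the behaviour behind the `ScheduledFloat: ... ans=`
    log lines; outside the breakpoint range the endpoint value is held.
    """

    def __init__(self, *points):
        # points: (batch_count, value) pairs
        self.points = sorted(points)

    def value(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
```

At `batch_count=3.65e6`, far past any warm-up breakpoints, each schedule has reached its final value, which is why the logged `ans` values (0.0, 0.05, 0.07, 0.125, ...) are constant throughout this excerpt.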
23 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-18 02:42:30,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3652220.0, ans=0.2 2024-08-18 02:42:33,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3652220.0, ans=0.1 2024-08-18 02:42:37,800 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 7600, loss[loss=0.08994, beats_loss=0.01046, ecapa_loss=0.0001281, whisper_loss=0.0782, over 16356.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01039, ecapa_loss=0.0001483, whisper_loss=0.09052, over 3824148.94 frames. ], batch size: 63, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:42:39,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3652320.0, ans=0.125 2024-08-18 02:42:41,738 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 28 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 02:42:43,010 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-18 02:42:48,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3652320.0, ans=0.125 2024-08-18 02:42:48,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3652320.0, ans=0.1 2024-08-18 02:42:55,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3652420.0, ans=0.125 2024-08-18 02:43:05,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3652520.0, ans=0.0 2024-08-18 02:43:06,555 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
28 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-18 02:43:27,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3652720.0, ans=0.0 2024-08-18 02:43:31,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3652720.0, ans=0.125 2024-08-18 02:43:33,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3652720.0, ans=0.0 2024-08-18 02:43:40,165 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 7650, loss[loss=0.07994, beats_loss=0.01367, ecapa_loss=0.0001228, whisper_loss=0.06504, over 21268.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01044, ecapa_loss=0.000149, whisper_loss=0.08972, over 3858544.60 frames. ], batch size: 87, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:43:41,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3652820.0, ans=0.125 2024-08-18 02:44:01,636 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.389e+01 2.635e+01 3.049e+01 5.266e+01, threshold=5.270e+01, percent-clipped=1.0 2024-08-18 02:44:04,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3653020.0, ans=0.125 2024-08-18 02:44:20,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3653120.0, ans=0.04949747468305833 2024-08-18 02:44:33,219 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 32 from Vox, 32 fro AS 2024-08-18 02:44:43,050 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 7700, loss[loss=0.1023, beats_loss=0.006865, ecapa_loss=0.0001812, whisper_loss=0.09367, over 16316.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01041, ecapa_loss=0.0001481, whisper_loss=0.09013, over 3863632.29 frames. ], batch size: 65, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:44:52,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3653320.0, ans=0.0 2024-08-18 02:45:03,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3653420.0, ans=0.125 2024-08-18 02:45:03,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3653420.0, ans=0.125 2024-08-18 02:45:21,549 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 21 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-18 02:45:37,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3653720.0, ans=0.1 2024-08-18 02:45:41,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3653720.0, ans=0.07 2024-08-18 02:45:44,539 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 7750, loss[loss=0.07391, beats_loss=0.01219, ecapa_loss=0.0001321, whisper_loss=0.06039, over 19132.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001463, whisper_loss=0.09005, over 3849808.53 frames. ], batch size: 77, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:45:50,102 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.38 vs. 
limit=15.0 2024-08-18 02:46:06,083 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.345e+01 2.664e+01 3.102e+01 5.489e+01, threshold=5.327e+01, percent-clipped=1.0 2024-08-18 02:46:09,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3654020.0, ans=0.2 2024-08-18 02:46:13,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3654020.0, ans=0.1 2024-08-18 02:46:14,313 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.77 vs. limit=22.5 2024-08-18 02:46:15,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3654020.0, ans=0.125 2024-08-18 02:46:32,592 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 02:46:35,098 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-18 02:46:47,619 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 7800, loss[loss=0.08987, beats_loss=0.00826, ecapa_loss=0.0001491, whisper_loss=0.08012, over 13680.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0104, ecapa_loss=0.0001471, whisper_loss=0.09023, over 3857135.01 frames. ], batch size: 56, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:46:56,227 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 19 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-18 02:47:00,257 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-18 02:47:05,124 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 38 from LS+wenet, 26 from Vox, 22 fro AS 2024-08-18 02:47:07,726 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 02:47:17,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3654520.0, ans=0.125 2024-08-18 02:47:21,827 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.15 vs. limit=15.0 2024-08-18 02:47:24,964 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-18 02:47:27,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3654620.0, ans=0.125 2024-08-18 02:47:46,069 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 24 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-18 02:47:49,621 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 7850, loss[loss=0.1052, beats_loss=0.008661, ecapa_loss=0.000201, whisper_loss=0.09452, over 19756.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01046, ecapa_loss=0.0001469, whisper_loss=0.09075, over 3865098.78 frames. ], batch size: 84, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:48:01,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3654920.0, ans=0.2 2024-08-18 02:48:07,670 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.80 vs. limit=15.0 2024-08-18 02:48:10,560 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.350e+01 2.619e+01 2.873e+01 2.983e+02, threshold=5.237e+01, percent-clipped=1.0 2024-08-18 02:48:32,502 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.30 vs. 
limit=15.0 2024-08-18 02:48:46,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3655220.0, ans=0.05 2024-08-18 02:48:51,566 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 7900, loss[loss=0.08747, beats_loss=0.0117, ecapa_loss=0.000127, whisper_loss=0.0745, over 17046.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01042, ecapa_loss=0.0001471, whisper_loss=0.0913, over 3855861.01 frames. ], batch size: 70, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:48:51,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3655320.0, ans=0.0 2024-08-18 02:48:52,945 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 02:49:02,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.41 vs. limit=22.5 2024-08-18 02:49:04,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3655420.0, ans=0.2 2024-08-18 02:49:05,188 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-18 02:49:18,931 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 23 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-18 02:49:21,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3655520.0, ans=0.0 2024-08-18 02:49:22,210 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.50 vs. limit=15.0 2024-08-18 02:49:33,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3655620.0, ans=0.125 2024-08-18 02:49:47,462 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
33 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-18 02:49:49,484 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.01 vs. limit=6.0 2024-08-18 02:49:51,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3655720.0, ans=0.125 2024-08-18 02:49:53,547 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 7950, loss[loss=0.088, beats_loss=0.01204, ecapa_loss=0.000206, whisper_loss=0.0739, over 15814.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01045, ecapa_loss=0.0001467, whisper_loss=0.09089, over 3869096.44 frames. ], batch size: 68, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:49:57,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3655820.0, ans=0.2 2024-08-18 02:50:02,237 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 29 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 02:50:10,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3655920.0, ans=0.125 2024-08-18 02:50:14,862 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.294e+01 2.620e+01 2.907e+01 9.033e+01, threshold=5.239e+01, percent-clipped=1.0 2024-08-18 02:50:40,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3656120.0, ans=0.0 2024-08-18 02:50:44,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3656220.0, ans=0.2 2024-08-18 02:50:46,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3656220.0, ans=0.125 2024-08-18 02:50:46,211 INFO [scaling.py:214] (0/4) ScheduledFloat: 
name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3656220.0, ans=0.1 2024-08-18 02:50:53,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3656220.0, ans=0.0 2024-08-18 02:50:55,542 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 8000, loss[loss=0.1168, beats_loss=0.009186, ecapa_loss=0.0001325, whisper_loss=0.1063, over 19421.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.0001464, whisper_loss=0.09065, over 3890985.04 frames. ], batch size: 73, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:51:02,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3656320.0, ans=0.0 2024-08-18 02:51:15,593 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 02:51:20,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3656520.0, ans=0.125 2024-08-18 02:51:27,483 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.27 vs. limit=15.0 2024-08-18 02:51:32,957 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 02:51:34,135 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-18 02:51:39,985 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.07 vs. limit=15.0 2024-08-18 02:51:57,239 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 8050, loss[loss=0.1107, beats_loss=0.01203, ecapa_loss=0.0001651, whisper_loss=0.097, over 21697.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01053, ecapa_loss=0.0001459, whisper_loss=0.09042, over 3873913.87 frames. ], batch size: 89, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:52:00,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3656820.0, ans=0.125 2024-08-18 02:52:00,481 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2024-08-18 02:52:11,351 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.219e-03 2024-08-18 02:52:14,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3656920.0, ans=0.125 2024-08-18 02:52:19,600 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.387e+01 2.609e+01 2.994e+01 4.076e+01, threshold=5.217e+01, percent-clipped=0.0 2024-08-18 02:52:34,559 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 11 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-18 02:52:37,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3657120.0, ans=0.125 2024-08-18 02:52:39,906 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 30 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-18 02:52:50,060 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-18 02:52:52,422 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 02:52:59,774 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 8100, loss[loss=0.1063, beats_loss=0.009591, ecapa_loss=0.0001445, whisper_loss=0.09526, over 21118.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.0001469, whisper_loss=0.09042, over 3885362.35 frames. 
], batch size: 84, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:53:01,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3657320.0, ans=0.0 2024-08-18 02:53:03,871 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-18 02:53:06,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3657320.0, ans=0.125 2024-08-18 02:53:13,763 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.86 vs. limit=22.5 2024-08-18 02:53:15,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3657420.0, ans=0.125 2024-08-18 02:53:26,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3657520.0, ans=0.125 2024-08-18 02:53:37,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3657620.0, ans=0.125 2024-08-18 02:53:38,470 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-18 02:53:50,669 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-18 02:53:52,626 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.13 vs. limit=10.0 2024-08-18 02:53:53,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3657720.0, ans=0.2 2024-08-18 02:54:02,045 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 8150, loss[loss=0.1083, beats_loss=0.009925, ecapa_loss=0.0001593, whisper_loss=0.09682, over 21329.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.0001477, whisper_loss=0.09042, over 3866879.78 frames. ], batch size: 83, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:54:07,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.83 vs. limit=15.0 2024-08-18 02:54:13,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3657920.0, ans=0.2 2024-08-18 02:54:24,341 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.796e+01 2.183e+01 2.473e+01 2.825e+01 3.767e+01, threshold=4.945e+01, percent-clipped=0.0 2024-08-18 02:54:27,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3658020.0, ans=0.1 2024-08-18 02:54:33,645 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.56 vs. limit=15.0 2024-08-18 02:54:46,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3658120.0, ans=0.125 2024-08-18 02:54:48,017 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 02:55:01,349 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 37 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-18 02:55:03,648 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 8200, loss[loss=0.1171, beats_loss=0.00892, ecapa_loss=0.0001716, whisper_loss=0.1065, over 18911.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0104, ecapa_loss=0.0001475, whisper_loss=0.09064, over 3896222.68 frames. 
], batch size: 77, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:55:15,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3658420.0, ans=0.125 2024-08-18 02:55:36,746 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 02:55:53,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3658720.0, ans=0.125 2024-08-18 02:56:05,249 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 8250, loss[loss=0.1207, beats_loss=0.01068, ecapa_loss=0.0001216, whisper_loss=0.1088, over 24407.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01043, ecapa_loss=0.0001467, whisper_loss=0.09075, over 3918914.00 frames. ], batch size: 93, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:56:05,960 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.49 vs. limit=10.0 2024-08-18 02:56:19,923 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-18 02:56:27,176 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.331e+01 2.566e+01 2.988e+01 3.927e+01, threshold=5.131e+01, percent-clipped=0.0 2024-08-18 02:56:32,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3659020.0, ans=0.125 2024-08-18 02:56:52,067 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.23 vs. 
limit=15.0 2024-08-18 02:57:03,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3659220.0, ans=0.0 2024-08-18 02:57:05,454 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.46 vs. limit=12.0 2024-08-18 02:57:07,217 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 8300, loss[loss=0.08605, beats_loss=0.01035, ecapa_loss=0.0001565, whisper_loss=0.07413, over 21661.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01051, ecapa_loss=0.0001475, whisper_loss=0.08981, over 3908993.99 frames. ], batch size: 89, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:57:20,150 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 21 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-18 02:57:21,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3659420.0, ans=0.0 2024-08-18 02:57:23,989 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.83 vs. limit=15.0 2024-08-18 02:57:24,783 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 23 from LS+wenet, 30 from Vox, 43 fro AS 2024-08-18 02:57:26,218 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.470e+00 2024-08-18 02:57:47,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3659620.0, ans=0.0 2024-08-18 02:57:57,897 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.21 vs. limit=15.0 2024-08-18 02:58:01,134 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
22 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-18 02:58:04,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3659720.0, ans=0.125 2024-08-18 02:58:09,756 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 8350, loss[loss=0.1052, beats_loss=0.009766, ecapa_loss=0.000166, whisper_loss=0.09378, over 21951.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01052, ecapa_loss=0.000149, whisper_loss=0.0893, over 3923843.34 frames. ], batch size: 92, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:58:11,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3659820.0, ans=0.1 2024-08-18 02:58:31,164 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-18 02:58:31,435 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 02:58:32,227 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.239e+01 2.555e+01 2.867e+01 4.693e+01, threshold=5.109e+01, percent-clipped=0.0 2024-08-18 02:58:33,892 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-18 02:58:34,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3660020.0, ans=0.125 2024-08-18 02:58:39,921 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 02:58:43,998 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 02:58:47,949 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. 
limit=6.0 2024-08-18 02:58:49,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3660120.0, ans=0.1 2024-08-18 02:58:53,732 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-18 02:58:55,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3660120.0, ans=0.125 2024-08-18 02:58:57,425 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-18 02:59:00,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3660220.0, ans=0.0 2024-08-18 02:59:05,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3660220.0, ans=0.0 2024-08-18 02:59:12,361 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 8400, loss[loss=0.09148, beats_loss=0.01248, ecapa_loss=0.0001434, whisper_loss=0.07757, over 22282.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01043, ecapa_loss=0.0001487, whisper_loss=0.09005, over 3925032.01 frames. 
], batch size: 93, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:59:18,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3660320.0, ans=0.2 2024-08-18 02:59:25,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3660420.0, ans=6.0 2024-08-18 02:59:28,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3660420.0, ans=0.015 2024-08-18 02:59:31,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3660420.0, ans=0.125 2024-08-18 02:59:33,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3660420.0, ans=0.0 2024-08-18 02:59:43,900 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 21 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-18 02:59:45,193 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 20 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-18 02:59:45,656 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.72 vs. limit=22.5 2024-08-18 03:00:03,457 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-18 03:00:08,193 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 03:00:14,076 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 8450, loss[loss=0.123, beats_loss=0.009751, ecapa_loss=0.0001387, whisper_loss=0.1119, over 22373.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01042, ecapa_loss=0.0001483, whisper_loss=0.09071, over 3948282.12 frames. ], batch size: 87, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:00:20,624 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
26 from LS+wenet, 17 from Vox, 51 fro AS 2024-08-18 03:00:29,494 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.24 vs. limit=22.5 2024-08-18 03:00:36,001 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.643e+01 2.348e+01 2.564e+01 2.815e+01 4.784e+01, threshold=5.128e+01, percent-clipped=0.0 2024-08-18 03:00:44,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3661020.0, ans=0.1 2024-08-18 03:01:02,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3661220.0, ans=0.125 2024-08-18 03:01:16,237 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 8500, loss[loss=0.1026, beats_loss=0.01157, ecapa_loss=0.0001445, whisper_loss=0.08962, over 21733.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01048, ecapa_loss=0.000148, whisper_loss=0.09075, over 3955984.06 frames. ], batch size: 92, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:01:17,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3661320.0, ans=0.0 2024-08-18 03:01:17,987 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2024-08-18 03:01:23,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3661320.0, ans=0.125 2024-08-18 03:01:26,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3661320.0, ans=0.2 2024-08-18 03:01:32,914 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.86 vs. 
limit=15.0 2024-08-18 03:01:33,576 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-18 03:01:40,519 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=7.549e-02 2024-08-18 03:01:42,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3661520.0, ans=0.125 2024-08-18 03:01:47,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3661520.0, ans=0.125 2024-08-18 03:01:49,973 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 32 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 03:01:50,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3661520.0, ans=0.125 2024-08-18 03:01:57,448 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-18 03:02:08,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3661720.0, ans=0.2 2024-08-18 03:02:15,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3661720.0, ans=0.125 2024-08-18 03:02:18,320 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 8550, loss[loss=0.09413, beats_loss=0.01221, ecapa_loss=0.0001227, whisper_loss=0.08069, over 23222.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01049, ecapa_loss=0.0001477, whisper_loss=0.09073, over 3938707.35 frames. ], batch size: 93, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:02:26,221 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.45 vs. 
limit=22.5 2024-08-18 03:02:40,829 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.327e+01 2.493e+01 2.871e+01 1.525e+02, threshold=4.987e+01, percent-clipped=2.0 2024-08-18 03:02:49,820 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.00 vs. limit=22.5 2024-08-18 03:02:50,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3662020.0, ans=0.0 2024-08-18 03:02:59,981 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.111e+01 2024-08-18 03:03:14,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3662220.0, ans=0.1 2024-08-18 03:03:20,359 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 8600, loss[loss=0.1118, beats_loss=0.01068, ecapa_loss=0.0001355, whisper_loss=0.09981, over 22476.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01056, ecapa_loss=0.0001461, whisper_loss=0.09099, over 3939262.90 frames. ], batch size: 86, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:03:31,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3662420.0, ans=0.0 2024-08-18 03:03:42,802 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-18 03:04:00,364 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-18 03:04:01,604 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 03:04:22,293 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 8650, loss[loss=0.1139, beats_loss=0.009173, ecapa_loss=0.0001438, whisper_loss=0.1033, over 15695.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001462, whisper_loss=0.09052, over 3935718.09 frames. ], batch size: 61, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:04:26,191 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 03:04:39,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3662920.0, ans=0.125 2024-08-18 03:04:44,736 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.275e+01 2.600e+01 2.961e+01 1.282e+02, threshold=5.200e+01, percent-clipped=4.0 2024-08-18 03:04:46,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3663020.0, ans=0.125 2024-08-18 03:04:50,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3663020.0, ans=0.125 2024-08-18 03:05:17,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3663220.0, ans=0.125 2024-08-18 03:05:20,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3663220.0, ans=0.1 2024-08-18 03:05:25,013 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 8700, loss[loss=0.1051, beats_loss=0.009859, ecapa_loss=0.0001685, whisper_loss=0.09354, over 20780.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01063, ecapa_loss=0.0001461, whisper_loss=0.08998, over 3927693.15 frames. ], batch size: 84, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:05:27,756 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-18 03:05:35,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3663320.0, ans=0.125 2024-08-18 03:05:41,964 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2024-08-18 03:05:46,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3663420.0, ans=0.125 2024-08-18 03:05:46,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3663420.0, ans=0.1 2024-08-18 03:05:54,548 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.571e+00 2024-08-18 03:06:00,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3663520.0, ans=0.1 2024-08-18 03:06:28,357 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 8750, loss[loss=0.09509, beats_loss=0.01029, ecapa_loss=0.0001546, whisper_loss=0.08325, over 15980.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01063, ecapa_loss=0.0001463, whisper_loss=0.08975, over 3913184.74 frames. 
], batch size: 64, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:06:29,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3663820.0, ans=0.125 2024-08-18 03:06:51,113 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.333e+01 2.575e+01 2.893e+01 4.359e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-18 03:06:57,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3664020.0, ans=0.0 2024-08-18 03:07:25,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3664220.0, ans=0.0 2024-08-18 03:07:31,105 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 8800, loss[loss=0.1091, beats_loss=0.01235, ecapa_loss=0.0001354, whisper_loss=0.09537, over 22909.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01072, ecapa_loss=0.0001462, whisper_loss=0.08969, over 3925579.95 frames. ], batch size: 92, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:07:47,714 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-18 03:07:48,408 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0 2024-08-18 03:07:50,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3664420.0, ans=0.0 2024-08-18 03:08:06,873 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 03:08:10,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3664620.0, ans=0.125 2024-08-18 03:08:14,198 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
20 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 03:08:35,185 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 8850, loss[loss=0.09797, beats_loss=0.01032, ecapa_loss=0.000154, whisper_loss=0.08612, over 15198.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01074, ecapa_loss=0.0001457, whisper_loss=0.08943, over 3896209.81 frames. ], batch size: 61, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:08:38,285 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.14 vs. limit=22.5 2024-08-18 03:08:47,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3664920.0, ans=0.0 2024-08-18 03:08:58,345 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.273e+01 2.462e+01 2.757e+01 3.654e+01, threshold=4.925e+01, percent-clipped=0.0 2024-08-18 03:09:13,379 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.20 vs. limit=15.0 2024-08-18 03:09:16,995 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 12 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-18 03:09:20,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.65 vs. limit=10.0 2024-08-18 03:09:22,296 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 35 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-18 03:09:40,646 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 8900, loss[loss=0.1218, beats_loss=0.009423, ecapa_loss=0.0001346, whisper_loss=0.111, over 15585.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01076, ecapa_loss=0.0001445, whisper_loss=0.08939, over 3857888.59 frames. 
], batch size: 58, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:09:49,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3665320.0, ans=0.0 2024-08-18 03:09:56,110 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 32 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-18 03:10:27,821 WARNING [optim.py:496] (0/4) Scaling gradients by 0.04924134910106659, model_norm_threshold=49.24854278564453 2024-08-18 03:10:28,001 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.092e+05, grad_sumsq=2.092e+05, orig_rms_sq=1.000e+00 2024-08-18 03:10:48,306 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 8950, loss[loss=0.0981, beats_loss=0.01414, ecapa_loss=0.0001175, whisper_loss=0.08279, over 19118.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01076, ecapa_loss=0.0001455, whisper_loss=0.09013, over 3879192.63 frames. ], batch size: 75, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:10:51,392 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-18 03:10:51,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3665820.0, ans=0.1 2024-08-18 03:10:55,242 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
18 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-18 03:10:59,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3665820.0, ans=0.125 2024-08-18 03:11:10,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3665920.0, ans=0.0 2024-08-18 03:11:12,379 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.306e+01 2.588e+01 2.937e+01 1.000e+03, threshold=5.176e+01, percent-clipped=1.0 2024-08-18 03:11:29,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3666120.0, ans=0.125 2024-08-18 03:11:32,190 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 14 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-18 03:11:40,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3666220.0, ans=0.0 2024-08-18 03:11:54,329 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 9000, loss[loss=0.09709, beats_loss=0.01399, ecapa_loss=0.000106, whisper_loss=0.08204, over 19654.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01062, ecapa_loss=0.0001465, whisper_loss=0.09118, over 3880878.81 frames. ], batch size: 78, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:11:54,330 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-18 03:12:27,792 INFO [train_multi_KD3.py:1149] (0/4) Epoch 25, validation on ASR_libri: loss=0.254, beats_loss=0, ecapa_loss=0.0005275, whisper_loss=0.2487, over 922467.00 frames. 2024-08-18 03:12:44,061 INFO [train_multi_KD3.py:1149] (0/4) Epoch 25, validation on SV_voxceleb1: loss=0.004142, beats_loss=0, ecapa_loss=0.0004142, whisper_loss=0, over 939242.00 frames. 2024-08-18 03:14:18,500 INFO [train_multi_KD3.py:1149] (0/4) Epoch 25, validation on AT_audioset: loss=0.02319, beats_loss=0.02319, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-18 03:14:18,507 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-18 03:14:25,846 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.95 vs. limit=15.0 2024-08-18 03:14:33,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3666420.0, ans=0.125 2024-08-18 03:14:58,990 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 03:14:59,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3666520.0, ans=0.125 2024-08-18 03:15:00,596 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.14 vs. limit=22.5 2024-08-18 03:15:03,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3666620.0, ans=0.125 2024-08-18 03:15:09,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3666620.0, ans=0.0 2024-08-18 03:15:14,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.12 vs. limit=15.0 2024-08-18 03:15:24,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3666720.0, ans=0.2 2024-08-18 03:15:30,500 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.692e+01 2024-08-18 03:15:32,506 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 9050, loss[loss=0.07664, beats_loss=0.01275, ecapa_loss=0.000158, whisper_loss=0.06231, over 14356.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.0001465, whisper_loss=0.09114, over 3858065.30 frames. ], batch size: 59, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:15:37,088 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-18 03:15:46,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3666920.0, ans=0.125 2024-08-18 03:15:54,181 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 03:15:59,335 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.276e+01 2.505e+01 2.759e+01 3.701e+01, threshold=5.009e+01, percent-clipped=0.0 2024-08-18 03:16:02,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3667020.0, ans=0.125 2024-08-18 03:16:07,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3667020.0, ans=0.05 2024-08-18 03:16:12,111 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-18 03:16:25,627 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 03:16:41,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3667220.0, ans=0.2 2024-08-18 03:16:44,204 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 9100, loss[loss=0.1038, beats_loss=0.00963, ecapa_loss=0.0001482, whisper_loss=0.09265, over 17228.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01049, ecapa_loss=0.0001468, whisper_loss=0.09139, over 3847805.08 frames. 
], batch size: 68, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:17:48,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3667720.0, ans=0.2 2024-08-18 03:17:51,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3667720.0, ans=0.2 2024-08-18 03:17:56,921 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 9150, loss[loss=0.1004, beats_loss=0.01301, ecapa_loss=0.0001292, whisper_loss=0.08607, over 15176.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0106, ecapa_loss=0.0001449, whisper_loss=0.09058, over 3838596.03 frames. ], batch size: 60, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:18:22,953 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.335e+01 2.594e+01 2.955e+01 5.294e+01, threshold=5.187e+01, percent-clipped=1.0 2024-08-18 03:18:23,209 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-18 03:18:43,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3668120.0, ans=0.1 2024-08-18 03:18:44,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3668120.0, ans=0.2 2024-08-18 03:19:06,099 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 17 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 03:19:09,846 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 9200, loss[loss=0.07312, beats_loss=0.01207, ecapa_loss=0.0001541, whisper_loss=0.05951, over 16682.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01058, ecapa_loss=0.000145, whisper_loss=0.0903, over 3837982.03 frames. ], batch size: 70, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:19:18,310 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
18 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 03:19:24,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3668420.0, ans=0.0 2024-08-18 03:19:26,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3668420.0, ans=0.1 2024-08-18 03:19:47,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3668520.0, ans=0.125 2024-08-18 03:20:05,581 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-18 03:20:23,729 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 9250, loss[loss=0.1113, beats_loss=0.01119, ecapa_loss=0.0001709, whisper_loss=0.09839, over 22433.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01054, ecapa_loss=0.0001456, whisper_loss=0.09114, over 3873632.74 frames. ], batch size: 91, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:20:30,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3668820.0, ans=0.0 2024-08-18 03:20:33,212 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 24 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-18 03:20:37,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3668920.0, ans=0.125 2024-08-18 03:20:51,779 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.259e+01 2.533e+01 2.843e+01 4.399e+01, threshold=5.065e+01, percent-clipped=0.0 2024-08-18 03:21:04,906 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
30 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 03:21:18,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3669120.0, ans=0.015 2024-08-18 03:21:20,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3669120.0, ans=0.125 2024-08-18 03:21:40,416 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 9300, loss[loss=0.08193, beats_loss=0.01143, ecapa_loss=0.000144, whisper_loss=0.06906, over 16199.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01047, ecapa_loss=0.0001453, whisper_loss=0.09182, over 3864779.32 frames. ], batch size: 67, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:21:56,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3669420.0, ans=0.2 2024-08-18 03:22:08,404 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.80 vs. limit=15.0 2024-08-18 03:22:10,657 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 03:22:16,349 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. 
limit=6.0 2024-08-18 03:22:43,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3669720.0, ans=0.05 2024-08-18 03:22:48,248 WARNING [optim.py:496] (0/4) Scaling gradients by 0.053159911185503006, model_norm_threshold=50.65474319458008 2024-08-18 03:22:48,419 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.966e+05, grad_sumsq=1.966e+05, orig_rms_sq=1.000e+00 2024-08-18 03:22:50,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3669720.0, ans=0.2 2024-08-18 03:22:56,866 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 9350, loss[loss=0.08567, beats_loss=0.01619, ecapa_loss=0.0001195, whisper_loss=0.06829, over 19618.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001454, whisper_loss=0.09075, over 3851969.69 frames. ], batch size: 81, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:22:57,314 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.609e-03 2024-08-18 03:22:58,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3669820.0, ans=0.2 2024-08-18 03:23:01,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3669820.0, ans=0.0 2024-08-18 03:23:05,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3669820.0, ans=0.2 2024-08-18 03:23:24,285 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.389e+01 2.563e+01 2.950e+01 9.529e+02, threshold=5.125e+01, percent-clipped=2.0 2024-08-18 03:23:32,881 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 03:23:33,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3670020.0, ans=0.0 2024-08-18 03:23:43,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3670120.0, ans=0.0 2024-08-18 03:23:43,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=22.5 2024-08-18 03:23:51,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3670120.0, ans=0.0 2024-08-18 03:24:00,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3670220.0, ans=0.1 2024-08-18 03:24:01,859 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-18 03:24:04,985 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-18 03:24:11,761 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 9400, loss[loss=0.1011, beats_loss=0.009169, ecapa_loss=0.0001464, whisper_loss=0.09049, over 18869.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0106, ecapa_loss=0.0001452, whisper_loss=0.09033, over 3846969.88 frames. 
], batch size: 73, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:24:24,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3670420.0, ans=0.07 2024-08-18 03:24:51,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3670520.0, ans=0.125 2024-08-18 03:24:57,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3670620.0, ans=0.125 2024-08-18 03:25:02,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3670620.0, ans=0.125 2024-08-18 03:25:04,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3670620.0, ans=0.1 2024-08-18 03:25:26,774 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 9450, loss[loss=0.1033, beats_loss=0.01161, ecapa_loss=0.0001165, whisper_loss=0.09049, over 23470.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001451, whisper_loss=0.09053, over 3857846.86 frames. ], batch size: 92, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:25:31,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3670820.0, ans=0.125 2024-08-18 03:25:33,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3670820.0, ans=0.2 2024-08-18 03:25:45,310 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.08 vs. limit=15.0 2024-08-18 03:25:53,977 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
30 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-18 03:25:55,276 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.676e+01 2.213e+01 2.450e+01 2.826e+01 4.775e+01, threshold=4.900e+01, percent-clipped=0.0 2024-08-18 03:25:59,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3671020.0, ans=0.04949747468305833 2024-08-18 03:26:00,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3671020.0, ans=0.07 2024-08-18 03:26:19,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3671120.0, ans=15.0 2024-08-18 03:26:22,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3671120.0, ans=0.0 2024-08-18 03:26:31,049 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2024-08-18 03:26:32,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3671220.0, ans=0.0 2024-08-18 03:26:45,633 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 9500, loss[loss=0.1024, beats_loss=0.009461, ecapa_loss=0.0001624, whisper_loss=0.09132, over 19859.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01048, ecapa_loss=0.0001459, whisper_loss=0.09117, over 3884409.68 frames. ], batch size: 80, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:27:06,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3671420.0, ans=0.125 2024-08-18 03:27:18,427 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
28 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-18 03:27:33,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3671620.0, ans=0.04949747468305833 2024-08-18 03:27:59,178 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 9550, loss[loss=0.08993, beats_loss=0.01331, ecapa_loss=0.0001483, whisper_loss=0.07514, over 22174.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01043, ecapa_loss=0.0001466, whisper_loss=0.09097, over 3843128.02 frames. ], batch size: 95, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:28:03,765 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-18 03:28:06,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3671820.0, ans=0.125 2024-08-18 03:28:09,513 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-18 03:28:25,963 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.340e+01 2.613e+01 3.002e+01 5.321e+01, threshold=5.225e+01, percent-clipped=2.0 2024-08-18 03:28:27,368 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-18 03:28:54,273 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-18 03:28:54,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3672120.0, ans=15.0 2024-08-18 03:29:11,736 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 9600, loss[loss=0.09869, beats_loss=0.01284, ecapa_loss=0.0001109, whisper_loss=0.08474, over 17257.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0104, ecapa_loss=0.0001463, whisper_loss=0.09107, over 3824739.18 frames. 
], batch size: 67, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:29:19,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3672320.0, ans=0.125 2024-08-18 03:29:29,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3672420.0, ans=0.2 2024-08-18 03:29:31,835 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.544e+05 2024-08-18 03:29:33,394 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.81 vs. limit=10.0 2024-08-18 03:29:33,492 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0 2024-08-18 03:29:51,478 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 22 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-18 03:29:54,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3672520.0, ans=0.125 2024-08-18 03:30:08,273 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 30 from LS+wenet, 13 from Vox, 18 fro AS 2024-08-18 03:30:14,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3672720.0, ans=0.125 2024-08-18 03:30:17,415 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.70 vs. limit=15.0 2024-08-18 03:30:24,981 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 9650, loss[loss=0.08496, beats_loss=0.01, ecapa_loss=0.0001356, whisper_loss=0.0736, over 14834.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01025, ecapa_loss=0.0001469, whisper_loss=0.09164, over 3799348.84 frames. 
], batch size: 58, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:30:38,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3672920.0, ans=0.125 2024-08-18 03:30:40,688 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 03:30:50,766 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.281e+01 2.485e+01 2.819e+01 4.590e+01, threshold=4.970e+01, percent-clipped=0.0 2024-08-18 03:31:05,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3673020.0, ans=0.125 2024-08-18 03:31:13,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3673120.0, ans=0.125 2024-08-18 03:31:38,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3673220.0, ans=0.0 2024-08-18 03:31:46,276 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 9700, loss[loss=0.1081, beats_loss=0.01155, ecapa_loss=0.0001476, whisper_loss=0.09512, over 21355.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01039, ecapa_loss=0.0001478, whisper_loss=0.09113, over 3849076.73 frames. ], batch size: 87, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:31:47,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3673320.0, ans=0.0 2024-08-18 03:31:56,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=3673320.0, ans=12.0 2024-08-18 03:31:59,419 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
24 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-18 03:32:01,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3673320.0, ans=0.1 2024-08-18 03:32:01,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3673320.0, ans=0.1 2024-08-18 03:32:02,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3673420.0, ans=0.1 2024-08-18 03:32:24,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3673520.0, ans=0.125 2024-08-18 03:32:31,194 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.81 vs. limit=12.0 2024-08-18 03:32:36,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3673520.0, ans=0.125 2024-08-18 03:32:41,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3673620.0, ans=0.0 2024-08-18 03:32:59,598 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-18 03:33:12,934 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 9750, loss[loss=0.1184, beats_loss=0.01011, ecapa_loss=0.0001247, whisper_loss=0.1071, over 23983.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01043, ecapa_loss=0.0001465, whisper_loss=0.09085, over 3838849.25 frames. 
], batch size: 90, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:33:18,607 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 03:33:47,832 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.256e+01 2.570e+01 2.973e+01 3.927e+01, threshold=5.141e+01, percent-clipped=0.0 2024-08-18 03:33:48,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3673920.0, ans=10.0 2024-08-18 03:33:53,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3674020.0, ans=0.0 2024-08-18 03:34:16,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3674120.0, ans=0.0 2024-08-18 03:34:34,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3674220.0, ans=0.125 2024-08-18 03:34:41,973 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 24 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-18 03:34:43,950 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 26 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-18 03:34:55,688 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 9800, loss[loss=0.1078, beats_loss=0.00932, ecapa_loss=0.000162, whisper_loss=0.09683, over 19833.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01038, ecapa_loss=0.0001483, whisper_loss=0.09093, over 3850722.68 frames. ], batch size: 79, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:35:36,044 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 22 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-18 03:35:38,240 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
16 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-18 03:35:40,640 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09654395282268524, model_norm_threshold=51.40534973144531 2024-08-18 03:35:40,808 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.0.self_attn_weights.in_proj.bias with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.176e+04, grad_sumsq=6.832e+03, orig_rms_sq=9.039e+00 2024-08-18 03:35:42,225 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.16 vs. limit=6.0 2024-08-18 03:36:15,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3674720.0, ans=0.2 2024-08-18 03:36:29,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3674720.0, ans=0.125 2024-08-18 03:36:29,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3674720.0, ans=0.125 2024-08-18 03:36:35,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3674820.0, ans=0.125 2024-08-18 03:36:36,468 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 9850, loss[loss=0.09866, beats_loss=0.009414, ecapa_loss=0.0001975, whisper_loss=0.08727, over 21086.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.0001482, whisper_loss=0.09013, over 3839119.86 frames. 
], batch size: 89, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:37:02,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3674920.0, ans=0.125 2024-08-18 03:37:06,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3674920.0, ans=0.0 2024-08-18 03:37:12,183 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.310e+01 2.559e+01 2.794e+01 5.325e+02, threshold=5.118e+01, percent-clipped=2.0 2024-08-18 03:37:57,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3675220.0, ans=0.1 2024-08-18 03:38:03,727 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.24 vs. limit=15.0 2024-08-18 03:38:21,314 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 9900, loss[loss=0.1234, beats_loss=0.008884, ecapa_loss=0.0001284, whisper_loss=0.1132, over 16507.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01047, ecapa_loss=0.0001477, whisper_loss=0.09082, over 3869560.40 frames. ], batch size: 62, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:38:55,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3675520.0, ans=0.1 2024-08-18 03:38:57,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.44 vs. limit=15.0 2024-08-18 03:39:13,737 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-18 03:39:16,979 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
17 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-18 03:39:34,816 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 9950, loss[loss=0.08689, beats_loss=0.01095, ecapa_loss=0.0001379, whisper_loss=0.07456, over 19116.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001474, whisper_loss=0.09053, over 3873970.09 frames. ], batch size: 74, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:39:49,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3675920.0, ans=0.1 2024-08-18 03:40:00,872 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.255e+01 2.488e+01 2.824e+01 3.882e+01, threshold=4.975e+01, percent-clipped=0.0 2024-08-18 03:40:07,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3676020.0, ans=0.1 2024-08-18 03:40:11,012 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.08 vs. limit=22.5 2024-08-18 03:40:35,317 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.71 vs. limit=5.0 2024-08-18 03:40:35,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3676220.0, ans=0.025 2024-08-18 03:40:46,751 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.88 vs. limit=6.0 2024-08-18 03:40:48,482 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 10000, loss[loss=0.1167, beats_loss=0.009651, ecapa_loss=0.0001804, whisper_loss=0.1053, over 22932.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01049, ecapa_loss=0.0001468, whisper_loss=0.09126, over 3890641.45 frames. 
], batch size: 97, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:40:49,414 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.35 vs. limit=15.0 2024-08-18 03:41:03,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3676420.0, ans=0.125 2024-08-18 03:41:45,410 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-18 03:41:59,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3676720.0, ans=0.0 2024-08-18 03:42:00,142 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 14 from Vox, 47 fro AS 2024-08-18 03:42:03,548 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-18 03:42:05,021 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 10050, loss[loss=0.1125, beats_loss=0.01171, ecapa_loss=0.0001306, whisper_loss=0.0995, over 23844.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.0001467, whisper_loss=0.09086, over 3915837.26 frames. 
], batch size: 94, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:42:05,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3676820.0, ans=0.125 2024-08-18 03:42:31,486 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.717e+01 2.309e+01 2.495e+01 2.743e+01 6.109e+01, threshold=4.990e+01, percent-clipped=1.0 2024-08-18 03:42:47,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3677120.0, ans=0.0 2024-08-18 03:42:53,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3677120.0, ans=0.125 2024-08-18 03:42:58,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3677120.0, ans=0.125 2024-08-18 03:43:05,167 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 03:43:10,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3677220.0, ans=0.2 2024-08-18 03:43:17,838 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 10100, loss[loss=0.1109, beats_loss=0.008278, ecapa_loss=0.0001524, whisper_loss=0.1011, over 16036.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.000147, whisper_loss=0.09093, over 3925281.12 frames. 
], batch size: 64, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:43:36,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3677420.0, ans=0.1 2024-08-18 03:43:36,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3677420.0, ans=0.1 2024-08-18 03:43:51,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3677520.0, ans=0.125 2024-08-18 03:44:01,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3677620.0, ans=0.125 2024-08-18 03:44:07,230 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-18 03:44:16,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3677720.0, ans=0.125 2024-08-18 03:44:19,143 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-18 03:44:27,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3677720.0, ans=0.0 2024-08-18 03:44:32,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3677820.0, ans=0.2 2024-08-18 03:44:34,272 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 10150, loss[loss=0.1148, beats_loss=0.009936, ecapa_loss=0.000158, whisper_loss=0.1033, over 22530.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0107, ecapa_loss=0.0001482, whisper_loss=0.09013, over 3939003.08 frames. 
], batch size: 91, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:44:34,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3677820.0, ans=0.125 2024-08-18 03:44:59,668 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.325e+01 2.524e+01 2.849e+01 6.626e+01, threshold=5.048e+01, percent-clipped=1.0 2024-08-18 03:45:02,289 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.42 vs. limit=6.0 2024-08-18 03:45:04,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3678020.0, ans=0.1 2024-08-18 03:45:04,663 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.75 vs. limit=22.5 2024-08-18 03:45:05,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3678020.0, ans=0.1 2024-08-18 03:45:08,502 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 27 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-18 03:45:14,609 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 03:45:29,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3678120.0, ans=0.125 2024-08-18 03:45:35,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3678220.0, ans=0.125 2024-08-18 03:45:47,065 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 10200, loss[loss=0.1265, beats_loss=0.007983, ecapa_loss=0.0001432, whisper_loss=0.1171, over 21218.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01057, ecapa_loss=0.0001484, whisper_loss=0.09071, over 3948155.43 frames. 
], batch size: 81, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:46:03,037 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 03:46:07,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3678420.0, ans=0.0 2024-08-18 03:46:32,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3678620.0, ans=10.0 2024-08-18 03:46:53,809 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-18 03:46:55,064 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 22 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-18 03:46:58,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3678720.0, ans=0.0 2024-08-18 03:47:01,132 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 10250, loss[loss=0.1319, beats_loss=0.008668, ecapa_loss=0.000155, whisper_loss=0.1217, over 19383.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01062, ecapa_loss=0.0001475, whisper_loss=0.09064, over 3949795.29 frames. ], batch size: 73, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:47:11,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3678820.0, ans=0.2 2024-08-18 03:47:11,392 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.276e+01 2024-08-18 03:47:27,725 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.259e+01 2.500e+01 2.782e+01 3.829e+01, threshold=5.000e+01, percent-clipped=0.0 2024-08-18 03:47:35,328 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 03:47:57,895 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
29 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-18 03:48:07,922 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-18 03:48:12,623 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.05 vs. limit=15.0 2024-08-18 03:48:15,379 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 10300, loss[loss=0.1061, beats_loss=0.009143, ecapa_loss=0.0001367, whisper_loss=0.09557, over 24215.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01061, ecapa_loss=0.0001465, whisper_loss=0.09029, over 3950095.75 frames. ], batch size: 91, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:48:24,480 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 30 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-18 03:48:37,795 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-18 03:48:38,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3679420.0, ans=0.0 2024-08-18 03:48:43,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3679420.0, ans=0.95 2024-08-18 03:48:43,488 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.263e+01 2024-08-18 03:48:43,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0 2024-08-18 03:49:10,341 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-18 03:49:12,660 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 03:49:19,997 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
23 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-18 03:49:28,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3679820.0, ans=0.0 2024-08-18 03:49:30,239 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 10350, loss[loss=0.1033, beats_loss=0.01042, ecapa_loss=0.0001651, whisper_loss=0.09126, over 17155.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01063, ecapa_loss=0.0001461, whisper_loss=0.08938, over 3921724.14 frames. ], batch size: 73, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:49:33,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3679820.0, ans=0.1 2024-08-18 03:49:40,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3679820.0, ans=0.1 2024-08-18 03:49:42,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3679820.0, ans=0.1 2024-08-18 03:49:46,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3679920.0, ans=0.0 2024-08-18 03:49:46,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3679920.0, ans=0.125 2024-08-18 03:49:55,124 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-368000.pt 2024-08-18 03:49:59,400 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.361e+01 2.642e+01 2.935e+01 4.206e+01, threshold=5.285e+01, percent-clipped=0.0 2024-08-18 03:49:59,643 INFO [train_multi_KD3.py:844] (0/4) A total of 75 
cuts. 20 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-18 03:50:03,651 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.46 vs. limit=15.0 2024-08-18 03:50:36,198 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 03:50:45,831 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-18 03:50:48,602 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 10400, loss[loss=0.1107, beats_loss=0.01073, ecapa_loss=0.0001974, whisper_loss=0.09802, over 14404.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01074, ecapa_loss=0.0001442, whisper_loss=0.08881, over 3892517.62 frames. ], batch size: 61, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:50:53,406 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-18 03:50:59,357 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-18 03:51:01,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3680320.0, ans=0.125 2024-08-18 03:51:24,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3680520.0, ans=0.025 2024-08-18 03:51:31,196 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-18 03:51:36,702 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-18 03:51:40,252 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 03:51:43,278 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
19 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 03:51:49,835 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-18 03:52:02,461 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 10450, loss[loss=0.08789, beats_loss=0.00984, ecapa_loss=0.000178, whisper_loss=0.07627, over 19615.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01071, ecapa_loss=0.0001448, whisper_loss=0.08878, over 3896801.69 frames. ], batch size: 84, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:52:04,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3680820.0, ans=0.0 2024-08-18 03:52:06,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3680820.0, ans=0.2 2024-08-18 03:52:06,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3680820.0, ans=0.125 2024-08-18 03:52:11,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3680820.0, ans=0.2 2024-08-18 03:52:28,739 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.400e+01 2.629e+01 2.967e+01 1.519e+02, threshold=5.258e+01, percent-clipped=2.0 2024-08-18 03:52:37,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3681020.0, ans=0.2 2024-08-18 03:52:42,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3681020.0, ans=0.125 2024-08-18 03:52:48,805 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.93 vs. 
limit=15.0 2024-08-18 03:52:58,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3681120.0, ans=10.0 2024-08-18 03:53:04,020 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-18 03:53:16,361 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 10500, loss[loss=0.09518, beats_loss=0.01166, ecapa_loss=0.0001541, whisper_loss=0.08198, over 22656.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01071, ecapa_loss=0.000145, whisper_loss=0.08951, over 3955310.27 frames. ], batch size: 92, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:53:16,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3681320.0, ans=0.0 2024-08-18 03:53:19,655 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.996e+00 2024-08-18 03:53:20,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3681320.0, ans=0.125 2024-08-18 03:53:42,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3681420.0, ans=0.125 2024-08-18 03:53:47,808 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-18 03:53:52,258 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 03:53:55,117 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 03:53:57,719 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 03:54:15,394 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
41 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-18 03:54:21,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3681720.0, ans=0.125 2024-08-18 03:54:31,623 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 10550, loss[loss=0.126, beats_loss=0.008389, ecapa_loss=0.0001659, whisper_loss=0.1159, over 15141.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01062, ecapa_loss=0.0001461, whisper_loss=0.08998, over 3939721.30 frames. ], batch size: 59, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:54:49,194 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-18 03:54:51,825 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.09 vs. limit=22.5 2024-08-18 03:54:57,721 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.705e+01 2.276e+01 2.579e+01 2.977e+01 5.501e+01, threshold=5.159e+01, percent-clipped=1.0 2024-08-18 03:55:01,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3682020.0, ans=0.1 2024-08-18 03:55:13,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3682020.0, ans=0.1 2024-08-18 03:55:16,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3682120.0, ans=0.125 2024-08-18 03:55:51,063 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 10600, loss[loss=0.0959, beats_loss=0.01125, ecapa_loss=0.0001628, whisper_loss=0.08302, over 18710.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01063, ecapa_loss=0.0001464, whisper_loss=0.0895, over 3939529.38 frames. 
], batch size: 79, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:55:51,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3682320.0, ans=0.125 2024-08-18 03:55:52,582 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-18 03:56:09,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3682420.0, ans=0.0 2024-08-18 03:56:14,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3682420.0, ans=0.0 2024-08-18 03:56:24,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.61 vs. limit=6.0 2024-08-18 03:56:49,598 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-18 03:56:53,819 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0 2024-08-18 03:56:57,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3682720.0, ans=0.0 2024-08-18 03:57:07,019 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 10650, loss[loss=0.1246, beats_loss=0.00726, ecapa_loss=0.0001521, whisper_loss=0.1158, over 19315.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01061, ecapa_loss=0.0001454, whisper_loss=0.08952, over 3901369.19 frames. ], batch size: 74, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:57:17,766 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 23 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-18 03:57:30,985 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
23 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-18 03:57:33,978 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.262e+01 2.551e+01 2.791e+01 4.688e+01, threshold=5.101e+01, percent-clipped=0.0 2024-08-18 03:58:01,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3683120.0, ans=0.125 2024-08-18 03:58:10,795 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.08 vs. limit=15.0 2024-08-18 03:58:23,110 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 10700, loss[loss=0.09508, beats_loss=0.013, ecapa_loss=0.0001294, whisper_loss=0.08079, over 22569.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01056, ecapa_loss=0.0001452, whisper_loss=0.09008, over 3909767.98 frames. ], batch size: 89, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:58:37,990 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.02 vs. limit=15.0 2024-08-18 03:58:51,312 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.07 vs. 
limit=15.0 2024-08-18 03:58:52,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3683520.0, ans=0.0 2024-08-18 03:59:00,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3683520.0, ans=0.1 2024-08-18 03:59:34,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3683720.0, ans=0.2 2024-08-18 03:59:36,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3683720.0, ans=0.0 2024-08-18 03:59:38,898 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 10750, loss[loss=0.09855, beats_loss=0.009413, ecapa_loss=0.0001809, whisper_loss=0.08732, over 22192.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01064, ecapa_loss=0.0001447, whisper_loss=0.09037, over 3925435.12 frames. ], batch size: 90, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:59:46,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3683820.0, ans=0.0 2024-08-18 03:59:49,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3683820.0, ans=0.125 2024-08-18 03:59:52,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3683920.0, ans=0.1 2024-08-18 04:00:00,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3683920.0, ans=0.2 2024-08-18 04:00:03,922 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.722e+01 2.308e+01 2.481e+01 2.750e+01 3.346e+01, threshold=4.962e+01, percent-clipped=0.0 2024-08-18 04:00:18,477 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, 
num_channels=128, metric=3.94 vs. limit=5.0 2024-08-18 04:00:34,048 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-18 04:00:43,197 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.94 vs. limit=15.0 2024-08-18 04:00:53,197 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 10800, loss[loss=0.1067, beats_loss=0.008818, ecapa_loss=0.0001665, whisper_loss=0.09625, over 18206.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01068, ecapa_loss=0.0001451, whisper_loss=0.09025, over 3904636.32 frames. ], batch size: 73, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:00:53,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3684320.0, ans=0.125 2024-08-18 04:00:58,483 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.320e+01 2024-08-18 04:01:01,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3684320.0, ans=0.125 2024-08-18 04:01:15,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3684420.0, ans=0.0 2024-08-18 04:01:18,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3684420.0, ans=0.0 2024-08-18 04:01:23,443 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.25 vs. 
limit=22.5 2024-08-18 04:01:24,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3684520.0, ans=0.125 2024-08-18 04:01:31,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3684520.0, ans=0.125 2024-08-18 04:01:32,535 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 36 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 04:01:40,097 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-18 04:01:43,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=3684620.0, ans=22.5 2024-08-18 04:01:46,039 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-18 04:01:46,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3684620.0, ans=0.1 2024-08-18 04:01:56,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3684720.0, ans=0.1 2024-08-18 04:02:04,876 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.986e+00 2024-08-18 04:02:04,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3684820.0, ans=0.125 2024-08-18 04:02:06,249 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 10850, loss[loss=0.1171, beats_loss=0.01025, ecapa_loss=0.0001749, whisper_loss=0.1051, over 22010.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001453, whisper_loss=0.09086, over 3903712.75 frames. 
], batch size: 91, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:02:06,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3684820.0, ans=0.125 2024-08-18 04:02:10,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3684820.0, ans=0.1 2024-08-18 04:02:13,411 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-18 04:02:13,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3684820.0, ans=0.0 2024-08-18 04:02:14,523 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 30 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-18 04:02:14,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3684820.0, ans=0.0 2024-08-18 04:02:31,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3684920.0, ans=0.2 2024-08-18 04:02:34,361 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.724e+01 2.394e+01 2.623e+01 3.020e+01 4.318e+02, threshold=5.247e+01, percent-clipped=1.0 2024-08-18 04:02:42,026 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 11 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-18 04:02:45,494 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.40 vs. limit=15.0 2024-08-18 04:02:50,638 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-18 04:03:13,766 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-18 04:03:15,161 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
15 from LS+wenet, 27 from Vox, 46 fro AS 2024-08-18 04:03:19,491 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 10900, loss[loss=0.1049, beats_loss=0.01043, ecapa_loss=0.0001281, whisper_loss=0.09316, over 23911.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01067, ecapa_loss=0.0001453, whisper_loss=0.09061, over 3952010.96 frames. ], batch size: 91, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:03:44,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3685420.0, ans=0.5 2024-08-18 04:03:46,389 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-18 04:03:54,309 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.02 vs. limit=12.0 2024-08-18 04:04:21,538 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 31 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 04:04:27,413 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.56 vs. limit=12.0 2024-08-18 04:04:30,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3685820.0, ans=0.2 2024-08-18 04:04:31,607 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 10950, loss[loss=0.1172, beats_loss=0.009138, ecapa_loss=0.0001182, whisper_loss=0.1069, over 17479.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.0001436, whisper_loss=0.09113, over 3947583.11 frames. 
], batch size: 67, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:04:35,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3685820.0, ans=0.2 2024-08-18 04:04:40,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.94 vs. limit=12.0 2024-08-18 04:04:50,061 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-18 04:04:58,726 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.317e+01 2.614e+01 2.946e+01 3.732e+01, threshold=5.227e+01, percent-clipped=0.0 2024-08-18 04:04:59,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.34 vs. limit=15.0 2024-08-18 04:05:04,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3686020.0, ans=0.1 2024-08-18 04:05:05,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3686020.0, ans=0.2 2024-08-18 04:05:17,607 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-18 04:05:31,015 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.31 vs. limit=15.0 2024-08-18 04:05:39,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3686220.0, ans=0.125 2024-08-18 04:05:43,814 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 11000, loss[loss=0.08305, beats_loss=0.01097, ecapa_loss=0.0001407, whisper_loss=0.07068, over 20375.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01055, ecapa_loss=0.0001448, whisper_loss=0.0913, over 3970370.03 frames. 
], batch size: 84, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:05:49,657 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 16 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-18 04:05:50,375 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.88 vs. limit=22.5 2024-08-18 04:06:44,649 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-18 04:06:51,320 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 24 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-18 04:06:59,328 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 11050, loss[loss=0.113, beats_loss=0.008648, ecapa_loss=0.0001703, whisper_loss=0.1027, over 21390.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01047, ecapa_loss=0.0001451, whisper_loss=0.09145, over 3938363.84 frames. ], batch size: 88, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:07:00,905 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 27 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-18 04:07:03,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3686820.0, ans=0.0 2024-08-18 04:07:09,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3686820.0, ans=0.1 2024-08-18 04:07:10,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3686820.0, ans=0.1 2024-08-18 04:07:13,594 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.75 vs. 
limit=15.0 2024-08-18 04:07:24,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3686920.0, ans=0.0 2024-08-18 04:07:25,547 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.316e+01 2.430e+01 2.735e+01 3.688e+01, threshold=4.860e+01, percent-clipped=0.0 2024-08-18 04:07:25,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3687020.0, ans=0.2 2024-08-18 04:07:32,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3687020.0, ans=0.1 2024-08-18 04:07:53,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3687220.0, ans=0.2 2024-08-18 04:08:04,305 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-18 04:08:04,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3687220.0, ans=0.0 2024-08-18 04:08:04,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3687220.0, ans=0.2 2024-08-18 04:08:08,673 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 11100, loss[loss=0.1039, beats_loss=0.009949, ecapa_loss=0.0001631, whisper_loss=0.09232, over 16893.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01043, ecapa_loss=0.000146, whisper_loss=0.09175, over 3929357.63 frames. 
], batch size: 68, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:08:11,965 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 04:08:20,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3687320.0, ans=0.125 2024-08-18 04:08:25,012 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.50 vs. limit=6.0 2024-08-18 04:08:27,512 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-18 04:08:33,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3687420.0, ans=0.2 2024-08-18 04:08:38,255 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-18 04:09:04,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3687620.0, ans=0.1 2024-08-18 04:09:05,225 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 04:09:05,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3687620.0, ans=0.125 2024-08-18 04:09:13,862 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 9 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-18 04:09:16,871 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.040e+00 2024-08-18 04:09:24,557 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 11150, loss[loss=0.1021, beats_loss=0.009621, ecapa_loss=0.0001928, whisper_loss=0.09058, over 18879.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01046, ecapa_loss=0.0001455, whisper_loss=0.09065, over 3921086.05 frames. 
], batch size: 82, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:09:31,143 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 04:09:40,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3687920.0, ans=0.1 2024-08-18 04:09:42,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3687920.0, ans=0.2 2024-08-18 04:09:50,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3687920.0, ans=0.2 2024-08-18 04:09:50,489 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.69 vs. limit=22.5 2024-08-18 04:09:53,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3687920.0, ans=0.1 2024-08-18 04:09:53,788 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.407e+01 2.639e+01 3.028e+01 3.278e+02, threshold=5.278e+01, percent-clipped=1.0 2024-08-18 04:10:00,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3688020.0, ans=0.125 2024-08-18 04:10:03,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3688020.0, ans=0.1 2024-08-18 04:10:06,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3688020.0, ans=0.125 2024-08-18 04:10:10,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3688120.0, ans=0.2 2024-08-18 04:10:36,767 INFO [scaling.py:214] (0/4) ScheduledFloat: 
name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3688220.0, ans=0.125 2024-08-18 04:10:40,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=3688220.0, ans=0.02 2024-08-18 04:10:43,185 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 11200, loss[loss=0.1062, beats_loss=0.01149, ecapa_loss=0.0001417, whisper_loss=0.09333, over 17627.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01043, ecapa_loss=0.0001463, whisper_loss=0.09072, over 3916679.37 frames. ], batch size: 73, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:10:54,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3688320.0, ans=0.09899494936611666 2024-08-18 04:11:06,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3688420.0, ans=0.125 2024-08-18 04:11:54,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3688720.0, ans=0.2 2024-08-18 04:11:59,201 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 11250, loss[loss=0.1092, beats_loss=0.01038, ecapa_loss=0.0001475, whisper_loss=0.09731, over 21697.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.000147, whisper_loss=0.09031, over 3897906.97 frames. ], batch size: 88, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:12:17,237 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-18 04:12:27,971 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.315e+01 2.575e+01 2.987e+01 7.220e+01, threshold=5.150e+01, percent-clipped=1.0 2024-08-18 04:12:28,593 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.51 vs. 
limit=15.0 2024-08-18 04:12:41,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3689020.0, ans=0.125 2024-08-18 04:12:51,992 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.65 vs. limit=15.0 2024-08-18 04:12:54,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3689120.0, ans=0.0 2024-08-18 04:12:54,962 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.36 vs. limit=6.0 2024-08-18 04:13:04,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3689220.0, ans=0.2 2024-08-18 04:13:14,371 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 11300, loss[loss=0.1192, beats_loss=0.0104, ecapa_loss=0.0001427, whisper_loss=0.1074, over 22309.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01043, ecapa_loss=0.0001456, whisper_loss=0.09138, over 3915888.66 frames. ], batch size: 91, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:13:16,237 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=15.0 2024-08-18 04:13:21,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3689320.0, ans=0.125 2024-08-18 04:13:22,156 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
26 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-18 04:13:55,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3689520.0, ans=0.5 2024-08-18 04:13:59,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.73 vs. limit=12.0 2024-08-18 04:14:04,080 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-18 04:14:07,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3689620.0, ans=0.125 2024-08-18 04:14:13,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3689620.0, ans=0.125 2024-08-18 04:14:16,439 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-18 04:14:21,228 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-18 04:14:31,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3689720.0, ans=0.2 2024-08-18 04:14:33,650 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 11350, loss[loss=0.1124, beats_loss=0.01252, ecapa_loss=0.0001312, whisper_loss=0.09855, over 22828.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01036, ecapa_loss=0.0001465, whisper_loss=0.09116, over 3917399.70 frames. ], batch size: 91, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:14:42,051 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
21 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-18 04:14:51,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3689920.0, ans=0.0 2024-08-18 04:15:03,615 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.324e+01 2.496e+01 2.827e+01 4.121e+01, threshold=4.992e+01, percent-clipped=0.0 2024-08-18 04:15:18,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3690120.0, ans=0.04949747468305833 2024-08-18 04:15:24,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3690120.0, ans=0.0 2024-08-18 04:15:38,536 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 29 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-18 04:15:49,489 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 11400, loss[loss=0.109, beats_loss=0.007984, ecapa_loss=0.000161, whisper_loss=0.09937, over 15880.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01033, ecapa_loss=0.0001457, whisper_loss=0.0916, over 3904580.37 frames. ], batch size: 62, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:15:49,674 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 04:15:50,238 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=15.0 2024-08-18 04:16:11,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3690420.0, ans=0.0 2024-08-18 04:16:19,802 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-18 04:16:37,462 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
22 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-18 04:16:39,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3690620.0, ans=15.0 2024-08-18 04:16:40,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3690620.0, ans=0.0 2024-08-18 04:17:07,985 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 11450, loss[loss=0.0877, beats_loss=0.009457, ecapa_loss=0.0001525, whisper_loss=0.07672, over 14800.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01038, ecapa_loss=0.000146, whisper_loss=0.09145, over 3881144.06 frames. ], batch size: 57, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:17:09,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3690820.0, ans=0.1 2024-08-18 04:17:14,124 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 04:17:28,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3690920.0, ans=0.125 2024-08-18 04:17:33,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3690920.0, ans=0.2 2024-08-18 04:17:37,494 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.265e+01 2.441e+01 2.752e+01 3.778e+01, threshold=4.882e+01, percent-clipped=0.0 2024-08-18 04:17:45,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3691020.0, ans=0.1 2024-08-18 04:17:48,611 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
38 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-18 04:17:56,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3691120.0, ans=0.0 2024-08-18 04:17:57,656 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 04:18:06,216 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 04:18:06,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3691120.0, ans=0.04949747468305833 2024-08-18 04:18:26,249 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 11500, loss[loss=0.08742, beats_loss=0.01112, ecapa_loss=0.0001163, whisper_loss=0.07514, over 16089.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0104, ecapa_loss=0.0001463, whisper_loss=0.09145, over 3895870.04 frames. ], batch size: 63, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:18:32,244 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.63 vs. limit=22.5 2024-08-18 04:18:53,789 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-18 04:18:55,765 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 16 from LS+wenet, 32 from Vox, 42 fro AS 2024-08-18 04:18:59,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3691520.0, ans=0.125 2024-08-18 04:19:29,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3691720.0, ans=0.1 2024-08-18 04:19:42,745 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 11550, loss[loss=0.1021, beats_loss=0.009879, ecapa_loss=0.0001563, whisper_loss=0.09069, over 16710.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01044, ecapa_loss=0.0001464, whisper_loss=0.09052, over 3853128.24 frames. ], batch size: 64, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:19:58,758 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-18 04:20:05,629 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.62 vs. limit=22.5 2024-08-18 04:20:12,648 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.438e+01 2.681e+01 2.984e+01 2.148e+02, threshold=5.363e+01, percent-clipped=1.0 2024-08-18 04:20:32,294 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 04:20:35,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3692120.0, ans=0.125 2024-08-18 04:20:54,674 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 04:20:56,048 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 11600, loss[loss=0.09261, beats_loss=0.01003, ecapa_loss=0.0001532, whisper_loss=0.08105, over 18629.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001462, whisper_loss=0.09005, over 3886491.31 frames. ], batch size: 79, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:21:01,456 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 04:21:08,872 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
25 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-18 04:21:23,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3692520.0, ans=0.125 2024-08-18 04:21:23,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3692520.0, ans=0.125 2024-08-18 04:21:23,685 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=15.0 2024-08-18 04:21:33,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3692520.0, ans=0.125 2024-08-18 04:21:50,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3692620.0, ans=0.125 2024-08-18 04:21:59,808 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-18 04:22:01,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3692720.0, ans=0.2 2024-08-18 04:22:09,679 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 11650, loss[loss=0.08808, beats_loss=0.01196, ecapa_loss=0.0001514, whisper_loss=0.07461, over 20782.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01052, ecapa_loss=0.0001473, whisper_loss=0.0901, over 3915145.42 frames. ], batch size: 84, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:22:16,518 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-18 04:22:18,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3692820.0, ans=0.2 2024-08-18 04:22:24,956 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
18 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-18 04:22:37,300 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.291e+01 2.477e+01 2.731e+01 1.047e+02, threshold=4.954e+01, percent-clipped=2.0 2024-08-18 04:22:40,433 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-18 04:22:46,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3693020.0, ans=0.125 2024-08-18 04:22:57,775 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 26 from LS+wenet, 13 from Vox, 17 fro AS 2024-08-18 04:23:04,390 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 04:23:16,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3693220.0, ans=10.0 2024-08-18 04:23:21,296 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 11700, loss[loss=0.1004, beats_loss=0.009927, ecapa_loss=0.0001339, whisper_loss=0.08914, over 23262.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01052, ecapa_loss=0.0001484, whisper_loss=0.09007, over 3905290.69 frames. ], batch size: 89, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:23:21,719 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
21 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-18 04:23:33,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3693420.0, ans=0.125 2024-08-18 04:23:45,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3693420.0, ans=0.125 2024-08-18 04:23:52,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3693520.0, ans=0.2 2024-08-18 04:24:07,058 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-18 04:24:10,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3693620.0, ans=0.125 2024-08-18 04:24:14,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3693620.0, ans=0.125 2024-08-18 04:24:16,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3693620.0, ans=0.1 2024-08-18 04:24:27,973 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-18 04:24:33,954 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 11750, loss[loss=0.09584, beats_loss=0.01073, ecapa_loss=0.0001594, whisper_loss=0.08352, over 22543.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01054, ecapa_loss=0.0001462, whisper_loss=0.09078, over 3920823.17 frames. ], batch size: 94, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:24:45,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3693820.0, ans=0.0 2024-08-18 04:24:46,797 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
15 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-18 04:25:01,109 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.257e+01 2.580e+01 2.883e+01 7.198e+01, threshold=5.159e+01, percent-clipped=2.0 2024-08-18 04:25:14,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3694020.0, ans=0.1 2024-08-18 04:25:38,211 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.38 vs. limit=10.0 2024-08-18 04:25:45,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3694320.0, ans=0.2 2024-08-18 04:25:47,026 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 11800, loss[loss=0.102, beats_loss=0.01107, ecapa_loss=0.0001292, whisper_loss=0.08963, over 21904.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01055, ecapa_loss=0.0001458, whisper_loss=0.09096, over 3928684.95 frames. ], batch size: 89, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:26:00,923 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.39 vs. limit=15.0 2024-08-18 04:26:01,472 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-18 04:26:10,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3694420.0, ans=0.125 2024-08-18 04:26:21,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3694520.0, ans=0.0 2024-08-18 04:26:21,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3694520.0, ans=0.0 2024-08-18 04:26:33,331 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
23 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-18 04:26:43,528 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.22 vs. limit=15.0 2024-08-18 04:26:48,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3694720.0, ans=10.0 2024-08-18 04:26:50,919 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 23 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-18 04:26:51,523 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.60 vs. limit=15.0 2024-08-18 04:26:54,444 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 11850, loss[loss=0.1219, beats_loss=0.008549, ecapa_loss=0.0001464, whisper_loss=0.1119, over 23241.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01058, ecapa_loss=0.0001461, whisper_loss=0.09084, over 3955322.53 frames. ], batch size: 88, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:27:19,375 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.719e+01 2.338e+01 2.535e+01 2.911e+01 4.657e+01, threshold=5.070e+01, percent-clipped=0.0 2024-08-18 04:27:19,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3695020.0, ans=0.07 2024-08-18 04:27:25,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3695020.0, ans=0.125 2024-08-18 04:27:43,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3695120.0, ans=0.0 2024-08-18 04:27:50,317 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.67 vs. 
limit=22.5 2024-08-18 04:27:51,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3695220.0, ans=0.1 2024-08-18 04:28:01,828 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 11900, loss[loss=0.1085, beats_loss=0.009023, ecapa_loss=0.0001545, whisper_loss=0.09797, over 23665.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01056, ecapa_loss=0.0001478, whisper_loss=0.09112, over 3952468.80 frames. ], batch size: 94, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:28:13,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3695320.0, ans=0.2 2024-08-18 04:28:16,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3695420.0, ans=0.025 2024-08-18 04:28:28,784 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 04:28:30,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3695520.0, ans=0.0 2024-08-18 04:28:35,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3695520.0, ans=0.2 2024-08-18 04:28:35,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3695520.0, ans=0.0 2024-08-18 04:28:44,325 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 12 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 04:28:56,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3695720.0, ans=0.1 2024-08-18 04:29:00,112 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 04:29:07,447 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 11950, loss[loss=0.08408, beats_loss=0.01201, ecapa_loss=0.0001777, whisper_loss=0.07029, over 15510.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001483, whisper_loss=0.09072, over 3935963.43 frames. ], batch size: 65, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:29:07,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3695820.0, ans=0.125 2024-08-18 04:29:21,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3695920.0, ans=0.125 2024-08-18 04:29:31,520 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.580e+01 2.286e+01 2.533e+01 2.895e+01 4.370e+02, threshold=5.067e+01, percent-clipped=3.0 2024-08-18 04:29:35,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3696020.0, ans=0.125 2024-08-18 04:29:42,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3696020.0, ans=0.07 2024-08-18 04:29:59,198 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-18 04:30:12,568 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 12000, loss[loss=0.112, beats_loss=0.009295, ecapa_loss=0.0001194, whisper_loss=0.1016, over 21190.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01058, ecapa_loss=0.0001476, whisper_loss=0.09086, over 3944064.78 frames. 
], batch size: 79, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:30:12,569 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-18 04:30:50,004 INFO [train_multi_KD3.py:1149] (0/4) Epoch 25, validation on ASR_libri: loss=0.2542, beats_loss=0, ecapa_loss=0.0005296, whisper_loss=0.2489, over 922467.00 frames. 2024-08-18 04:31:05,487 INFO [train_multi_KD3.py:1149] (0/4) Epoch 25, validation on SV_voxceleb1: loss=0.004038, beats_loss=0, ecapa_loss=0.0004038, whisper_loss=0, over 939242.00 frames. 2024-08-18 04:32:43,914 INFO [train_multi_KD3.py:1149] (0/4) Epoch 25, validation on AT_audioset: loss=0.02323, beats_loss=0.02323, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 04:32:43,923 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-18 04:32:59,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3696420.0, ans=0.2 2024-08-18 04:33:26,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3696620.0, ans=0.125 2024-08-18 04:33:27,087 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.31 vs. 
limit=10.0 2024-08-18 04:33:32,954 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09565281867980957, model_norm_threshold=50.66883087158203 2024-08-18 04:33:33,128 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.699e+04, grad_sumsq=4.144e+03, orig_rms_sq=8.927e+00 2024-08-18 04:33:36,941 WARNING [optim.py:496] (0/4) Scaling gradients by 0.07986252754926682, model_norm_threshold=50.66883087158203 2024-08-18 04:33:37,113 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.743e+04, grad_sumsq=4.635e+06, orig_rms_sq=1.023e-02 2024-08-18 04:33:37,312 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 27 from LS+wenet, 12 from Vox, 17 fro AS 2024-08-18 04:33:37,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3696620.0, ans=0.1 2024-08-18 04:33:44,453 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-18 04:33:50,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3696720.0, ans=0.125 2024-08-18 04:33:53,779 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 12050, loss[loss=0.09626, beats_loss=0.01167, ecapa_loss=0.0001591, whisper_loss=0.083, over 21361.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01056, ecapa_loss=0.0001471, whisper_loss=0.09059, over 3922309.55 frames. 
], batch size: 92, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:33:55,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3696820.0, ans=0.125 2024-08-18 04:34:01,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3696820.0, ans=0.125 2024-08-18 04:34:01,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3696820.0, ans=0.0 2024-08-18 04:34:06,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3696920.0, ans=0.0 2024-08-18 04:34:17,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3696920.0, ans=0.125 2024-08-18 04:34:18,345 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.45 vs. limit=8.0 2024-08-18 04:34:20,202 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.681e+01 2.233e+01 2.436e+01 2.807e+01 6.345e+02, threshold=4.872e+01, percent-clipped=3.0 2024-08-18 04:34:34,857 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.00 vs. limit=6.0 2024-08-18 04:34:37,007 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-18 04:34:38,394 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 04:34:57,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3697220.0, ans=0.125 2024-08-18 04:34:59,829 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
25 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-18 04:35:02,949 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 12100, loss[loss=0.121, beats_loss=0.009636, ecapa_loss=0.0001578, whisper_loss=0.1097, over 24032.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01059, ecapa_loss=0.0001465, whisper_loss=0.09112, over 3933023.23 frames. ], batch size: 93, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:35:04,294 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-18 04:35:06,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3697320.0, ans=0.2 2024-08-18 04:35:08,210 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 04:35:16,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3697420.0, ans=0.125 2024-08-18 04:35:18,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3697420.0, ans=0.0 2024-08-18 04:35:19,640 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-18 04:35:36,345 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-18 04:35:48,507 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-18 04:36:08,675 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 12150, loss[loss=0.1004, beats_loss=0.01209, ecapa_loss=0.0001373, whisper_loss=0.08693, over 21938.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001463, whisper_loss=0.0909, over 3911958.73 frames. ], batch size: 87, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:36:13,879 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
25 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 04:36:14,157 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 04:36:16,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3697820.0, ans=0.125 2024-08-18 04:36:26,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3697920.0, ans=0.2 2024-08-18 04:36:27,033 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.32 vs. limit=22.5 2024-08-18 04:36:31,306 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-18 04:36:32,328 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.238e+01 2.438e+01 2.753e+01 4.276e+01, threshold=4.877e+01, percent-clipped=0.0 2024-08-18 04:36:37,595 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=15.0 2024-08-18 04:36:43,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3698020.0, ans=0.1 2024-08-18 04:36:43,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3698020.0, ans=0.0 2024-08-18 04:36:51,988 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
35 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 04:36:52,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3698120.0, ans=0.0 2024-08-18 04:36:55,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3698120.0, ans=0.125 2024-08-18 04:37:05,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3698220.0, ans=0.125 2024-08-18 04:37:09,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3698220.0, ans=0.125 2024-08-18 04:37:12,835 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 12200, loss[loss=0.1135, beats_loss=0.00859, ecapa_loss=0.0001393, whisper_loss=0.1035, over 19354.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01061, ecapa_loss=0.0001469, whisper_loss=0.09111, over 3915726.54 frames. ], batch size: 75, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:37:29,687 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.45 vs. limit=15.0 2024-08-18 04:37:31,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3698420.0, ans=0.125 2024-08-18 04:37:33,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3698420.0, ans=0.125 2024-08-18 04:37:33,750 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.90 vs. 
limit=22.5 2024-08-18 04:38:02,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3698720.0, ans=0.1 2024-08-18 04:38:03,614 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 23 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-18 04:38:06,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3698720.0, ans=0.95 2024-08-18 04:38:08,778 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 21 from LS+wenet, 24 from Vox, 51 fro AS 2024-08-18 04:38:15,125 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.20 vs. limit=15.0 2024-08-18 04:38:15,846 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 12250, loss[loss=0.09953, beats_loss=0.01092, ecapa_loss=0.0001765, whisper_loss=0.08685, over 16436.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0107, ecapa_loss=0.0001465, whisper_loss=0.09056, over 3943684.92 frames. ], batch size: 71, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:38:17,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3698820.0, ans=0.0 2024-08-18 04:38:22,711 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-18 04:38:40,009 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.471e+01 2.671e+01 3.119e+01 8.520e+01, threshold=5.341e+01, percent-clipped=2.0 2024-08-18 04:38:58,006 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 30 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-18 04:38:59,290 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-18 04:39:01,988 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
26 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-18 04:39:02,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3699120.0, ans=0.09899494936611666 2024-08-18 04:39:07,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.09 vs. limit=15.0 2024-08-18 04:39:08,167 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 17 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-18 04:39:15,519 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-18 04:39:19,482 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 12300, loss[loss=0.1219, beats_loss=0.01122, ecapa_loss=0.0001414, whisper_loss=0.1093, over 21049.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01054, ecapa_loss=0.0001473, whisper_loss=0.09142, over 3922221.06 frames. ], batch size: 80, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:39:33,627 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.17 vs. limit=6.0 2024-08-18 04:39:34,465 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
26 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-18 04:39:34,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3699420.0, ans=0.125 2024-08-18 04:39:49,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3699520.0, ans=0.125 2024-08-18 04:39:57,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3699620.0, ans=0.125 2024-08-18 04:40:01,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3699620.0, ans=0.0 2024-08-18 04:40:04,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3699620.0, ans=0.0 2024-08-18 04:40:11,506 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-18 04:40:11,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3699720.0, ans=0.125 2024-08-18 04:40:15,706 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 17 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-18 04:40:15,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3699720.0, ans=0.125 2024-08-18 04:40:21,657 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 12350, loss[loss=0.1111, beats_loss=0.01293, ecapa_loss=0.0001077, whisper_loss=0.09707, over 24339.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0105, ecapa_loss=0.0001472, whisper_loss=0.09152, over 3928431.09 frames. 
], batch size: 91, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:40:21,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3699820.0, ans=0.125 2024-08-18 04:40:44,877 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-18 04:40:45,309 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.26 vs. limit=15.0 2024-08-18 04:40:45,934 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.495e+01 2.683e+01 3.048e+01 4.110e+01, threshold=5.366e+01, percent-clipped=0.0 2024-08-18 04:40:52,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3700020.0, ans=0.0 2024-08-18 04:40:55,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3700020.0, ans=0.1 2024-08-18 04:41:24,878 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 12400, loss[loss=0.08762, beats_loss=0.00999, ecapa_loss=0.0001611, whisper_loss=0.07602, over 14109.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01047, ecapa_loss=0.0001468, whisper_loss=0.09145, over 3927891.58 frames. ], batch size: 58, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:41:38,968 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 36 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-18 04:41:44,762 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
21 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-18 04:41:55,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3700520.0, ans=0.125 2024-08-18 04:41:55,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3700520.0, ans=0.0 2024-08-18 04:41:59,528 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-18 04:42:05,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3700620.0, ans=0.0 2024-08-18 04:42:08,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3700620.0, ans=0.1 2024-08-18 04:42:10,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3700620.0, ans=0.07 2024-08-18 04:42:19,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3700720.0, ans=0.125 2024-08-18 04:42:24,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3700720.0, ans=0.125 2024-08-18 04:42:26,437 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 12450, loss[loss=0.1059, beats_loss=0.0102, ecapa_loss=0.0001173, whisper_loss=0.09453, over 18382.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01033, ecapa_loss=0.0001473, whisper_loss=0.0921, over 3907914.90 frames. ], batch size: 70, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:42:26,607 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
24 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 04:42:45,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3700920.0, ans=0.125 2024-08-18 04:42:50,282 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.277e+01 2.499e+01 2.886e+01 6.809e+01, threshold=4.997e+01, percent-clipped=1.0 2024-08-18 04:42:54,585 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.32 vs. limit=10.0 2024-08-18 04:43:06,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3701120.0, ans=0.125 2024-08-18 04:43:28,286 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 12500, loss[loss=0.1368, beats_loss=0.005818, ecapa_loss=0.000166, whisper_loss=0.1294, over 18796.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01031, ecapa_loss=0.0001466, whisper_loss=0.09282, over 3924029.86 frames. ], batch size: 72, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:43:30,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3701320.0, ans=0.125 2024-08-18 04:43:32,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3701320.0, ans=0.125 2024-08-18 04:43:36,301 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.93 vs. limit=15.0 2024-08-18 04:43:53,178 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
33 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-18 04:43:53,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3701520.0, ans=0.125 2024-08-18 04:43:54,290 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-18 04:44:00,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3701520.0, ans=0.0 2024-08-18 04:44:03,052 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 04:44:30,805 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 12550, loss[loss=0.09852, beats_loss=0.01002, ecapa_loss=0.0001281, whisper_loss=0.08722, over 21200.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01036, ecapa_loss=0.0001473, whisper_loss=0.09279, over 3945283.13 frames. ], batch size: 84, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:44:32,144 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 04:44:35,779 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 27 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 04:44:42,131 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 04:44:44,768 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-18 04:44:53,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3701920.0, ans=0.1 2024-08-18 04:44:55,887 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.343e+01 2.607e+01 2.934e+01 3.730e+01, threshold=5.215e+01, percent-clipped=0.0 2024-08-18 04:44:58,777 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
19 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-18 04:45:08,917 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-18 04:45:09,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3702120.0, ans=0.0 2024-08-18 04:45:32,197 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-18 04:45:33,249 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 12600, loss[loss=0.08664, beats_loss=0.01248, ecapa_loss=0.0001859, whisper_loss=0.0723, over 18098.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0104, ecapa_loss=0.000147, whisper_loss=0.09262, over 3928399.23 frames. ], batch size: 77, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:45:44,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3702320.0, ans=0.2 2024-08-18 04:45:47,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3702420.0, ans=0.125 2024-08-18 04:46:06,623 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 04:46:12,965 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.31 vs. limit=15.0 2024-08-18 04:46:22,790 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 04:46:23,859 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
24 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-18 04:46:35,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3702820.0, ans=0.2 2024-08-18 04:46:36,122 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 12650, loss[loss=0.08712, beats_loss=0.01428, ecapa_loss=0.0001249, whisper_loss=0.07159, over 17806.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01046, ecapa_loss=0.0001474, whisper_loss=0.09157, over 3935286.31 frames. ], batch size: 75, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:46:43,457 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-18 04:47:01,282 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.719e+01 2.352e+01 2.595e+01 2.962e+01 3.883e+01, threshold=5.190e+01, percent-clipped=0.0 2024-08-18 04:47:07,395 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-18 04:47:07,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3703020.0, ans=0.1 2024-08-18 04:47:07,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3703020.0, ans=0.0 2024-08-18 04:47:18,581 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 11 from Vox, 53 fro AS 2024-08-18 04:47:25,907 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-18 04:47:38,495 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 12700, loss[loss=0.1072, beats_loss=0.01159, ecapa_loss=0.0001269, whisper_loss=0.09429, over 18954.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01049, ecapa_loss=0.0001477, whisper_loss=0.09174, over 3913341.47 frames. 
], batch size: 75, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:47:44,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3703320.0, ans=0.0 2024-08-18 04:47:45,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3703320.0, ans=0.0 2024-08-18 04:47:52,490 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-18 04:47:52,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3703420.0, ans=0.125 2024-08-18 04:47:56,236 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-18 04:48:26,913 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 30 from Vox, 25 fro AS 2024-08-18 04:48:40,216 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 12750, loss[loss=0.1145, beats_loss=0.00919, ecapa_loss=0.0001506, whisper_loss=0.1038, over 19452.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0105, ecapa_loss=0.0001467, whisper_loss=0.09151, over 3903065.75 frames. ], batch size: 76, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:48:45,535 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.79 vs. limit=22.5 2024-08-18 04:48:57,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3703920.0, ans=10.0 2024-08-18 04:49:00,064 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
21 from LS+wenet, 12 from Vox, 32 from AS 2024-08-18 04:49:02,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3703920.0, ans=0.125 2024-08-18 04:49:05,081 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.358e+01 2.568e+01 2.897e+01 5.259e+01, threshold=5.137e+01, percent-clipped=1.0 2024-08-18 04:49:09,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3704020.0, ans=0.1 2024-08-18 04:49:15,277 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 26 from Vox, 33 from AS 2024-08-18 04:49:16,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3704120.0, ans=0.1 2024-08-18 04:49:22,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3704120.0, ans=0.95 2024-08-18 04:49:29,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3704220.0, ans=0.0 2024-08-18 04:49:29,990 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.74 vs. limit=15.0 2024-08-18 04:49:37,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3704220.0, ans=0.0 2024-08-18 04:49:42,421 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 12800, loss[loss=0.123, beats_loss=0.01103, ecapa_loss=0.0001358, whisper_loss=0.1106, over 18715.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01055, ecapa_loss=0.0001483, whisper_loss=0.09038, over 3897285.97 frames.
], batch size: 73, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:50:00,591 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.78 vs. limit=15.0 2024-08-18 04:50:02,889 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 23 from LS+wenet, 20 from Vox, 42 from AS 2024-08-18 04:50:29,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3704620.0, ans=0.2 2024-08-18 04:50:40,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3704720.0, ans=0.0 2024-08-18 04:50:43,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3704720.0, ans=0.0 2024-08-18 04:50:45,706 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 12850, loss[loss=0.1196, beats_loss=0.01042, ecapa_loss=0.0001417, whisper_loss=0.1078, over 22369.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01064, ecapa_loss=0.0001475, whisper_loss=0.08975, over 3892246.98 frames. ], batch size: 89, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:51:10,385 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.342e+01 2.570e+01 2.903e+01 1.113e+02, threshold=5.140e+01, percent-clipped=1.0 2024-08-18 04:51:28,060 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts.
31 from LS+wenet, 23 from Vox, 38 from AS 2024-08-18 04:51:29,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3705120.0, ans=0.125 2024-08-18 04:51:36,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3705220.0, ans=0.125 2024-08-18 04:51:48,029 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 12900, loss[loss=0.09752, beats_loss=0.01103, ecapa_loss=0.0001497, whisper_loss=0.08499, over 21932.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01059, ecapa_loss=0.0001483, whisper_loss=0.08959, over 3897390.74 frames. ], batch size: 92, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:51:53,599 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.70 vs. limit=10.0 2024-08-18 04:51:54,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3705320.0, ans=0.125 2024-08-18 04:51:59,663 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=12.0 2024-08-18 04:52:01,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3705420.0, ans=0.125 2024-08-18 04:52:04,260 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 13 from Vox, 27 from AS 2024-08-18 04:52:17,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3705520.0, ans=0.125 2024-08-18 04:52:40,247 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts.
16 from LS+wenet, 14 from Vox, 27 from AS 2024-08-18 04:52:49,891 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 12950, loss[loss=0.102, beats_loss=0.01189, ecapa_loss=0.0001039, whisper_loss=0.0891, over 18571.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01054, ecapa_loss=0.0001481, whisper_loss=0.08956, over 3864850.77 frames. ], batch size: 71, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:53:00,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3705820.0, ans=0.1 2024-08-18 04:53:03,874 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 17 from Vox, 30 from AS 2024-08-18 04:53:06,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3705920.0, ans=0.0 2024-08-18 04:53:15,034 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.225e+01 2.385e+01 2.707e+01 3.126e+02, threshold=4.771e+01, percent-clipped=1.0 2024-08-18 04:53:15,192 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 15 from Vox, 27 from AS 2024-08-18 04:53:52,849 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 13000, loss[loss=0.07258, beats_loss=0.0131, ecapa_loss=0.000136, whisper_loss=0.05811, over 19112.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01053, ecapa_loss=0.0001482, whisper_loss=0.08942, over 3869656.88 frames. ], batch size: 80, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:54:07,134 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2024-08-18 04:54:07,922 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts.
21 from LS+wenet, 17 from Vox, 18 from AS 2024-08-18 04:54:11,171 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.75 vs. limit=12.0 2024-08-18 04:54:23,168 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 from AS 2024-08-18 04:54:29,375 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 15 from LS+wenet, 16 from Vox, 37 from AS 2024-08-18 04:54:30,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3706620.0, ans=0.07 2024-08-18 04:54:34,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3706620.0, ans=0.125 2024-08-18 04:54:38,849 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=15.0 2024-08-18 04:54:46,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3706720.0, ans=0.1 2024-08-18 04:54:52,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3706720.0, ans=0.1 2024-08-18 04:54:55,455 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 13050, loss[loss=0.08575, beats_loss=0.01035, ecapa_loss=0.0001513, whisper_loss=0.07388, over 17121.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01048, ecapa_loss=0.0001472, whisper_loss=0.08976, over 3857839.91 frames.
], batch size: 70, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:55:04,603 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.157e-03 2024-08-18 04:55:10,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3706920.0, ans=0.2 2024-08-18 04:55:20,642 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.374e+01 2.590e+01 2.851e+01 4.425e+02, threshold=5.179e+01, percent-clipped=1.0 2024-08-18 04:55:30,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3707020.0, ans=0.0 2024-08-18 04:55:31,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3707120.0, ans=0.0 2024-08-18 04:55:35,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3707120.0, ans=0.1 2024-08-18 04:55:37,893 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 13 from LS+wenet, 17 from Vox, 40 from AS 2024-08-18 04:55:39,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3707120.0, ans=0.0 2024-08-18 04:55:57,594 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 13100, loss[loss=0.1054, beats_loss=0.009431, ecapa_loss=0.0001297, whisper_loss=0.09471, over 19088.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01058, ecapa_loss=0.0001464, whisper_loss=0.0892, over 3846022.38 frames. ], batch size: 73, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:56:01,450 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts.
24 from LS+wenet, 15 from Vox, 33 from AS 2024-08-18 04:56:35,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3707620.0, ans=0.2 2024-08-18 04:56:36,323 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 28 from Vox, 33 from AS 2024-08-18 04:56:39,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3707620.0, ans=0.0 2024-08-18 04:56:40,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3707620.0, ans=0.0 2024-08-18 04:56:41,586 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 18 from Vox, 20 from AS 2024-08-18 04:56:47,749 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 20 from Vox, 33 from AS 2024-08-18 04:56:52,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3707720.0, ans=0.125 2024-08-18 04:56:59,808 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 13150, loss[loss=0.1128, beats_loss=0.01035, ecapa_loss=0.0001431, whisper_loss=0.101, over 21489.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0106, ecapa_loss=0.0001457, whisper_loss=0.08996, over 3860727.27 frames. ], batch size: 86, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:57:08,888 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 27 from LS+wenet, 18 from Vox, 31 from AS 2024-08-18 04:57:17,006 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-08-18 04:57:20,146 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts.
15 from LS+wenet, 15 from Vox, 27 from AS 2024-08-18 04:57:20,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3707920.0, ans=0.0 2024-08-18 04:57:21,002 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.56 vs. limit=15.0 2024-08-18 04:57:21,705 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 20 from LS+wenet, 14 from Vox, 20 from AS 2024-08-18 04:57:24,236 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 23 from LS+wenet, 17 from Vox, 20 from AS 2024-08-18 04:57:25,548 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.230e+01 2.439e+01 2.719e+01 6.493e+01, threshold=4.878e+01, percent-clipped=1.0 2024-08-18 04:57:27,155 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 14 from LS+wenet, 19 from Vox, 31 from AS 2024-08-18 04:57:36,114 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.27 vs. limit=15.0 2024-08-18 04:57:36,935 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 28 from Vox, 37 from AS 2024-08-18 04:57:37,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3708120.0, ans=0.125 2024-08-18 04:57:48,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3708120.0, ans=0.0 2024-08-18 04:57:51,412 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.89 vs.
limit=15.0 2024-08-18 04:57:53,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3708220.0, ans=0.125 2024-08-18 04:58:02,837 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 13200, loss[loss=0.1211, beats_loss=0.009736, ecapa_loss=0.0001417, whisper_loss=0.1099, over 22836.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01053, ecapa_loss=0.000146, whisper_loss=0.08978, over 3846402.45 frames. ], batch size: 88, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:58:07,250 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=12.0 2024-08-18 04:58:30,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3708520.0, ans=0.125 2024-08-18 04:58:46,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3708620.0, ans=0.0 2024-08-18 04:58:55,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3708720.0, ans=0.2 2024-08-18 04:58:57,718 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 from AS 2024-08-18 04:59:04,258 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 20 from Vox, 27 from AS 2024-08-18 04:59:04,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3708820.0, ans=0.125 2024-08-18 04:59:05,259 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 13250, loss[loss=0.1, beats_loss=0.009427, ecapa_loss=0.0001456, whisper_loss=0.08915, over 16877.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001469, whisper_loss=0.08993, over 3835743.37 frames.
], batch size: 67, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:59:05,421 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 13 from LS+wenet, 16 from Vox, 31 from AS 2024-08-18 04:59:16,753 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.601e+01 2024-08-18 04:59:19,218 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.39 vs. limit=12.0 2024-08-18 04:59:29,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3709020.0, ans=0.125 2024-08-18 04:59:30,036 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.594e+01 2.263e+01 2.511e+01 2.835e+01 4.406e+01, threshold=5.022e+01, percent-clipped=0.0 2024-08-18 04:59:30,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3709020.0, ans=0.1 2024-08-18 04:59:32,667 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 from AS 2024-08-18 04:59:33,238 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.98 vs. limit=15.0 2024-08-18 04:59:42,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3709120.0, ans=0.0 2024-08-18 05:00:07,491 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 13300, loss[loss=0.1165, beats_loss=0.00933, ecapa_loss=0.0001744, whisper_loss=0.1054, over 20023.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01057, ecapa_loss=0.0001454, whisper_loss=0.08959, over 3809237.42 frames.
], batch size: 80, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:00:20,424 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.64 vs. limit=22.5 2024-08-18 05:00:21,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3709420.0, ans=0.2 2024-08-18 05:00:23,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3709420.0, ans=0.125 2024-08-18 05:00:30,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3709420.0, ans=0.2 2024-08-18 05:00:34,123 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.01 vs. limit=22.5 2024-08-18 05:00:49,927 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 27 from LS+wenet, 16 from Vox, 20 from AS 2024-08-18 05:00:51,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3709620.0, ans=0.0 2024-08-18 05:01:09,793 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 13350, loss[loss=0.08862, beats_loss=0.01451, ecapa_loss=0.0001429, whisper_loss=0.07268, over 20052.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01058, ecapa_loss=0.0001456, whisper_loss=0.08921, over 3806583.47 frames. ], batch size: 85, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:01:18,638 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts.
30 from LS+wenet, 24 from Vox, 38 from AS 2024-08-18 05:01:34,952 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.554e+01 2.429e+01 2.621e+01 2.901e+01 4.949e+01, threshold=5.243e+01, percent-clipped=0.0 2024-08-18 05:01:47,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3710120.0, ans=0.2 2024-08-18 05:02:02,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3710220.0, ans=0.125 2024-08-18 05:02:13,048 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 13400, loss[loss=0.1027, beats_loss=0.01144, ecapa_loss=0.0001507, whisper_loss=0.08974, over 18561.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01064, ecapa_loss=0.0001445, whisper_loss=0.08878, over 3815735.90 frames. ], batch size: 79, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:02:19,920 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 20 from LS+wenet, 22 from Vox, 45 from AS 2024-08-18 05:02:20,503 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2024-08-18 05:02:26,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3710420.0, ans=0.0 2024-08-18 05:02:28,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3710420.0, ans=0.2 2024-08-18 05:02:28,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3710420.0, ans=0.1 2024-08-18 05:02:30,751 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 21 from LS+wenet, 23 from Vox, 43 from AS 2024-08-18 05:02:34,696 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts.
27 from LS+wenet, 36 from Vox, 30 from AS 2024-08-18 05:02:55,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.75 vs. limit=15.0 2024-08-18 05:03:10,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3710720.0, ans=0.125 2024-08-18 05:03:15,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3710820.0, ans=0.0 2024-08-18 05:03:16,471 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 13450, loss[loss=0.09743, beats_loss=0.01172, ecapa_loss=0.0001744, whisper_loss=0.08396, over 21055.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01066, ecapa_loss=0.000144, whisper_loss=0.08863, over 3830212.85 frames. ], batch size: 90, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:03:21,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3710820.0, ans=0.1 2024-08-18 05:03:28,092 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 from AS 2024-08-18 05:03:33,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3710920.0, ans=0.04949747468305833 2024-08-18 05:03:41,694 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.380e+01 2.555e+01 2.922e+01 3.832e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-18 05:03:43,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3711020.0, ans=0.0 2024-08-18 05:03:44,233 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 from AS 2024-08-18 05:04:07,289 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.61 vs.
limit=10.0 2024-08-18 05:04:18,785 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 13500, loss[loss=0.09818, beats_loss=0.01154, ecapa_loss=0.0001481, whisper_loss=0.08516, over 22082.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01065, ecapa_loss=0.0001447, whisper_loss=0.08871, over 3818778.51 frames. ], batch size: 91, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:04:20,335 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 05:04:36,736 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 from AS 2024-08-18 05:04:36,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3711420.0, ans=0.125 2024-08-18 05:04:41,437 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 21 from LS+wenet, 21 from Vox, 42 from AS 2024-08-18 05:04:42,037 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.78 vs. limit=15.0 2024-08-18 05:04:43,394 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.25 vs. limit=15.0 2024-08-18 05:04:43,766 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 35 from LS+wenet, 19 from Vox, 34 from AS 2024-08-18 05:04:51,399 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts.
21 from LS+wenet, 19 from Vox, 29 from AS 2024-08-18 05:04:56,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3711620.0, ans=0.07 2024-08-18 05:05:10,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3711720.0, ans=0.1 2024-08-18 05:05:18,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3711720.0, ans=0.125 2024-08-18 05:05:20,842 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 13550, loss[loss=0.09526, beats_loss=0.01135, ecapa_loss=0.0001162, whisper_loss=0.08275, over 15694.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01058, ecapa_loss=0.0001454, whisper_loss=0.08949, over 3822732.99 frames. ], batch size: 62, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:05:22,097 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 17 from Vox, 41 from AS 2024-08-18 05:05:22,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=3711820.0, ans=0.02 2024-08-18 05:05:22,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3711820.0, ans=0.125 2024-08-18 05:05:45,913 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.297e+01 2.515e+01 2.826e+01 4.852e+01, threshold=5.029e+01, percent-clipped=0.0 2024-08-18 05:06:03,295 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 16 from LS+wenet, 23 from Vox, 28 from AS 2024-08-18 05:06:10,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3712220.0, ans=0.0 2024-08-18 05:06:17,205 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts.
27 from LS+wenet, 12 from Vox, 23 from AS 2024-08-18 05:06:23,028 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 13600, loss[loss=0.09389, beats_loss=0.01395, ecapa_loss=0.0001167, whisper_loss=0.07877, over 19781.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01074, ecapa_loss=0.0001447, whisper_loss=0.08981, over 3855185.16 frames. ], batch size: 80, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:06:26,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3712320.0, ans=0.125 2024-08-18 05:06:27,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3712320.0, ans=0.125 2024-08-18 05:06:28,487 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 from AS 2024-08-18 05:06:39,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3712420.0, ans=0.0 2024-08-18 05:06:41,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3712420.0, ans=0.2 2024-08-18 05:06:47,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3712520.0, ans=0.125 2024-08-18 05:06:49,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3712520.0, ans=0.125 2024-08-18 05:06:52,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3712520.0, ans=0.0 2024-08-18 05:07:01,809 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts.
16 from LS+wenet, 19 from Vox, 20 from AS 2024-08-18 05:07:13,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3712720.0, ans=0.1 2024-08-18 05:07:25,918 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 13650, loss[loss=0.113, beats_loss=0.009146, ecapa_loss=0.0001208, whisper_loss=0.1027, over 14649.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0107, ecapa_loss=0.0001455, whisper_loss=0.09027, over 3815138.70 frames. ], batch size: 56, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:07:30,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3712820.0, ans=0.125 2024-08-18 05:07:44,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3712920.0, ans=0.0 2024-08-18 05:07:50,782 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.696e+01 2.250e+01 2.521e+01 2.924e+01 4.330e+02, threshold=5.042e+01, percent-clipped=2.0 2024-08-18 05:08:08,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3713120.0, ans=0.0 2024-08-18 05:08:23,596 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 36 from LS+wenet, 18 from Vox, 36 from AS 2024-08-18 05:08:28,465 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 13700, loss[loss=0.122, beats_loss=0.008984, ecapa_loss=0.0001308, whisper_loss=0.1117, over 19229.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01075, ecapa_loss=0.0001441, whisper_loss=0.0901, over 3852754.98 frames. ], batch size: 71, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:08:46,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3713420.0, ans=0.0 2024-08-18 05:08:49,755 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts.
31 from LS+wenet, 17 from Vox, 40 from AS 2024-08-18 05:09:29,561 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 20 from Vox, 19 from AS 2024-08-18 05:09:30,604 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 13750, loss[loss=0.1027, beats_loss=0.008181, ecapa_loss=0.000186, whisper_loss=0.09265, over 13034.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.0001442, whisper_loss=0.09086, over 3875141.03 frames. ], batch size: 55, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:09:38,833 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.54 vs. limit=15.0 2024-08-18 05:09:40,895 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.81 vs. limit=15.0 2024-08-18 05:09:44,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3713920.0, ans=0.125 2024-08-18 05:09:52,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3713920.0, ans=0.125 2024-08-18 05:09:55,366 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.369e+01 2.573e+01 3.048e+01 2.205e+02, threshold=5.146e+01, percent-clipped=4.0 2024-08-18 05:10:02,012 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0230120737105608, model_norm_threshold=51.46477508544922 2024-08-18 05:10:02,183 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.952e+05, grad_sumsq=7.749e+07, orig_rms_sq=1.026e-02 2024-08-18 05:10:15,987 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.879e-02 2024-08-18 05:10:16,012 INFO [scaling.py:214] (0/4) ScheduledFloat:
name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3714120.0, ans=0.1 2024-08-18 05:10:28,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0 2024-08-18 05:10:30,478 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-18 05:10:32,792 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 13800, loss[loss=0.09043, beats_loss=0.01031, ecapa_loss=0.0001419, whisper_loss=0.0787, over 19528.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.0001456, whisper_loss=0.09091, over 3897546.86 frames. ], batch size: 76, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:10:34,928 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.91 vs. limit=15.0 2024-08-18 05:10:38,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3714320.0, ans=0.125 2024-08-18 05:10:58,040 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-18 05:11:21,438 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 29 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-18 05:11:23,852 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-18 05:11:35,168 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 13850, loss[loss=0.08838, beats_loss=0.01024, ecapa_loss=0.0001542, whisper_loss=0.0766, over 16400.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01062, ecapa_loss=0.0001453, whisper_loss=0.09065, over 3882769.95 frames. 
], batch size: 68, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:11:36,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3714820.0, ans=0.0 2024-08-18 05:11:39,066 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-18 05:11:42,615 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 16 from Vox, 49 fro AS 2024-08-18 05:11:44,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3714820.0, ans=0.125 2024-08-18 05:11:59,754 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.280e+01 2.571e+01 2.873e+01 2.236e+03, threshold=5.141e+01, percent-clipped=2.0 2024-08-18 05:12:19,846 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 05:12:23,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3715220.0, ans=0.125 2024-08-18 05:12:36,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3715320.0, ans=0.0 2024-08-18 05:12:37,495 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 13900, loss[loss=0.1188, beats_loss=0.01055, ecapa_loss=0.0001405, whisper_loss=0.1069, over 23043.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01054, ecapa_loss=0.0001453, whisper_loss=0.09152, over 3895804.21 frames. ], batch size: 90, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:12:52,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3715420.0, ans=0.125 2024-08-18 05:12:53,722 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
19 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-18 05:13:01,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3715520.0, ans=0.125 2024-08-18 05:13:39,547 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 13950, loss[loss=0.109, beats_loss=0.01021, ecapa_loss=0.000184, whisper_loss=0.09699, over 21991.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01045, ecapa_loss=0.0001467, whisper_loss=0.0922, over 3896497.91 frames. ], batch size: 91, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:13:43,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3715820.0, ans=0.0 2024-08-18 05:13:44,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3715820.0, ans=0.1 2024-08-18 05:13:52,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3715920.0, ans=0.125 2024-08-18 05:14:04,515 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.378e+01 2.637e+01 2.974e+01 4.505e+01, threshold=5.274e+01, percent-clipped=0.0 2024-08-18 05:14:05,942 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 23 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-18 05:14:08,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3716020.0, ans=0.05 2024-08-18 05:14:19,618 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 15 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-18 05:14:20,227 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.22 vs. 
limit=15.0 2024-08-18 05:14:39,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3716220.0, ans=0.0 2024-08-18 05:14:41,774 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 14000, loss[loss=0.1176, beats_loss=0.01057, ecapa_loss=0.000164, whisper_loss=0.1054, over 22253.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01046, ecapa_loss=0.000146, whisper_loss=0.09156, over 3899260.20 frames. ], batch size: 91, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:15:16,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3716520.0, ans=0.125 2024-08-18 05:15:27,234 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=12.0 2024-08-18 05:15:33,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3716720.0, ans=0.125 2024-08-18 05:15:35,459 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 15 from Vox, 48 fro AS 2024-08-18 05:15:38,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3716720.0, ans=0.0 2024-08-18 05:15:42,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3716720.0, ans=0.1 2024-08-18 05:15:44,546 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 14050, loss[loss=0.131, beats_loss=0.00717, ecapa_loss=0.0001428, whisper_loss=0.1224, over 21100.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01052, ecapa_loss=0.0001454, whisper_loss=0.09083, over 3887726.92 frames. 
], batch size: 79, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:15:46,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3716820.0, ans=0.125 2024-08-18 05:15:52,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3716820.0, ans=0.1 2024-08-18 05:15:56,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3716920.0, ans=0.1 2024-08-18 05:16:00,512 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.60 vs. limit=12.0 2024-08-18 05:16:02,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3716920.0, ans=0.125 2024-08-18 05:16:03,710 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-18 05:16:07,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3716920.0, ans=0.0 2024-08-18 05:16:08,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3717020.0, ans=0.125 2024-08-18 05:16:09,490 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.272e+01 2.582e+01 2.804e+01 4.484e+01, threshold=5.163e+01, percent-clipped=0.0 2024-08-18 05:16:16,179 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2024-08-18 05:16:34,706 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
35 from LS+wenet, 13 from Vox, 44 fro AS 2024-08-18 05:16:41,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3717220.0, ans=0.2 2024-08-18 05:16:43,667 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-18 05:16:43,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3717220.0, ans=0.125 2024-08-18 05:16:47,031 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 14100, loss[loss=0.1005, beats_loss=0.01367, ecapa_loss=0.0001138, whisper_loss=0.08569, over 23275.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01054, ecapa_loss=0.0001441, whisper_loss=0.09117, over 3894638.97 frames. ], batch size: 91, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:17:02,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3717420.0, ans=0.125 2024-08-18 05:17:07,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3717420.0, ans=0.125 2024-08-18 05:17:13,574 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 05:17:21,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3717520.0, ans=0.1 2024-08-18 05:17:23,897 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 05:17:30,766 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-18 05:17:38,873 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
25 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-18 05:17:48,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3717820.0, ans=0.0 2024-08-18 05:17:49,610 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 14150, loss[loss=0.09579, beats_loss=0.008579, ecapa_loss=0.0001855, whisper_loss=0.08536, over 20065.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0105, ecapa_loss=0.0001455, whisper_loss=0.09161, over 3874484.04 frames. ], batch size: 86, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:17:52,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3717820.0, ans=0.0 2024-08-18 05:17:56,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3717820.0, ans=0.0 2024-08-18 05:18:14,478 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.317e+01 2.583e+01 2.863e+01 6.029e+01, threshold=5.165e+01, percent-clipped=1.0 2024-08-18 05:18:26,995 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 19 from LS+wenet, 20 from Vox, 52 fro AS 2024-08-18 05:18:51,396 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 14200, loss[loss=0.0992, beats_loss=0.01367, ecapa_loss=9.513e-05, whisper_loss=0.08458, over 19589.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01048, ecapa_loss=0.0001454, whisper_loss=0.0915, over 3893540.66 frames. ], batch size: 77, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:18:57,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3718320.0, ans=0.125 2024-08-18 05:19:00,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3718320.0, ans=0.125 2024-08-18 05:19:01,879 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
26 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-18 05:19:19,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3718520.0, ans=0.0 2024-08-18 05:19:24,039 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 18 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 05:19:29,974 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 32 from Vox, 29 fro AS 2024-08-18 05:19:54,070 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 14250, loss[loss=0.1387, beats_loss=0.00746, ecapa_loss=0.0001647, whisper_loss=0.1296, over 20095.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01049, ecapa_loss=0.0001451, whisper_loss=0.09141, over 3886692.01 frames. ], batch size: 81, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:19:55,391 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-18 05:20:03,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3718820.0, ans=0.125 2024-08-18 05:20:18,496 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 26 from Vox, 19 fro AS 2024-08-18 05:20:20,724 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.251e+01 2.558e+01 2.864e+01 4.072e+01, threshold=5.115e+01, percent-clipped=0.0 2024-08-18 05:20:37,342 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.98 vs. limit=15.0 2024-08-18 05:20:58,177 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-18 05:21:10,241 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 14300, loss[loss=0.1137, beats_loss=0.01032, ecapa_loss=0.0001319, whisper_loss=0.102, over 19845.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01048, ecapa_loss=0.0001452, whisper_loss=0.09146, over 3914178.72 frames. ], batch size: 75, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:21:10,413 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-18 05:21:35,301 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=15.0 2024-08-18 05:21:41,640 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-18 05:22:16,403 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-18 05:22:22,459 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.47 vs. limit=15.0 2024-08-18 05:22:40,166 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 14350, loss[loss=0.09429, beats_loss=0.0112, ecapa_loss=0.000157, whisper_loss=0.08152, over 21583.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01047, ecapa_loss=0.000146, whisper_loss=0.09087, over 3903729.92 frames. ], batch size: 87, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:22:40,361 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
33 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 05:23:15,226 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-372000.pt 2024-08-18 05:23:22,502 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.359e+01 2.540e+01 2.872e+01 4.791e+01, threshold=5.079e+01, percent-clipped=0.0 2024-08-18 05:23:23,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3720020.0, ans=0.1 2024-08-18 05:23:42,250 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 05:23:56,409 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-18 05:24:21,585 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 33 from Vox, 35 fro AS 2024-08-18 05:24:22,736 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 14400, loss[loss=0.08556, beats_loss=0.01193, ecapa_loss=0.0002178, whisper_loss=0.07146, over 19611.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0104, ecapa_loss=0.0001478, whisper_loss=0.09118, over 3902201.15 frames. ], batch size: 92, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:24:39,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3720320.0, ans=0.0 2024-08-18 05:24:46,515 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 9 from Vox, 31 fro AS 2024-08-18 05:25:09,362 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-18 05:25:24,110 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
25 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-18 05:25:24,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3720620.0, ans=0.0 2024-08-18 05:25:28,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3720620.0, ans=0.125 2024-08-18 05:25:34,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3720620.0, ans=0.0 2024-08-18 05:25:42,173 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-18 05:25:46,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3720720.0, ans=0.125 2024-08-18 05:26:08,515 INFO [train_multi_KD3.py:1116] (0/4) Epoch 25, batch 14450, loss[loss=0.1145, beats_loss=0.01071, ecapa_loss=9.189e-05, whisper_loss=0.1028, over 17790.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01046, ecapa_loss=0.0001468, whisper_loss=0.09087, over 3890424.46 frames. ], batch size: 65, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:26:37,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3720920.0, ans=0.125 2024-08-18 05:26:48,678 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.314e+01 2.548e+01 2.908e+01 2.011e+02, threshold=5.097e+01, percent-clipped=3.0 2024-08-18 05:26:55,280 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
28 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-18 05:26:58,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3721020.0, ans=0.125 2024-08-18 05:26:58,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3721020.0, ans=0.125 2024-08-18 05:27:10,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3721120.0, ans=0.125 2024-08-18 05:27:18,379 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-25.pt 2024-08-18 05:27:51,750 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 0, loss[loss=0.1173, beats_loss=0.0074, ecapa_loss=0.0001864, whisper_loss=0.108, over 22548.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.0074, ecapa_loss=0.0001864, whisper_loss=0.108, over 22548.00 frames. ], batch size: 94, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:27:51,752 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-18 05:28:25,235 INFO [train_multi_KD3.py:1149] (0/4) Epoch 26, validation on ASR_libri: loss=0.251, beats_loss=0, ecapa_loss=0.0005273, whisper_loss=0.2457, over 922467.00 frames. 2024-08-18 05:28:39,658 INFO [train_multi_KD3.py:1149] (0/4) Epoch 26, validation on SV_voxceleb1: loss=0.004107, beats_loss=0, ecapa_loss=0.0004107, whisper_loss=0, over 939242.00 frames. 2024-08-18 05:30:15,507 INFO [train_multi_KD3.py:1149] (0/4) Epoch 26, validation on AT_audioset: loss=0.02319, beats_loss=0.02319, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-18 05:30:15,510 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-18 05:30:24,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3721230.0, ans=10.0 2024-08-18 05:30:25,529 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.68 vs. limit=15.0 2024-08-18 05:30:41,755 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 21 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-18 05:31:12,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3721430.0, ans=0.125 2024-08-18 05:31:38,465 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 05:31:41,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3721530.0, ans=0.125 2024-08-18 05:31:52,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3721630.0, ans=0.05 2024-08-18 05:32:09,465 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 50, loss[loss=0.1085, beats_loss=0.009332, ecapa_loss=0.0001642, whisper_loss=0.09748, over 16776.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.009257, ecapa_loss=0.0001493, whisper_loss=0.09258, over 869182.82 frames. ], batch size: 68, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:32:20,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3721730.0, ans=0.2 2024-08-18 05:32:20,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3721730.0, ans=0.0 2024-08-18 05:32:29,845 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
29 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-18 05:32:32,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3721830.0, ans=0.0 2024-08-18 05:32:39,270 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-18 05:33:02,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3721930.0, ans=0.0 2024-08-18 05:33:12,199 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.655e+01 2.458e+01 2.767e+01 3.037e+01 4.050e+01, threshold=5.534e+01, percent-clipped=0.0 2024-08-18 05:33:15,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3722030.0, ans=0.1 2024-08-18 05:33:54,756 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 17 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-18 05:33:58,132 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 100, loss[loss=0.1058, beats_loss=0.009362, ecapa_loss=0.0001596, whisper_loss=0.09484, over 23176.00 frames. ], tot_loss[loss=0.09946, beats_loss=0.009413, ecapa_loss=0.000148, whisper_loss=0.08857, over 1515614.26 frames. ], batch size: 93, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:33:58,428 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 05:34:04,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3722230.0, ans=0.125 2024-08-18 05:34:16,316 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 05:34:22,735 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
24 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-18 05:34:27,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3722330.0, ans=0.125 2024-08-18 05:34:31,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.14 vs. limit=6.0 2024-08-18 05:34:41,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3722430.0, ans=0.2 2024-08-18 05:34:45,452 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-18 05:35:12,658 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-18 05:35:24,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3722630.0, ans=0.1 2024-08-18 05:35:35,846 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 150, loss[loss=0.09002, beats_loss=0.01016, ecapa_loss=0.0001804, whisper_loss=0.07806, over 21321.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.009592, ecapa_loss=0.0001474, whisper_loss=0.08902, over 2019189.60 frames. 
], batch size: 90, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:35:41,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3722730.0, ans=0.125 2024-08-18 05:36:07,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3722930.0, ans=0.125 2024-08-18 05:36:11,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3722930.0, ans=0.125 2024-08-18 05:36:12,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3722930.0, ans=0.125 2024-08-18 05:36:22,207 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-18 05:36:23,223 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.548e+01 2.773e+01 3.033e+01 4.359e+01, threshold=5.546e+01, percent-clipped=0.0 2024-08-18 05:36:28,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3723030.0, ans=0.125 2024-08-18 05:36:32,135 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 17 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-18 05:36:33,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3723030.0, ans=0.125 2024-08-18 05:36:55,854 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 200, loss[loss=0.09625, beats_loss=0.01106, ecapa_loss=0.0001484, whisper_loss=0.0837, over 17025.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.009713, ecapa_loss=0.0001486, whisper_loss=0.08921, over 2386921.70 frames. 
], batch size: 68, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:36:58,236 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.61 vs. limit=15.0 2024-08-18 05:37:21,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3723330.0, ans=0.125 2024-08-18 05:37:33,350 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.63 vs. limit=15.0 2024-08-18 05:37:41,826 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-18 05:37:46,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3723530.0, ans=0.0 2024-08-18 05:37:49,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3723530.0, ans=0.0 2024-08-18 05:38:06,315 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.91 vs. limit=12.0 2024-08-18 05:38:09,687 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 250, loss[loss=0.1109, beats_loss=0.008047, ecapa_loss=0.0001509, whisper_loss=0.1013, over 17493.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.009905, ecapa_loss=0.0001473, whisper_loss=0.09023, over 2701725.02 frames. ], batch size: 65, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:38:11,200 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 30 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 05:38:15,435 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.83 vs. 
limit=12.0 2024-08-18 05:38:31,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3723830.0, ans=0.04949747468305833 2024-08-18 05:38:33,048 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-18 05:38:34,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3723830.0, ans=0.125 2024-08-18 05:38:34,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3723830.0, ans=0.2 2024-08-18 05:38:49,110 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.303e+01 2.551e+01 2.928e+01 5.127e+01, threshold=5.101e+01, percent-clipped=0.0 2024-08-18 05:38:58,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3724030.0, ans=0.125 2024-08-18 05:39:19,651 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 300, loss[loss=0.08752, beats_loss=0.01156, ecapa_loss=0.0001267, whisper_loss=0.07469, over 16876.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01014, ecapa_loss=0.0001455, whisper_loss=0.08995, over 2939337.04 frames. ], batch size: 68, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:39:27,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3724230.0, ans=0.125 2024-08-18 05:39:47,026 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
23 from LS+wenet, 32 from Vox, 27 fro AS 2024-08-18 05:39:52,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3724430.0, ans=0.125 2024-08-18 05:39:54,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3724430.0, ans=0.125 2024-08-18 05:39:55,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3724430.0, ans=10.0 2024-08-18 05:40:26,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3724630.0, ans=0.1 2024-08-18 05:40:29,033 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 350, loss[loss=0.1151, beats_loss=0.01087, ecapa_loss=0.0001175, whisper_loss=0.103, over 23763.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01024, ecapa_loss=0.0001454, whisper_loss=0.08943, over 3128205.53 frames. ], batch size: 92, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:40:40,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3724730.0, ans=0.125 2024-08-18 05:41:06,695 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.645e+01 2.139e+01 2.480e+01 2.873e+01 3.431e+01, threshold=4.959e+01, percent-clipped=0.0 2024-08-18 05:41:09,684 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-18 05:41:33,970 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 400, loss[loss=0.1119, beats_loss=0.009678, ecapa_loss=0.0001538, whisper_loss=0.1007, over 17832.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01029, ecapa_loss=0.0001442, whisper_loss=0.08892, over 3283267.79 frames. 
], batch size: 70, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:41:34,178 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-18 05:41:34,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3725230.0, ans=0.0 2024-08-18 05:41:39,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3725230.0, ans=0.125 2024-08-18 05:41:40,246 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-18 05:41:45,084 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.69 vs. limit=6.0 2024-08-18 05:41:54,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3725330.0, ans=0.2 2024-08-18 05:42:00,633 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-18 05:42:03,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3725430.0, ans=0.1 2024-08-18 05:42:10,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3725430.0, ans=0.125 2024-08-18 05:42:13,930 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 28 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-18 05:42:18,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=3725530.0, ans=22.5 2024-08-18 05:42:22,492 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.14 vs. 
limit=15.0 2024-08-18 05:42:39,648 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 450, loss[loss=0.09587, beats_loss=0.01199, ecapa_loss=0.0001294, whisper_loss=0.08258, over 17880.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01034, ecapa_loss=0.0001454, whisper_loss=0.08926, over 3413378.31 frames. ], batch size: 72, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:42:46,187 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-18 05:43:17,592 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.230e+01 2.487e+01 2.863e+01 4.267e+01, threshold=4.973e+01, percent-clipped=0.0 2024-08-18 05:43:25,005 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-18 05:43:45,017 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 500, loss[loss=0.08211, beats_loss=0.01163, ecapa_loss=0.0001558, whisper_loss=0.06892, over 14860.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01032, ecapa_loss=0.0001447, whisper_loss=0.08942, over 3518259.27 frames. ], batch size: 63, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:43:49,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3726230.0, ans=0.125 2024-08-18 05:44:16,631 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.53 vs. limit=22.5 2024-08-18 05:44:44,338 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.41 vs. limit=22.5 2024-08-18 05:44:46,151 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
22 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 05:44:50,357 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 550, loss[loss=0.09529, beats_loss=0.01013, ecapa_loss=0.0001254, whisper_loss=0.08391, over 17194.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0103, ecapa_loss=0.0001449, whisper_loss=0.08991, over 3594586.78 frames. ], batch size: 65, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:45:06,902 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2024-08-18 05:45:28,250 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.311e+01 2.532e+01 2.757e+01 3.672e+01, threshold=5.065e+01, percent-clipped=0.0 2024-08-18 05:45:37,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3727030.0, ans=0.125 2024-08-18 05:45:54,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3727230.0, ans=0.125 2024-08-18 05:45:55,379 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 600, loss[loss=0.1255, beats_loss=0.009019, ecapa_loss=0.0001785, whisper_loss=0.1147, over 16765.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01029, ecapa_loss=0.0001449, whisper_loss=0.09049, over 3638440.60 frames. ], batch size: 69, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:46:04,801 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2024-08-18 05:46:24,075 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
25 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-18 05:46:41,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3727530.0, ans=0.125 2024-08-18 05:46:55,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3727630.0, ans=0.125 2024-08-18 05:46:57,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3727630.0, ans=0.125 2024-08-18 05:47:00,510 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 650, loss[loss=0.1017, beats_loss=0.007198, ecapa_loss=0.0001636, whisper_loss=0.09288, over 15494.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01027, ecapa_loss=0.0001445, whisper_loss=0.09084, over 3681421.84 frames. ], batch size: 60, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:47:03,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3727730.0, ans=0.125 2024-08-18 05:47:07,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3727730.0, ans=0.125 2024-08-18 05:47:12,203 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 05:47:21,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3727830.0, ans=0.04949747468305833 2024-08-18 05:47:24,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3727830.0, ans=0.125 2024-08-18 05:47:26,407 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 27 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-18 05:47:35,908 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
23 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-18 05:47:38,256 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.709e+01 2.282e+01 2.594e+01 2.924e+01 5.519e+01, threshold=5.188e+01, percent-clipped=2.0 2024-08-18 05:47:47,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3728030.0, ans=0.125 2024-08-18 05:47:54,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3728130.0, ans=0.1 2024-08-18 05:48:03,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.79 vs. limit=22.5 2024-08-18 05:48:05,902 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 700, loss[loss=0.118, beats_loss=0.008495, ecapa_loss=0.0001513, whisper_loss=0.108, over 14362.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01024, ecapa_loss=0.0001455, whisper_loss=0.09154, over 3706515.74 frames. ], batch size: 53, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:48:06,649 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.13 vs. limit=6.0 2024-08-18 05:48:07,159 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 26 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-18 05:48:09,769 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
28 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-18 05:48:12,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3728230.0, ans=0.2 2024-08-18 05:48:14,826 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0922105461359024, model_norm_threshold=51.87888717651367 2024-08-18 05:48:14,989 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.030e+04, grad_sumsq=4.030e+04, orig_rms_sq=1.000e+00 2024-08-18 05:48:43,075 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-18 05:48:53,478 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.52 vs. limit=22.5 2024-08-18 05:48:55,647 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.60 vs. limit=15.0 2024-08-18 05:48:59,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3728630.0, ans=0.035 2024-08-18 05:49:10,639 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 750, loss[loss=0.0939, beats_loss=0.01, ecapa_loss=0.0001457, whisper_loss=0.08244, over 15873.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01031, ecapa_loss=0.0001447, whisper_loss=0.09127, over 3740847.33 frames. ], batch size: 62, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:49:12,910 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=16.96 vs. 
limit=15.0 2024-08-18 05:49:15,379 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 05:49:21,562 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.03 vs. limit=12.0 2024-08-18 05:49:22,136 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-18 05:49:35,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3728930.0, ans=0.0 2024-08-18 05:49:36,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3728930.0, ans=0.125 2024-08-18 05:49:47,398 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.292e+01 2.475e+01 2.827e+01 5.626e+02, threshold=4.950e+01, percent-clipped=2.0 2024-08-18 05:49:56,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3729030.0, ans=0.2 2024-08-18 05:50:02,720 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-18 05:50:05,638 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 20 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 05:50:06,170 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.21 vs. limit=15.0 2024-08-18 05:50:08,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3729130.0, ans=0.125 2024-08-18 05:50:15,051 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
14 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-18 05:50:16,227 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 800, loss[loss=0.08324, beats_loss=0.009024, ecapa_loss=0.0001619, whisper_loss=0.0726, over 13458.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01034, ecapa_loss=0.0001445, whisper_loss=0.08977, over 3748287.28 frames. ], batch size: 57, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:50:21,721 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-18 05:50:32,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3729330.0, ans=0.125 2024-08-18 05:50:34,586 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.92 vs. limit=22.5 2024-08-18 05:50:52,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3729430.0, ans=0.0 2024-08-18 05:51:08,356 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.83 vs. limit=15.0 2024-08-18 05:51:22,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.94 vs. limit=15.0 2024-08-18 05:51:22,552 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-18 05:51:23,688 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 850, loss[loss=0.09676, beats_loss=0.0079, ecapa_loss=0.0001587, whisper_loss=0.08727, over 17549.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01031, ecapa_loss=0.0001441, whisper_loss=0.08994, over 3748951.89 frames. 
], batch size: 68, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:51:41,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3729830.0, ans=0.125 2024-08-18 05:51:47,871 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-18 05:52:02,224 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.231e+01 2.471e+01 2.825e+01 3.854e+01, threshold=4.942e+01, percent-clipped=0.0 2024-08-18 05:52:11,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3730030.0, ans=0.0 2024-08-18 05:52:30,532 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 900, loss[loss=0.1154, beats_loss=0.008719, ecapa_loss=0.0001824, whisper_loss=0.1048, over 20766.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01027, ecapa_loss=0.000144, whisper_loss=0.08994, over 3759062.58 frames. ], batch size: 84, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:52:33,374 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-18 05:52:37,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.07 vs. limit=10.0 2024-08-18 05:52:59,058 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
22 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-18 05:53:02,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3730430.0, ans=0.125 2024-08-18 05:53:03,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3730430.0, ans=0.125 2024-08-18 05:53:03,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3730430.0, ans=0.1 2024-08-18 05:53:12,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3730530.0, ans=0.125 2024-08-18 05:53:23,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.52 vs. limit=15.0 2024-08-18 05:53:26,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3730630.0, ans=0.0 2024-08-18 05:53:38,973 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 950, loss[loss=0.08626, beats_loss=0.01372, ecapa_loss=0.0001081, whisper_loss=0.07146, over 18892.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01035, ecapa_loss=0.0001429, whisper_loss=0.08944, over 3795086.17 frames. ], batch size: 72, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:53:39,134 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-18 05:53:53,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3730830.0, ans=0.1 2024-08-18 05:54:04,529 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
22 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-18 05:54:17,808 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.336e+01 2.576e+01 2.851e+01 4.260e+01, threshold=5.152e+01, percent-clipped=0.0 2024-08-18 05:54:23,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3731030.0, ans=0.1 2024-08-18 05:54:35,170 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-18 05:54:38,368 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 24 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-18 05:54:40,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3731130.0, ans=0.1 2024-08-18 05:54:46,851 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 1000, loss[loss=0.1098, beats_loss=0.01073, ecapa_loss=0.0001539, whisper_loss=0.0975, over 23127.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01042, ecapa_loss=0.0001429, whisper_loss=0.08969, over 3813766.37 frames. ], batch size: 92, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:54:57,449 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 15 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-18 05:54:59,361 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0 2024-08-18 05:55:01,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3731330.0, ans=0.125 2024-08-18 05:55:07,412 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
32 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-18 05:55:07,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3731330.0, ans=0.2 2024-08-18 05:55:08,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3731330.0, ans=0.125 2024-08-18 05:55:32,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=3731530.0, ans=12.0 2024-08-18 05:55:41,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3731630.0, ans=0.125 2024-08-18 05:55:46,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3731630.0, ans=0.0 2024-08-18 05:55:51,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3731630.0, ans=0.1 2024-08-18 05:55:52,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3731630.0, ans=0.125 2024-08-18 05:55:53,915 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 1050, loss[loss=0.0735, beats_loss=0.01294, ecapa_loss=0.0001407, whisper_loss=0.05915, over 18551.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01041, ecapa_loss=0.000142, whisper_loss=0.09009, over 3810312.08 frames. ], batch size: 78, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:56:21,453 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-18 05:56:31,492 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 32 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-18 05:56:34,276 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
21 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-18 05:56:35,551 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.334e+01 2.539e+01 2.786e+01 5.351e+01, threshold=5.078e+01, percent-clipped=0.0 2024-08-18 05:56:54,028 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.64 vs. limit=15.0 2024-08-18 05:56:55,424 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.56 vs. limit=15.0 2024-08-18 05:56:56,236 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-18 05:57:06,313 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 1100, loss[loss=0.1085, beats_loss=0.01017, ecapa_loss=0.0001423, whisper_loss=0.09692, over 23056.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001422, whisper_loss=0.09033, over 3829432.90 frames. ], batch size: 94, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:57:18,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3732330.0, ans=0.0 2024-08-18 05:57:19,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3732330.0, ans=0.0 2024-08-18 05:57:32,586 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 05:57:43,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3732430.0, ans=0.125 2024-08-18 05:57:54,598 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.65 vs. 
limit=15.0 2024-08-18 05:57:58,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3732530.0, ans=0.0 2024-08-18 05:58:16,610 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 1150, loss[loss=0.1115, beats_loss=0.01079, ecapa_loss=0.0001338, whisper_loss=0.09937, over 19430.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01048, ecapa_loss=0.0001415, whisper_loss=0.09016, over 3826043.69 frames. ], batch size: 76, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:58:25,663 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-18 05:58:32,214 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 13 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 05:58:36,865 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-18 05:58:43,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3732930.0, ans=0.125 2024-08-18 05:58:44,499 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-18 05:58:53,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3732930.0, ans=0.0 2024-08-18 05:58:54,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3732930.0, ans=0.125 2024-08-18 05:58:57,164 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.301e+01 2.610e+01 2.950e+01 4.151e+01, threshold=5.220e+01, percent-clipped=1.0 2024-08-18 05:58:57,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.70 vs. limit=22.5 2024-08-18 05:59:02,487 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
30 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-18 05:59:04,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3733030.0, ans=0.125 2024-08-18 05:59:17,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3733130.0, ans=0.0 2024-08-18 05:59:20,202 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2024-08-18 05:59:26,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3733230.0, ans=0.0 2024-08-18 05:59:27,909 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 1200, loss[loss=0.0723, beats_loss=0.01153, ecapa_loss=0.0001627, whisper_loss=0.05915, over 14929.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0106, ecapa_loss=0.0001413, whisper_loss=0.08918, over 3854859.08 frames. ], batch size: 62, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:59:28,914 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.42 vs. limit=15.0 2024-08-18 05:59:31,153 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=22.5 2024-08-18 05:59:32,985 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-18 05:59:33,408 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.69 vs. limit=15.0 2024-08-18 05:59:37,048 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.72 vs. 
limit=22.5
2024-08-18 05:59:40,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3733330.0, ans=0.125
2024-08-18 06:00:04,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3733430.0, ans=0.125
2024-08-18 06:00:14,413 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 fro AS
2024-08-18 06:00:15,618 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 14 from Vox, 43 fro AS
2024-08-18 06:00:40,950 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 1250, loss[loss=0.1254, beats_loss=0.01001, ecapa_loss=0.000171, whisper_loss=0.1136, over 15513.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0106, ecapa_loss=0.0001414, whisper_loss=0.08925, over 3858453.36 frames. ], batch size: 63, lr: 2.37e-03, grad_scale: 5.764607523034235e+17
2024-08-18 06:00:47,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3733730.0, ans=0.125
2024-08-18 06:00:54,634 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.95 vs. limit=15.0
2024-08-18 06:01:09,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3733930.0, ans=0.125
2024-08-18 06:01:18,604 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.44 vs. limit=15.0
2024-08-18 06:01:20,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3733930.0, ans=0.125
2024-08-18 06:01:22,965 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 15 from Vox, 31 fro AS
2024-08-18 06:01:24,453 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.310e+01 2.549e+01 2.839e+01 4.783e+01, threshold=5.098e+01, percent-clipped=0.0
2024-08-18 06:01:31,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3734030.0, ans=0.1
2024-08-18 06:01:37,797 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 16 from Vox, 42 fro AS
2024-08-18 06:01:55,071 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 1300, loss[loss=0.06862, beats_loss=0.01275, ecapa_loss=0.0001226, whisper_loss=0.05464, over 19460.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0106, ecapa_loss=0.0001416, whisper_loss=0.0892, over 3866008.32 frames. ], batch size: 79, lr: 2.37e-03, grad_scale: 5.764607523034235e+17
2024-08-18 06:02:00,403 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.96 vs. limit=15.0
2024-08-18 06:02:12,566 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 23 from LS+wenet, 10 from Vox, 26 fro AS
2024-08-18 06:02:19,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3734330.0, ans=0.125
2024-08-18 06:02:36,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3734430.0, ans=0.1
2024-08-18 06:02:46,016 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 10 from Vox, 32 fro AS
2024-08-18 06:02:54,130 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.37 vs. limit=15.0
2024-08-18 06:03:04,075 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.63 vs. limit=15.0
2024-08-18 06:03:12,406 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 1350, loss[loss=0.1027, beats_loss=0.01002, ecapa_loss=0.00017, whisper_loss=0.09102, over 16918.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01057, ecapa_loss=0.000142, whisper_loss=0.08927, over 3863022.93 frames. ], batch size: 67, lr: 2.37e-03, grad_scale: 5.764607523034235e+17
2024-08-18 06:03:19,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3734730.0, ans=0.0
2024-08-18 06:03:19,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3734730.0, ans=0.0
2024-08-18 06:03:20,447 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.52 vs. limit=22.5
2024-08-18 06:04:00,190 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.254e+01 2.510e+01 2.787e+01 4.431e+01, threshold=5.020e+01, percent-clipped=0.0
2024-08-18 06:04:22,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3735130.0, ans=0.125
2024-08-18 06:04:34,975 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 1400, loss[loss=0.08964, beats_loss=0.01206, ecapa_loss=0.0001667, whisper_loss=0.07592, over 20816.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.0106, ecapa_loss=0.000142, whisper_loss=0.08857, over 3838010.40 frames. ], batch size: 89, lr: 2.37e-03, grad_scale: 5.764607523034235e+17
2024-08-18 06:04:36,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3735230.0, ans=0.125
2024-08-18 06:04:41,263 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 15 from Vox, 37 fro AS
2024-08-18 06:05:04,775 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.21 vs. limit=15.0
2024-08-18 06:05:12,022 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.82 vs. limit=12.0
2024-08-18 06:05:20,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3735430.0, ans=0.125
2024-08-18 06:05:26,699 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.79 vs. limit=22.5
2024-08-18 06:05:30,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3735530.0, ans=0.05
2024-08-18 06:05:34,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3735530.0, ans=0.0
2024-08-18 06:05:44,996 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.52 vs. limit=15.0
2024-08-18 06:05:48,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3735630.0, ans=0.125
2024-08-18 06:06:25,081 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 1450, loss[loss=0.08439, beats_loss=0.01255, ecapa_loss=0.0001509, whisper_loss=0.07033, over 20883.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01051, ecapa_loss=0.0001422, whisper_loss=0.08903, over 3865284.53 frames. ], batch size: 88, lr: 2.37e-03, grad_scale: 5.764607523034235e+17
2024-08-18 06:06:34,181 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.22 vs. limit=10.0
2024-08-18 06:06:47,110 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 15 from Vox, 35 fro AS
2024-08-18 06:07:07,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3735930.0, ans=0.125
2024-08-18 06:07:09,461 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.184e+01 2.399e+01 2.651e+01 6.055e+01, threshold=4.798e+01, percent-clipped=1.0
2024-08-18 06:07:12,345 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS
2024-08-18 06:07:28,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3736130.0, ans=0.125
2024-08-18 06:07:29,581 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 17 from Vox, 28 fro AS
2024-08-18 06:07:35,270 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 21 from Vox, 24 fro AS
2024-08-18 06:07:36,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3736130.0, ans=0.125
2024-08-18 06:07:37,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3736130.0, ans=0.125
2024-08-18 06:07:40,104 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0
2024-08-18 06:07:40,715 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 1500, loss[loss=0.1284, beats_loss=0.005853, ecapa_loss=0.0001733, whisper_loss=0.1208, over 22681.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.0105, ecapa_loss=0.0001421, whisper_loss=0.08858, over 3864134.08 frames. ], batch size: 87, lr: 2.37e-03, grad_scale: 5.764607523034235e+17
2024-08-18 06:07:52,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3736230.0, ans=0.0
2024-08-18 06:08:25,107 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 20 from Vox, 25 fro AS
2024-08-18 06:08:28,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3736530.0, ans=0.125
2024-08-18 06:08:30,682 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 9 from Vox, 30 fro AS
2024-08-18 06:08:32,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3736530.0, ans=0.125
2024-08-18 06:08:34,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3736530.0, ans=0.0
2024-08-18 06:08:55,156 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 1550, loss[loss=0.1193, beats_loss=0.008198, ecapa_loss=0.0001708, whisper_loss=0.1094, over 18488.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01048, ecapa_loss=0.000141, whisper_loss=0.08894, over 3846739.80 frames. ], batch size: 75, lr: 2.37e-03, grad_scale: 5.764607523034235e+17
2024-08-18 06:08:59,989 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 23 from Vox, 30 fro AS
2024-08-18 06:09:09,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=15.97 vs. limit=15.0
2024-08-18 06:09:30,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3736930.0, ans=0.125
2024-08-18 06:09:38,791 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.223e+01 2.492e+01 2.734e+01 3.919e+01, threshold=4.985e+01, percent-clipped=0.0
2024-08-18 06:09:40,284 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 16 from Vox, 29 fro AS
2024-08-18 06:09:47,377 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS
2024-08-18 06:09:58,965 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 fro AS
2024-08-18 06:10:03,422 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 20 from Vox, 29 fro AS
2024-08-18 06:10:08,712 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 1600, loss[loss=0.1001, beats_loss=0.00941, ecapa_loss=0.0001371, whisper_loss=0.08935, over 23358.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.0105, ecapa_loss=0.0001412, whisper_loss=0.08839, over 3851088.86 frames. ], batch size: 90, lr: 2.37e-03, grad_scale: 5.764607523034235e+17
2024-08-18 06:10:10,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3737230.0, ans=0.0
2024-08-18 06:10:20,685 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 15 from Vox, 23 fro AS
2024-08-18 06:10:23,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3737330.0, ans=0.1
2024-08-18 06:10:26,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3737330.0, ans=0.125
2024-08-18 06:10:45,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3737430.0, ans=0.125
2024-08-18 06:11:01,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3737530.0, ans=0.2
2024-08-18 06:11:17,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3737630.0, ans=0.05
2024-08-18 06:11:20,012 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 1650, loss[loss=0.1061, beats_loss=0.008969, ecapa_loss=0.0001438, whisper_loss=0.09574, over 15452.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.0105, ecapa_loss=0.0001404, whisper_loss=0.08875, over 3816753.90 frames. ], batch size: 59, lr: 2.37e-03, grad_scale: 5.764607523034235e+17
2024-08-18 06:11:44,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3737830.0, ans=0.125
2024-08-18 06:11:47,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3737930.0, ans=0.125
2024-08-18 06:11:50,736 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 16 from Vox, 29 fro AS
2024-08-18 06:11:58,787 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.262e+01 2.498e+01 2.828e+01 4.112e+01, threshold=4.996e+01, percent-clipped=0.0
2024-08-18 06:12:09,872 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 27 from Vox, 33 fro AS
2024-08-18 06:12:15,031 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09963374584913254, model_norm_threshold=49.962425231933594
2024-08-18 06:12:15,195 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.352e+04, grad_sumsq=4.352e+04, orig_rms_sq=1.000e+00
2024-08-18 06:12:20,959 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 24 from Vox, 31 fro AS
2024-08-18 06:12:27,726 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 1700, loss[loss=0.104, beats_loss=0.009526, ecapa_loss=0.0001467, whisper_loss=0.09303, over 22851.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01052, ecapa_loss=0.0001393, whisper_loss=0.08872, over 3827299.65 frames. ], batch size: 92, lr: 2.37e-03, grad_scale: 5.764607523034235e+17
2024-08-18 06:12:31,739 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 12 from LS+wenet, 13 from Vox, 29 fro AS
2024-08-18 06:12:43,380 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.67 vs. limit=10.0
2024-08-18 06:12:58,259 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=12.0
2024-08-18 06:13:07,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3738530.0, ans=0.2
2024-08-18 06:13:32,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3738630.0, ans=0.125
2024-08-18 06:13:33,577 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 24 from LS+wenet, 16 from Vox, 23 fro AS
2024-08-18 06:13:35,002 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 1750, loss[loss=0.1116, beats_loss=0.008999, ecapa_loss=0.0001394, whisper_loss=0.1012, over 15797.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01042, ecapa_loss=0.0001398, whisper_loss=0.08865, over 3811743.05 frames. ], batch size: 63, lr: 2.37e-03, grad_scale: 5.764607523034235e+17
2024-08-18 06:13:50,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3738830.0, ans=0.125
2024-08-18 06:14:09,169 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-18 06:14:15,689 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.240e+01 2.515e+01 2.865e+01 5.015e+02, threshold=5.030e+01, percent-clipped=2.0
2024-08-18 06:14:21,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3739030.0, ans=0.2
2024-08-18 06:14:26,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3739030.0, ans=0.1
2024-08-18 06:14:30,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3739130.0, ans=0.0
2024-08-18 06:14:32,860 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 22 from Vox, 22 fro AS
2024-08-18 06:14:34,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3739130.0, ans=0.125
2024-08-18 06:14:42,636 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 1800, loss[loss=0.08404, beats_loss=0.01117, ecapa_loss=0.0001202, whisper_loss=0.07167, over 18574.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01034, ecapa_loss=0.000141, whisper_loss=0.08888, over 3819016.25 frames. ], batch size: 70, lr: 2.37e-03, grad_scale: 5.764607523034235e+17
2024-08-18 06:15:34,887 WARNING [optim.py:496] (0/4) Scaling gradients by 0.04103388637304306, model_norm_threshold=50.29759979248047
2024-08-18 06:15:35,051 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.438e+05, grad_sumsq=3.364e+07, orig_rms_sq=1.022e-02
2024-08-18 06:15:44,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=3739630.0, ans=0.2
2024-08-18 06:15:49,578 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 1850, loss[loss=0.07602, beats_loss=0.01315, ecapa_loss=0.000111, whisper_loss=0.06176, over 17212.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01042, ecapa_loss=0.0001399, whisper_loss=0.08844, over 3824931.53 frames. ], batch size: 68, lr: 2.37e-03, grad_scale: 5.764607523034235e+17
2024-08-18 06:15:54,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=3739730.0, ans=0.1
2024-08-18 06:15:55,805 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 24 from Vox, 32 fro AS
2024-08-18 06:15:56,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3739730.0, ans=0.2
2024-08-18 06:16:08,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3739830.0, ans=0.0
2024-08-18 06:16:29,340 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.281e+01 2.584e+01 3.021e+01 1.226e+03, threshold=5.167e+01, percent-clipped=3.0
2024-08-18 06:16:43,037 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 35 from LS+wenet, 24 from Vox, 28 fro AS
2024-08-18 06:16:44,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3740130.0, ans=0.125
2024-08-18 06:16:48,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3740130.0, ans=0.125
2024-08-18 06:16:50,828 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 17 from LS+wenet, 24 from Vox, 31 fro AS
2024-08-18 06:16:52,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3740130.0, ans=0.07
2024-08-18 06:16:58,384 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 1900, loss[loss=0.0848, beats_loss=0.01206, ecapa_loss=0.0001444, whisper_loss=0.07129, over 21299.00 frames. ], tot_loss[loss=0.09998, beats_loss=0.01044, ecapa_loss=0.0001412, whisper_loss=0.08813, over 3829722.27 frames. ], batch size: 89, lr: 2.37e-03, grad_scale: 5.764607523034235e+17
2024-08-18 06:17:09,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3740230.0, ans=0.1
2024-08-18 06:17:41,300 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 27 from Vox, 27 fro AS
2024-08-18 06:17:43,041 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 16 from Vox, 32 fro AS
2024-08-18 06:17:53,932 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 21 from Vox, 30 fro AS
2024-08-18 06:18:05,313 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 1950, loss[loss=0.09852, beats_loss=0.00988, ecapa_loss=0.0001682, whisper_loss=0.08696, over 19746.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01045, ecapa_loss=0.000142, whisper_loss=0.08821, over 3813750.08 frames. ], batch size: 81, lr: 2.37e-03, grad_scale: 5.764607523034235e+17
2024-08-18 06:18:05,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3740730.0, ans=0.0
2024-08-18 06:18:22,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3740830.0, ans=0.125
2024-08-18 06:18:39,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3740930.0, ans=0.0
2024-08-18 06:18:43,505 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.240e+01 2.452e+01 2.849e+01 7.205e+01, threshold=4.903e+01, percent-clipped=1.0
2024-08-18 06:18:54,924 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.10 vs. limit=15.0
2024-08-18 06:19:11,481 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 2000, loss[loss=0.1056, beats_loss=0.009404, ecapa_loss=0.0001385, whisper_loss=0.09481, over 22112.00 frames. ], tot_loss[loss=0.09995, beats_loss=0.01038, ecapa_loss=0.0001424, whisper_loss=0.08814, over 3791238.98 frames. ], batch size: 88, lr: 2.37e-03, grad_scale: 1.152921504606847e+18
2024-08-18 06:19:22,205 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 29 from LS+wenet, 22 from Vox, 31 fro AS
2024-08-18 06:19:26,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.81 vs. limit=15.0
2024-08-18 06:19:30,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3741330.0, ans=0.2
2024-08-18 06:19:34,605 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.69 vs. limit=22.5
2024-08-18 06:19:45,284 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 22 from Vox, 44 fro AS
2024-08-18 06:19:50,572 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 24 from LS+wenet, 15 from Vox, 26 fro AS
2024-08-18 06:19:58,662 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 26 from LS+wenet, 18 from Vox, 25 fro AS
2024-08-18 06:20:06,641 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 fro AS
2024-08-18 06:20:09,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3741630.0, ans=0.0
2024-08-18 06:20:16,805 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 2050, loss[loss=0.09381, beats_loss=0.01146, ecapa_loss=0.0001142, whisper_loss=0.08121, over 16738.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01043, ecapa_loss=0.0001414, whisper_loss=0.08832, over 3798622.43 frames. ], batch size: 65, lr: 2.37e-03, grad_scale: 1.152921504606847e+18
2024-08-18 06:20:18,312 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 24 from Vox, 31 fro AS
2024-08-18 06:20:23,202 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 23 from Vox, 36 fro AS
2024-08-18 06:20:27,203 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 18 from LS+wenet, 22 from Vox, 35 fro AS
2024-08-18 06:20:33,664 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 15 from Vox, 23 fro AS
2024-08-18 06:20:46,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3741930.0, ans=0.0
2024-08-18 06:20:46,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3741930.0, ans=0.125
2024-08-18 06:20:54,432 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.321e+01 2.575e+01 2.805e+01 3.958e+01, threshold=5.151e+01, percent-clipped=0.0
2024-08-18 06:20:57,483 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.643e+00
2024-08-18 06:21:07,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3742030.0, ans=0.125
2024-08-18 06:21:12,270 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0
2024-08-18 06:21:17,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3742130.0, ans=0.2
2024-08-18 06:21:23,081 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 2100, loss[loss=0.1244, beats_loss=0.00933, ecapa_loss=0.0001338, whisper_loss=0.1138, over 21612.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01042, ecapa_loss=0.0001413, whisper_loss=0.08907, over 3794063.28 frames. ], batch size: 81, lr: 2.37e-03, grad_scale: 1.152921504606847e+18
2024-08-18 06:21:47,634 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 20 from Vox, 45 fro AS
2024-08-18 06:22:00,086 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 15 from Vox, 27 fro AS
2024-08-18 06:22:11,266 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 14 from Vox, 21 fro AS
2024-08-18 06:22:12,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3742630.0, ans=0.125
2024-08-18 06:22:24,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.45 vs. limit=15.0
2024-08-18 06:22:26,802 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 2150, loss[loss=0.1315, beats_loss=0.00784, ecapa_loss=0.0001546, whisper_loss=0.1221, over 18887.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01043, ecapa_loss=0.0001406, whisper_loss=0.08968, over 3810551.05 frames. ], batch size: 69, lr: 2.37e-03, grad_scale: 1.152921504606847e+18
2024-08-18 06:22:28,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3742730.0, ans=0.0
2024-08-18 06:22:33,841 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=9.455e+00
2024-08-18 06:22:34,832 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 15 from Vox, 32 fro AS
2024-08-18 06:22:53,781 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 17 from Vox, 27 fro AS
2024-08-18 06:22:55,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3742930.0, ans=0.125
2024-08-18 06:23:07,303 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.693e+01 2.305e+01 2.600e+01 2.971e+01 3.562e+02, threshold=5.201e+01, percent-clipped=4.0
2024-08-18 06:23:13,468 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 18 from Vox, 43 fro AS
2024-08-18 06:23:39,407 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 2200, loss[loss=0.09258, beats_loss=0.01431, ecapa_loss=0.0001142, whisper_loss=0.07712, over 16897.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01047, ecapa_loss=0.0001401, whisper_loss=0.08985, over 3812412.59 frames. ], batch size: 69, lr: 2.37e-03, grad_scale: 1.152921504606847e+18
2024-08-18 06:23:45,195 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 15 from Vox, 27 fro AS
2024-08-18 06:24:12,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3743430.0, ans=0.1
2024-08-18 06:24:12,514 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.08 vs. limit=6.0
2024-08-18 06:24:59,758 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 2250, loss[loss=0.1089, beats_loss=0.01126, ecapa_loss=0.0001399, whisper_loss=0.0962, over 22051.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01051, ecapa_loss=0.0001406, whisper_loss=0.09076, over 3866332.27 frames. ], batch size: 89, lr: 2.37e-03, grad_scale: 1.152921504606847e+18
2024-08-18 06:25:41,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3743930.0, ans=0.1
2024-08-18 06:25:47,166 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.275e+01 2.589e+01 2.957e+01 4.064e+01, threshold=5.179e+01, percent-clipped=0.0
2024-08-18 06:25:55,658 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 20 from Vox, 31 fro AS
2024-08-18 06:25:59,486 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 19 from Vox, 40 fro AS
2024-08-18 06:26:07,732 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 32 from LS+wenet, 21 from Vox, 34 fro AS
2024-08-18 06:26:20,397 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 2300, loss[loss=0.1126, beats_loss=0.01306, ecapa_loss=0.0001078, whisper_loss=0.09841, over 17986.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01048, ecapa_loss=0.0001414, whisper_loss=0.09133, over 3870484.23 frames. ], batch size: 70, lr: 2.37e-03, grad_scale: 1.152921504606847e+18
2024-08-18 06:26:24,970 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.58 vs. limit=15.0
2024-08-18 06:26:28,138 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 15 from Vox, 29 fro AS
2024-08-18 06:26:30,975 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-18 06:26:51,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3744330.0, ans=0.0
2024-08-18 06:27:01,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3744430.0, ans=0.125
2024-08-18 06:27:03,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3744430.0, ans=0.0
2024-08-18 06:27:04,727 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-18 06:27:29,387 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 11 from Vox, 32 fro AS
2024-08-18 06:27:39,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3744630.0, ans=0.2
2024-08-18 06:27:42,111 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 2350, loss[loss=0.116, beats_loss=0.008937, ecapa_loss=0.000183, whisper_loss=0.1052, over 21809.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01041, ecapa_loss=0.0001429, whisper_loss=0.09177, over 3864790.35 frames. ], batch size: 88, lr: 2.37e-03, grad_scale: 1.152921504606847e+18
2024-08-18 06:27:55,374 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 22 from Vox, 19 fro AS
2024-08-18 06:27:57,138 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 15 from Vox, 33 fro AS
2024-08-18 06:27:58,803 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 14 from Vox, 39 fro AS
2024-08-18 06:28:29,612 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.717e+01 2.243e+01 2.499e+01 2.727e+01 3.618e+01, threshold=4.998e+01, percent-clipped=0.0
2024-08-18 06:29:04,263 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 2400, loss[loss=0.07941, beats_loss=0.01047, ecapa_loss=0.0001602, whisper_loss=0.06734, over 17034.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01048, ecapa_loss=0.0001428, whisper_loss=0.09047, over 3861897.43 frames. ], batch size: 70, lr: 2.37e-03, grad_scale: 1.152921504606847e+18
2024-08-18 06:29:12,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3745230.0, ans=0.0
2024-08-18 06:29:14,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3745230.0, ans=0.125
2024-08-18 06:29:14,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3745230.0, ans=0.1
2024-08-18 06:29:17,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3745230.0, ans=0.05
2024-08-18 06:29:33,195 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 22 from LS+wenet, 14 from Vox, 24 fro AS
2024-08-18 06:29:40,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3745430.0, ans=0.125
2024-08-18 06:30:08,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3745630.0, ans=0.1
2024-08-18 06:30:19,376 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 2450, loss[loss=0.1224, beats_loss=0.009143, ecapa_loss=0.0001676, whisper_loss=0.1115, over 21305.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01046, ecapa_loss=0.0001436, whisper_loss=0.09086, over 3872764.44 frames. ], batch size: 87, lr: 2.37e-03, grad_scale: 1.152921504606847e+18
2024-08-18 06:30:26,241 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 14 from LS+wenet, 24 from Vox, 27 fro AS
2024-08-18 06:30:29,010 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 29 from Vox, 36 fro AS
2024-08-18 06:30:35,498 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.02 vs. limit=15.0
2024-08-18 06:30:51,822 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 29 from LS+wenet, 13 from Vox, 23 fro AS
2024-08-18 06:31:08,961 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.283e+01 2.536e+01 2.779e+01 5.169e+01, threshold=5.071e+01, percent-clipped=0.0
2024-08-18 06:31:25,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3746130.0, ans=0.0
2024-08-18 06:31:36,360 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 16 from Vox, 37 fro AS
2024-08-18 06:31:40,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3746230.0, ans=0.1
2024-08-18 06:31:40,861 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 2500, loss[loss=0.09254, beats_loss=0.01129, ecapa_loss=0.0001298, whisper_loss=0.07995, over 16047.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01047, ecapa_loss=0.0001424, whisper_loss=0.09065, over 3869918.90 frames. ], batch size: 62, lr: 2.37e-03, grad_scale: 1.152921504606847e+18
2024-08-18 06:31:50,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3746230.0, ans=0.1
2024-08-18 06:31:54,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3746330.0, ans=0.0
2024-08-18 06:31:56,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3746330.0, ans=0.2
2024-08-18 06:31:56,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3746330.0, ans=0.0
2024-08-18 06:31:56,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3746330.0, ans=0.125
2024-08-18 06:31:58,415 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0
2024-08-18 06:32:02,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3746330.0, ans=0.0
2024-08-18 06:32:03,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3746330.0, ans=0.125
2024-08-18 06:32:15,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3746430.0, ans=0.0
2024-08-18 06:32:28,004 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 13 from Vox, 34 fro AS
2024-08-18 06:32:51,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3746630.0, ans=0.125
2024-08-18 06:32:53,642 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 31 from LS+wenet, 13 from Vox, 30 fro AS
2024-08-18 06:32:57,798 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 2550, loss[loss=0.1147, beats_loss=0.008816, ecapa_loss=0.0001475, whisper_loss=0.1044, over 15587.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01045, ecapa_loss=0.0001425, whisper_loss=0.09073, over 3863962.49 frames. ], batch size: 60, lr: 2.37e-03, grad_scale: 1.152921504606847e+18
2024-08-18 06:32:58,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3746730.0, ans=0.2
2024-08-18 06:33:00,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3746730.0, ans=0.125
2024-08-18 06:33:18,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3746830.0, ans=0.125
2024-08-18 06:33:27,372 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts.
23 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-18 06:33:29,482 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-08-18 06:33:32,097 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-18 06:33:41,715 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.402e+01 2.684e+01 2.879e+01 3.926e+01, threshold=5.368e+01, percent-clipped=1.0 2024-08-18 06:34:09,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3747130.0, ans=0.2 2024-08-18 06:34:13,255 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 2600, loss[loss=0.1316, beats_loss=0.00741, ecapa_loss=0.0001703, whisper_loss=0.1225, over 15060.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01052, ecapa_loss=0.0001413, whisper_loss=0.09088, over 3873685.71 frames. ], batch size: 57, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:34:24,154 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.80 vs. limit=22.5 2024-08-18 06:34:35,052 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 18 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-18 06:34:42,904 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-18 06:34:44,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3747430.0, ans=0.125 2024-08-18 06:34:51,807 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.60 vs. limit=6.0 2024-08-18 06:35:13,925 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
27 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-18 06:35:15,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3747630.0, ans=0.0 2024-08-18 06:35:29,323 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 2650, loss[loss=0.1222, beats_loss=0.009937, ecapa_loss=0.0001687, whisper_loss=0.1106, over 21542.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0105, ecapa_loss=0.0001431, whisper_loss=0.09073, over 3885663.94 frames. ], batch size: 85, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:35:35,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3747730.0, ans=0.125 2024-08-18 06:35:47,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3747830.0, ans=0.125 2024-08-18 06:35:53,515 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-18 06:35:56,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3747830.0, ans=0.1 2024-08-18 06:36:03,146 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 12 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-18 06:36:12,413 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.431e+01 2.791e+01 3.155e+01 3.699e+02, threshold=5.582e+01, percent-clipped=1.0 2024-08-18 06:36:16,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3748030.0, ans=0.125 2024-08-18 06:36:17,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3748030.0, ans=0.0 2024-08-18 06:36:17,546 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.81 vs. 
limit=15.0 2024-08-18 06:36:30,661 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-18 06:36:35,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3748130.0, ans=0.0 2024-08-18 06:36:43,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3748130.0, ans=0.0 2024-08-18 06:36:43,585 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2024-08-18 06:36:46,371 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 2700, loss[loss=0.0945, beats_loss=0.01066, ecapa_loss=0.0001173, whisper_loss=0.08267, over 16602.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01051, ecapa_loss=0.0001426, whisper_loss=0.09068, over 3884429.76 frames. ], batch size: 63, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:36:53,910 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-18 06:36:54,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3748230.0, ans=0.125 2024-08-18 06:36:56,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3748230.0, ans=0.0 2024-08-18 06:36:58,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3748230.0, ans=0.125 2024-08-18 06:37:08,069 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-18 06:37:12,613 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
20 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 06:37:16,223 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2024-08-18 06:37:17,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3748430.0, ans=0.0 2024-08-18 06:37:32,227 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 06:38:05,076 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 2750, loss[loss=0.07387, beats_loss=0.01063, ecapa_loss=0.0002213, whisper_loss=0.06103, over 19268.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01041, ecapa_loss=0.0001442, whisper_loss=0.09101, over 3898852.29 frames. ], batch size: 87, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:38:28,480 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.11 vs. limit=10.0 2024-08-18 06:38:45,351 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.25 vs. 
limit=15.0 2024-08-18 06:38:48,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3748930.0, ans=0.1 2024-08-18 06:38:49,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3748930.0, ans=0.95 2024-08-18 06:38:49,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3748930.0, ans=0.125 2024-08-18 06:38:51,047 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.293e+01 2.523e+01 2.815e+01 3.785e+01, threshold=5.047e+01, percent-clipped=0.0 2024-08-18 06:38:54,544 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-18 06:38:58,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3749030.0, ans=0.125 2024-08-18 06:39:07,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3749030.0, ans=0.0 2024-08-18 06:39:21,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3749130.0, ans=0.125 2024-08-18 06:39:28,576 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 2800, loss[loss=0.08326, beats_loss=0.01279, ecapa_loss=0.0001275, whisper_loss=0.06919, over 19415.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01051, ecapa_loss=0.0001441, whisper_loss=0.09051, over 3880388.28 frames. ], batch size: 81, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:39:39,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3749230.0, ans=0.125 2024-08-18 06:39:43,305 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
18 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-18 06:39:43,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3749330.0, ans=0.0 2024-08-18 06:39:51,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3749330.0, ans=0.125 2024-08-18 06:40:08,304 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 15 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-18 06:40:13,206 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 11 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-18 06:40:14,050 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.43 vs. limit=15.0 2024-08-18 06:40:24,868 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 06:40:33,630 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-18 06:40:40,872 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-18 06:40:48,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3749630.0, ans=0.125 2024-08-18 06:40:51,335 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 2850, loss[loss=0.1148, beats_loss=0.009298, ecapa_loss=0.0001561, whisper_loss=0.104, over 21355.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0105, ecapa_loss=0.0001439, whisper_loss=0.09074, over 3877701.30 frames. 
], batch size: 89, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:41:03,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3749730.0, ans=0.0 2024-08-18 06:41:10,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3749830.0, ans=0.125 2024-08-18 06:41:10,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3749830.0, ans=10.0 2024-08-18 06:41:19,810 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.21 vs. limit=15.0 2024-08-18 06:41:21,325 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 36 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-18 06:41:42,702 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.356e+01 2.615e+01 2.993e+01 1.081e+02, threshold=5.230e+01, percent-clipped=3.0 2024-08-18 06:41:52,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3750030.0, ans=0.2 2024-08-18 06:42:04,733 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-18 06:42:09,829 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-18 06:42:13,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3750130.0, ans=0.125 2024-08-18 06:42:16,849 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 2900, loss[loss=0.1111, beats_loss=0.009892, ecapa_loss=0.0001709, whisper_loss=0.09951, over 21579.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.0001445, whisper_loss=0.09052, over 3893464.50 frames. 
], batch size: 88, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:42:19,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3750230.0, ans=0.125 2024-08-18 06:42:19,495 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.05 vs. limit=15.0 2024-08-18 06:42:25,483 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 06:42:40,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3750330.0, ans=0.2 2024-08-18 06:42:46,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3750430.0, ans=0.1 2024-08-18 06:43:20,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3750630.0, ans=0.1 2024-08-18 06:43:21,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3750630.0, ans=0.125 2024-08-18 06:43:29,696 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 2950, loss[loss=0.1068, beats_loss=0.008437, ecapa_loss=0.000166, whisper_loss=0.09666, over 22978.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01049, ecapa_loss=0.0001458, whisper_loss=0.0907, over 3907849.24 frames. ], batch size: 93, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:43:35,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3750730.0, ans=0.2 2024-08-18 06:43:47,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3750830.0, ans=0.125 2024-08-18 06:43:50,155 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
29 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-18 06:43:50,642 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2024-08-18 06:43:50,948 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.60 vs. limit=15.0 2024-08-18 06:44:00,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3750930.0, ans=0.0 2024-08-18 06:44:06,406 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.27 vs. limit=15.0 2024-08-18 06:44:09,587 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.710e+01 2.348e+01 2.612e+01 2.875e+01 5.578e+01, threshold=5.225e+01, percent-clipped=1.0 2024-08-18 06:44:21,559 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 06:44:34,146 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=22.5 2024-08-18 06:44:35,901 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 3000, loss[loss=0.07274, beats_loss=0.01115, ecapa_loss=0.0001634, whisper_loss=0.05995, over 16099.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01043, ecapa_loss=0.0001453, whisper_loss=0.09125, over 3952991.69 frames. ], batch size: 67, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:44:35,902 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-18 06:45:15,267 INFO [train_multi_KD3.py:1149] (0/4) Epoch 26, validation on ASR_libri: loss=0.2538, beats_loss=0, ecapa_loss=0.0005294, whisper_loss=0.2485, over 922467.00 frames. 
2024-08-18 06:45:31,849 INFO [train_multi_KD3.py:1149] (0/4) Epoch 26, validation on SV_voxceleb1: loss=0.004081, beats_loss=0, ecapa_loss=0.0004081, whisper_loss=0, over 939242.00 frames. 2024-08-18 06:47:15,145 INFO [train_multi_KD3.py:1149] (0/4) Epoch 26, validation on AT_audioset: loss=0.02323, beats_loss=0.02323, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 06:47:15,151 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-18 06:47:40,205 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 06:47:47,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3751430.0, ans=0.0 2024-08-18 06:47:54,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3751530.0, ans=0.125 2024-08-18 06:48:21,847 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 3050, loss[loss=0.09971, beats_loss=0.009201, ecapa_loss=0.0001359, whisper_loss=0.08915, over 14457.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01049, ecapa_loss=0.0001456, whisper_loss=0.09074, over 3971896.72 frames. ], batch size: 56, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:48:43,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3751830.0, ans=0.125 2024-08-18 06:48:47,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3751930.0, ans=0.125 2024-08-18 06:49:00,877 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
28 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-18 06:49:01,922 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.301e+01 2.557e+01 2.884e+01 4.342e+01, threshold=5.114e+01, percent-clipped=0.0 2024-08-18 06:49:05,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3752030.0, ans=0.125 2024-08-18 06:49:05,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3752030.0, ans=0.025 2024-08-18 06:49:21,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3752130.0, ans=0.1 2024-08-18 06:49:28,774 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 3100, loss[loss=0.1224, beats_loss=0.007963, ecapa_loss=0.0001472, whisper_loss=0.113, over 22190.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01047, ecapa_loss=0.0001452, whisper_loss=0.09149, over 3944827.16 frames. ], batch size: 83, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:49:37,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3752230.0, ans=0.125 2024-08-18 06:50:05,721 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-18 06:50:13,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3752530.0, ans=0.1 2024-08-18 06:50:26,983 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-18 06:50:36,417 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 3150, loss[loss=0.09645, beats_loss=0.01135, ecapa_loss=0.0001363, whisper_loss=0.08373, over 17129.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01055, ecapa_loss=0.0001447, whisper_loss=0.09128, over 3920156.75 frames. 
], batch size: 69, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:50:39,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3752730.0, ans=0.0 2024-08-18 06:51:16,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3753030.0, ans=0.07 2024-08-18 06:51:17,308 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.299e+01 2.506e+01 2.759e+01 4.272e+01, threshold=5.011e+01, percent-clipped=0.0 2024-08-18 06:51:29,276 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.18 vs. limit=15.0 2024-08-18 06:51:30,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3753130.0, ans=0.0 2024-08-18 06:51:33,418 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.86 vs. limit=15.0 2024-08-18 06:51:36,721 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-18 06:51:36,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3753130.0, ans=0.125 2024-08-18 06:51:44,059 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 3200, loss[loss=0.1148, beats_loss=0.01018, ecapa_loss=0.0001491, whisper_loss=0.1031, over 24345.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01051, ecapa_loss=0.0001449, whisper_loss=0.09175, over 3886532.88 frames. 
], batch size: 94, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:51:57,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3753330.0, ans=0.0 2024-08-18 06:52:02,371 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-18 06:52:21,286 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-18 06:52:27,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3753530.0, ans=0.125 2024-08-18 06:52:40,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3753630.0, ans=0.025 2024-08-18 06:52:50,679 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 3250, loss[loss=0.1172, beats_loss=0.01068, ecapa_loss=0.0001394, whisper_loss=0.1051, over 23779.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01045, ecapa_loss=0.0001453, whisper_loss=0.09239, over 3902131.40 frames. ], batch size: 94, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:53:00,208 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 06:53:24,750 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. 
limit=15.0 2024-08-18 06:53:28,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3753930.0, ans=0.09899494936611666 2024-08-18 06:53:30,577 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.285e+01 2.505e+01 2.811e+01 5.288e+01, threshold=5.010e+01, percent-clipped=1.0 2024-08-18 06:53:42,164 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.04 vs. limit=22.5 2024-08-18 06:53:45,314 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 24 from Vox, 18 fro AS 2024-08-18 06:53:46,047 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.97 vs. limit=15.0 2024-08-18 06:53:57,384 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 3300, loss[loss=0.09261, beats_loss=0.01095, ecapa_loss=0.0001218, whisper_loss=0.08044, over 17754.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01044, ecapa_loss=0.0001459, whisper_loss=0.09208, over 3890137.09 frames. ], batch size: 70, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:54:30,823 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 31 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-18 06:54:32,257 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
18 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-18 06:54:52,320 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08354800194501877, model_norm_threshold=50.10049057006836 2024-08-18 06:54:52,485 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.148e+04, grad_sumsq=7.148e+04, orig_rms_sq=1.000e+00 2024-08-18 06:54:55,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3754630.0, ans=0.0 2024-08-18 06:55:05,909 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 3350, loss[loss=0.1086, beats_loss=0.01091, ecapa_loss=0.0001884, whisper_loss=0.09578, over 22591.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01041, ecapa_loss=0.0001464, whisper_loss=0.09174, over 3880245.31 frames. ], batch size: 94, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:55:07,540 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 17 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 06:55:07,850 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.396e-02 2024-08-18 06:55:12,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.26 vs. limit=12.0 2024-08-18 06:55:43,607 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
21 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 06:55:45,979 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.324e+01 2.611e+01 2.907e+01 5.997e+02, threshold=5.222e+01, percent-clipped=2.0 2024-08-18 06:55:47,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3755030.0, ans=0.125 2024-08-18 06:55:51,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3755030.0, ans=0.125 2024-08-18 06:55:53,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3755030.0, ans=0.5 2024-08-18 06:55:53,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3755030.0, ans=0.1 2024-08-18 06:55:55,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3755030.0, ans=0.0 2024-08-18 06:55:56,903 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-18 06:56:01,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3755130.0, ans=0.09899494936611666 2024-08-18 06:56:08,701 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-18 06:56:12,463 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 3400, loss[loss=0.08649, beats_loss=0.01191, ecapa_loss=0.0001632, whisper_loss=0.07295, over 17536.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01052, ecapa_loss=0.0001461, whisper_loss=0.09039, over 3855595.59 frames. ], batch size: 75, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:56:19,632 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
18 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-18 06:56:21,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3755230.0, ans=0.1 2024-08-18 06:56:32,594 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 14 from LS+wenet, 25 from Vox, 49 fro AS 2024-08-18 06:56:43,037 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 12 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-18 06:56:48,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3755430.0, ans=0.015 2024-08-18 06:57:02,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3755530.0, ans=0.0 2024-08-18 06:57:08,124 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 06:57:08,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3755530.0, ans=0.1 2024-08-18 06:57:15,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3755630.0, ans=0.125 2024-08-18 06:57:26,963 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 3450, loss[loss=0.1002, beats_loss=0.01269, ecapa_loss=0.0001198, whisper_loss=0.08627, over 17490.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01068, ecapa_loss=0.0001463, whisper_loss=0.08979, over 3877378.64 frames. 
], batch size: 70, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 06:57:34,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3755730.0, ans=0.125 2024-08-18 06:57:38,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3755730.0, ans=0.125 2024-08-18 06:57:44,893 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 27 from Vox, 33 from AS 2024-08-18 06:57:48,082 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 23 from LS+wenet, 27 from Vox, 43 from AS 2024-08-18 06:57:48,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3755830.0, ans=0.0 2024-08-18 06:58:09,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.45 vs. limit=15.0 2024-08-18 06:58:15,254 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 15 from LS+wenet, 26 from Vox, 53 from AS 2024-08-18 06:58:19,433 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.47 vs. limit=15.0 2024-08-18 06:58:21,268 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.293e+01 2.573e+01 2.898e+01 3.051e+02, threshold=5.147e+01, percent-clipped=2.0 2024-08-18 06:58:23,538 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.36 vs. limit=12.0 2024-08-18 06:58:37,728 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 25 from Vox, 40 from AS 2024-08-18 06:58:50,839 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts.
30 from LS+wenet, 24 from Vox, 34 from AS 2024-08-18 06:58:54,591 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 3500, loss[loss=0.09927, beats_loss=0.01085, ecapa_loss=0.0001412, whisper_loss=0.08701, over 18884.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01073, ecapa_loss=0.0001463, whisper_loss=0.08897, over 3857793.05 frames. ], batch size: 76, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 06:58:54,883 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 36 from LS+wenet, 24 from Vox, 34 from AS 2024-08-18 06:59:06,529 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 22 from Vox, 32 from AS 2024-08-18 06:59:20,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3756330.0, ans=0.125 2024-08-18 06:59:25,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=27.80 vs. limit=22.5 2024-08-18 06:59:31,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3756330.0, ans=0.125 2024-08-18 06:59:46,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3756430.0, ans=0.04949747468305833 2024-08-18 06:59:46,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3756430.0, ans=0.125 2024-08-18 06:59:54,782 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.29 vs.
limit=15.0 2024-08-18 06:59:58,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3756530.0, ans=0.1 2024-08-18 07:00:05,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3756530.0, ans=0.0 2024-08-18 07:00:33,093 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 20 from LS+wenet, 23 from Vox, 36 from AS 2024-08-18 07:00:35,047 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 3550, loss[loss=0.08739, beats_loss=0.01145, ecapa_loss=0.0001458, whisper_loss=0.07448, over 19098.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0106, ecapa_loss=0.0001456, whisper_loss=0.08975, over 3866665.35 frames. ], batch size: 79, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:00:41,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3756730.0, ans=0.0 2024-08-18 07:00:43,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3756730.0, ans=0.2 2024-08-18 07:00:53,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3756830.0, ans=0.2 2024-08-18 07:01:00,143 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.82 vs.
limit=5.0 2024-08-18 07:01:13,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3756930.0, ans=0.0 2024-08-18 07:01:35,618 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.288e+01 2.493e+01 2.847e+01 8.952e+01, threshold=4.987e+01, percent-clipped=1.0 2024-08-18 07:01:58,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3757130.0, ans=0.125 2024-08-18 07:02:06,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3757130.0, ans=0.1 2024-08-18 07:02:08,521 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 07:02:09,433 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 3600, loss[loss=0.1349, beats_loss=0.008628, ecapa_loss=0.0001527, whisper_loss=0.1247, over 24594.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01064, ecapa_loss=0.0001454, whisper_loss=0.08952, over 3853006.85 frames. ], batch size: 94, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:02:11,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3757230.0, ans=0.125 2024-08-18 07:02:16,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3757230.0, ans=0.2 2024-08-18 07:02:19,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3757230.0, ans=0.07 2024-08-18 07:02:28,367 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
35 from LS+wenet, 19 from Vox, 39 from AS 2024-08-18 07:02:37,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3757430.0, ans=0.0 2024-08-18 07:02:39,642 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-08-18 07:02:53,059 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 19 from Vox, 39 from AS 2024-08-18 07:03:02,026 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 8 from LS+wenet, 19 from Vox, 36 from AS 2024-08-18 07:03:04,789 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 26 from LS+wenet, 28 from Vox, 22 from AS 2024-08-18 07:03:06,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3757630.0, ans=0.0 2024-08-18 07:03:18,846 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 3650, loss[loss=0.1174, beats_loss=0.01008, ecapa_loss=0.0001539, whisper_loss=0.1058, over 21263.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01058, ecapa_loss=0.0001459, whisper_loss=0.09009, over 3866046.43 frames. ], batch size: 88, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:03:27,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3757730.0, ans=0.125 2024-08-18 07:03:30,919 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 25 from LS+wenet, 17 from Vox, 16 from AS 2024-08-18 07:03:43,557 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 from AS 2024-08-18 07:03:45,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3757930.0, ans=0.125 2024-08-18 07:03:46,461 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts.
30 from LS+wenet, 20 from Vox, 38 from AS 2024-08-18 07:04:02,326 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.285e+01 2.488e+01 2.673e+01 1.240e+02, threshold=4.975e+01, percent-clipped=2.0 2024-08-18 07:04:19,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3758130.0, ans=0.125 2024-08-18 07:04:29,709 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 3700, loss[loss=0.08765, beats_loss=0.01053, ecapa_loss=0.0001448, whisper_loss=0.07567, over 19472.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01045, ecapa_loss=0.000146, whisper_loss=0.0911, over 3839530.69 frames. ], batch size: 78, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:04:48,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3758330.0, ans=0.1 2024-08-18 07:05:12,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3758530.0, ans=0.1 2024-08-18 07:05:26,240 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 30 from LS+wenet, 20 from Vox, 33 from AS 2024-08-18 07:05:39,457 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 3750, loss[loss=0.1168, beats_loss=0.0103, ecapa_loss=0.0001259, whisper_loss=0.1052, over 17534.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01058, ecapa_loss=0.0001453, whisper_loss=0.0911, over 3837000.12 frames.
], batch size: 67, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:05:59,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3758830.0, ans=0.0 2024-08-18 07:06:12,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3758930.0, ans=0.125 2024-08-18 07:06:22,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3759030.0, ans=0.1 2024-08-18 07:06:22,854 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.345e+01 2.587e+01 2.858e+01 2.449e+02, threshold=5.174e+01, percent-clipped=2.0 2024-08-18 07:06:34,023 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 16 from Vox, 31 from AS 2024-08-18 07:06:47,871 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 3800, loss[loss=0.1363, beats_loss=0.008026, ecapa_loss=0.0001451, whisper_loss=0.1268, over 25240.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01062, ecapa_loss=0.0001449, whisper_loss=0.09099, over 3869071.54 frames. ], batch size: 96, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:06:49,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3759230.0, ans=0.1 2024-08-18 07:06:58,444 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 from AS 2024-08-18 07:07:06,058 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 23 from Vox, 28 from AS 2024-08-18 07:07:19,484 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 34 from LS+wenet, 20 from Vox, 32 from AS 2024-08-18 07:07:52,251 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 3850, loss[loss=0.09053, beats_loss=0.009643, ecapa_loss=0.0001461, whisper_loss=0.07942, over 14502.00 frames.
], tot_loss[loss=0.1033, beats_loss=0.01053, ecapa_loss=0.0001463, whisper_loss=0.09135, over 3874547.56 frames. ], batch size: 58, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:08:00,718 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.137e+01 2024-08-18 07:08:01,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3759730.0, ans=0.2 2024-08-18 07:08:26,147 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-376000.pt 2024-08-18 07:08:34,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3760030.0, ans=0.09899494936611666 2024-08-18 07:08:35,275 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.367e+01 2.592e+01 2.770e+01 3.521e+02, threshold=5.184e+01, percent-clipped=1.0 2024-08-18 07:08:35,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3760030.0, ans=0.09899494936611666 2024-08-18 07:08:50,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3760130.0, ans=0.125 2024-08-18 07:08:56,554 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.27 vs. limit=22.5 2024-08-18 07:08:57,594 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.762e+01 2024-08-18 07:08:59,611 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 3900, loss[loss=0.09624, beats_loss=0.009506, ecapa_loss=0.0001421, whisper_loss=0.08531, over 19771.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01048, ecapa_loss=0.0001465, whisper_loss=0.09153, over 3887119.38 frames. ], batch size: 79, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:09:01,394 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 15 from Vox, 24 from AS 2024-08-18 07:09:06,580 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 from AS 2024-08-18 07:09:08,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3760230.0, ans=0.0 2024-08-18 07:09:50,790 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.08 vs. limit=10.0 2024-08-18 07:09:53,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3760630.0, ans=0.1 2024-08-18 07:10:00,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3760630.0, ans=0.125 2024-08-18 07:10:04,627 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 3950, loss[loss=0.1308, beats_loss=0.00761, ecapa_loss=0.0001804, whisper_loss=0.1214, over 23305.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0105, ecapa_loss=0.0001473, whisper_loss=0.09167, over 3892404.27 frames. ], batch size: 95, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:10:09,744 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 22 from Vox, 30 from AS 2024-08-18 07:10:10,203 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.77 vs.
limit=12.0 2024-08-18 07:10:44,187 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.379e+01 2.624e+01 2.905e+01 3.854e+01, threshold=5.247e+01, percent-clipped=0.0 2024-08-18 07:10:51,024 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 26 from Vox, 40 from AS 2024-08-18 07:11:00,959 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 24 from Vox, 33 from AS 2024-08-18 07:11:01,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3761130.0, ans=0.1 2024-08-18 07:11:05,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3761130.0, ans=0.0 2024-08-18 07:11:08,689 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 4000, loss[loss=0.1003, beats_loss=0.01255, ecapa_loss=0.000117, whisper_loss=0.08656, over 20014.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01049, ecapa_loss=0.0001472, whisper_loss=0.09135, over 3913705.60 frames. ], batch size: 81, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:11:19,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3761230.0, ans=0.0 2024-08-18 07:11:22,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3761330.0, ans=10.0 2024-08-18 07:11:25,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3761330.0, ans=0.04949747468305833 2024-08-18 07:11:27,875 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts.
24 from LS+wenet, 19 from Vox, 36 from AS 2024-08-18 07:11:32,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3761330.0, ans=0.025 2024-08-18 07:11:56,320 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 17 from Vox, 29 from AS 2024-08-18 07:12:04,058 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08813267201185226, model_norm_threshold=52.47489929199219 2024-08-18 07:12:04,300 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.229e+04, grad_sumsq=4.229e+04, orig_rms_sq=1.000e+00 2024-08-18 07:12:07,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3761630.0, ans=0.0 2024-08-18 07:12:07,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3761630.0, ans=0.125 2024-08-18 07:12:13,326 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 4050, loss[loss=0.1031, beats_loss=0.008138, ecapa_loss=0.0001831, whisper_loss=0.09314, over 15073.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01043, ecapa_loss=0.0001474, whisper_loss=0.09211, over 3912363.40 frames. ], batch size: 60, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:12:13,616 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 21 from Vox, 35 from AS 2024-08-18 07:12:20,329 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.51 vs. limit=15.0 2024-08-18 07:12:31,136 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 20 from Vox, 38 from AS 2024-08-18 07:12:33,577 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts.
13 from LS+wenet, 21 from Vox, 20 from AS 2024-08-18 07:12:38,799 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 from AS 2024-08-18 07:12:38,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3761930.0, ans=0.125 2024-08-18 07:12:40,410 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.60 vs. limit=10.0 2024-08-18 07:12:54,508 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.303e+01 2.573e+01 2.954e+01 5.954e+02, threshold=5.146e+01, percent-clipped=3.0 2024-08-18 07:13:18,390 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 4100, loss[loss=0.1147, beats_loss=0.009061, ecapa_loss=0.0001336, whisper_loss=0.1044, over 23655.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01051, ecapa_loss=0.0001476, whisper_loss=0.09222, over 3915011.52 frames. ], batch size: 91, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:13:19,757 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 27 from Vox, 41 from AS 2024-08-18 07:13:23,564 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 32 from LS+wenet, 20 from Vox, 32 from AS 2024-08-18 07:13:26,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3762230.0, ans=0.0 2024-08-18 07:13:48,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3762430.0, ans=0.125 2024-08-18 07:13:49,130 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts.
18 from LS+wenet, 18 from Vox, 18 from AS 2024-08-18 07:14:03,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3762530.0, ans=0.125 2024-08-18 07:14:08,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3762630.0, ans=0.1 2024-08-18 07:14:22,254 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 4150, loss[loss=0.1021, beats_loss=0.00983, ecapa_loss=0.0001683, whisper_loss=0.09061, over 16745.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01046, ecapa_loss=0.0001491, whisper_loss=0.092, over 3904525.79 frames. ], batch size: 67, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:14:31,698 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 14 from Vox, 31 from AS 2024-08-18 07:14:38,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3762830.0, ans=0.125 2024-08-18 07:14:43,176 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 25 from Vox, 33 from AS 2024-08-18 07:15:01,862 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.335e+01 2.524e+01 2.797e+01 3.665e+01, threshold=5.048e+01, percent-clipped=0.0 2024-08-18 07:15:06,053 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 17 from Vox, 34 from AS 2024-08-18 07:15:06,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3763030.0, ans=0.125 2024-08-18 07:15:21,810 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 21 from LS+wenet, 16 from Vox, 21 from AS 2024-08-18 07:15:26,649 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 4200, loss[loss=0.09679, beats_loss=0.01215, ecapa_loss=0.0001401, whisper_loss=0.08324, over 20991.00 frames.
], tot_loss[loss=0.1036, beats_loss=0.01051, ecapa_loss=0.0001484, whisper_loss=0.09161, over 3862156.13 frames. ], batch size: 85, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:15:39,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3763330.0, ans=0.2 2024-08-18 07:15:51,301 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 20 from Vox, 24 from AS 2024-08-18 07:15:57,794 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 23 from Vox, 37 from AS 2024-08-18 07:16:11,664 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 26 from LS+wenet, 19 from Vox, 27 from AS 2024-08-18 07:16:27,200 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 19 from Vox, 25 from AS 2024-08-18 07:16:30,495 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 4250, loss[loss=0.09545, beats_loss=0.009966, ecapa_loss=0.00015, whisper_loss=0.08399, over 20917.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01052, ecapa_loss=0.0001479, whisper_loss=0.09085, over 3848155.55 frames. ], batch size: 85, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:16:51,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3763830.0, ans=0.125 2024-08-18 07:16:58,221 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.85 vs. limit=15.0 2024-08-18 07:17:06,903 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 28 from LS+wenet, 21 from Vox, 19 from AS 2024-08-18 07:17:10,782 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.296e+01 2.590e+01 3.069e+01 5.204e+01, threshold=5.180e+01, percent-clipped=2.0 2024-08-18 07:17:13,468 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts.
18 from LS+wenet, 21 from Vox, 29 from AS 2024-08-18 07:17:13,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3764030.0, ans=0.125 2024-08-18 07:17:24,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3764130.0, ans=0.0 2024-08-18 07:17:35,004 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 4300, loss[loss=0.1063, beats_loss=0.01077, ecapa_loss=0.0001466, whisper_loss=0.09411, over 22919.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01047, ecapa_loss=0.0001476, whisper_loss=0.09082, over 3855410.59 frames. ], batch size: 94, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:17:44,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3764230.0, ans=0.05 2024-08-18 07:17:56,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3764330.0, ans=0.125 2024-08-18 07:18:06,750 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.36 vs. limit=15.0 2024-08-18 07:18:11,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3764430.0, ans=0.1 2024-08-18 07:18:15,339 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 22 from Vox, 29 from AS 2024-08-18 07:18:22,688 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.51 vs.
limit=22.5 2024-08-18 07:18:25,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3764630.0, ans=0.125 2024-08-18 07:18:39,527 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 4350, loss[loss=0.1102, beats_loss=0.006986, ecapa_loss=0.0001656, whisper_loss=0.1015, over 17443.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01044, ecapa_loss=0.0001464, whisper_loss=0.09088, over 3821970.46 frames. ], batch size: 71, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:18:43,865 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 from AS 2024-08-18 07:18:47,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3764730.0, ans=0.0 2024-08-18 07:18:50,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3764730.0, ans=0.2 2024-08-18 07:18:50,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3764730.0, ans=0.125 2024-08-18 07:19:02,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3764830.0, ans=0.125 2024-08-18 07:19:02,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3764830.0, ans=0.1 2024-08-18 07:19:05,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3764930.0, ans=0.0 2024-08-18 07:19:10,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3764930.0, ans=0.0 2024-08-18 07:19:18,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3765030.0, ans=0.0 2024-08-18 07:19:18,993 INFO
[optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.231e+01 2.499e+01 2.816e+01 6.235e+01, threshold=4.999e+01, percent-clipped=1.0 2024-08-18 07:19:22,142 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.15 vs. limit=22.5 2024-08-18 07:19:42,398 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 07:19:43,380 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 4400, loss[loss=0.117, beats_loss=0.01017, ecapa_loss=0.000142, whisper_loss=0.1054, over 19867.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01039, ecapa_loss=0.0001477, whisper_loss=0.09083, over 3827117.40 frames. ], batch size: 76, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:19:46,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3765230.0, ans=0.125 2024-08-18 07:20:04,188 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 14 from Vox, 26 from AS 2024-08-18 07:20:08,404 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 22 from LS+wenet, 19 from Vox, 46 from AS 2024-08-18 07:20:08,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3765430.0, ans=0.125 2024-08-18 07:20:13,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3765430.0, ans=0.1 2024-08-18 07:20:15,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3765430.0, ans=0.125 2024-08-18 07:20:28,529 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts.
20 from LS+wenet, 23 from Vox, 42 from AS 2024-08-18 07:20:38,604 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.81 vs. limit=15.0 2024-08-18 07:20:47,416 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 4450, loss[loss=0.1236, beats_loss=0.008358, ecapa_loss=0.0001169, whisper_loss=0.114, over 22378.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01043, ecapa_loss=0.0001458, whisper_loss=0.09069, over 3865788.24 frames. ], batch size: 82, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:20:48,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3765730.0, ans=0.0 2024-08-18 07:20:49,519 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.11 vs. limit=15.0 2024-08-18 07:21:01,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3765830.0, ans=0.125 2024-08-18 07:21:02,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3765830.0, ans=0.125 2024-08-18 07:21:21,674 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts.
22 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-18 07:21:23,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3765930.0, ans=0.1 2024-08-18 07:21:26,931 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.341e+01 2.651e+01 2.929e+01 6.825e+01, threshold=5.301e+01, percent-clipped=1.0 2024-08-18 07:21:27,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3766030.0, ans=0.125 2024-08-18 07:21:27,781 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.63 vs. limit=15.0 2024-08-18 07:21:37,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3766130.0, ans=0.1 2024-08-18 07:21:50,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3766230.0, ans=0.035 2024-08-18 07:21:51,497 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 4500, loss[loss=0.1063, beats_loss=0.01111, ecapa_loss=0.0001345, whisper_loss=0.09388, over 17338.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001459, whisper_loss=0.09065, over 3887492.76 frames. ], batch size: 68, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:21:54,035 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 23 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-18 07:21:55,410 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
19 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-18 07:22:01,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3766230.0, ans=0.0 2024-08-18 07:22:29,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3766530.0, ans=0.1 2024-08-18 07:22:30,013 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-18 07:22:39,400 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=12.0 2024-08-18 07:22:50,914 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.59 vs. limit=22.5 2024-08-18 07:22:55,340 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 4550, loss[loss=0.106, beats_loss=0.0102, ecapa_loss=0.0001782, whisper_loss=0.09404, over 18335.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01044, ecapa_loss=0.000147, whisper_loss=0.09082, over 3884403.09 frames. ], batch size: 78, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:22:58,847 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.79 vs. limit=22.5 2024-08-18 07:23:18,926 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 30 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-18 07:23:22,451 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
13 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 07:23:22,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3766930.0, ans=0.125 2024-08-18 07:23:30,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3766930.0, ans=0.05 2024-08-18 07:23:34,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3767030.0, ans=0.1 2024-08-18 07:23:35,410 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.270e+01 2.466e+01 2.642e+01 3.409e+01, threshold=4.932e+01, percent-clipped=0.0 2024-08-18 07:23:58,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3767230.0, ans=0.0 2024-08-18 07:23:59,819 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 4600, loss[loss=0.1109, beats_loss=0.006398, ecapa_loss=0.0001923, whisper_loss=0.1026, over 17809.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01037, ecapa_loss=0.000147, whisper_loss=0.09078, over 3868539.23 frames. ], batch size: 72, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:24:00,119 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-18 07:24:09,259 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-18 07:24:10,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3767230.0, ans=0.2 2024-08-18 07:24:18,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3767330.0, ans=0.125 2024-08-18 07:24:31,577 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 07:24:37,121 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-18 07:24:49,598 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-18 07:25:04,723 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 4650, loss[loss=0.1052, beats_loss=0.01086, ecapa_loss=0.0001571, whisper_loss=0.0928, over 19983.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01041, ecapa_loss=0.0001471, whisper_loss=0.09042, over 3890811.71 frames. ], batch size: 80, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:25:40,164 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2024-08-18 07:25:44,556 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.243e+01 2.444e+01 2.788e+01 4.840e+01, threshold=4.887e+01, percent-clipped=0.0 2024-08-18 07:25:55,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3768130.0, ans=0.0 2024-08-18 07:26:00,775 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.25 vs. limit=10.0 2024-08-18 07:26:05,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3768130.0, ans=0.125 2024-08-18 07:26:06,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3768130.0, ans=0.125 2024-08-18 07:26:08,881 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 4700, loss[loss=0.1045, beats_loss=0.01023, ecapa_loss=0.0001474, whisper_loss=0.09283, over 19897.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.0001459, whisper_loss=0.09016, over 3887309.02 frames. 
], batch size: 79, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:26:18,688 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0 2024-08-18 07:26:19,905 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 07:26:19,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3768230.0, ans=0.07 2024-08-18 07:26:27,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3768330.0, ans=0.125 2024-08-18 07:26:59,127 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-18 07:27:01,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3768630.0, ans=0.125 2024-08-18 07:27:10,332 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-18 07:27:10,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3768630.0, ans=0.125 2024-08-18 07:27:12,722 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 4750, loss[loss=0.09786, beats_loss=0.01063, ecapa_loss=0.0001383, whisper_loss=0.08585, over 19189.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01048, ecapa_loss=0.0001458, whisper_loss=0.09026, over 3906503.67 frames. ], batch size: 79, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:27:21,996 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
26 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-18 07:27:22,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3768730.0, ans=0.125 2024-08-18 07:27:52,001 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.283e+01 2.543e+01 2.859e+01 8.026e+01, threshold=5.085e+01, percent-clipped=1.0 2024-08-18 07:28:15,462 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 4800, loss[loss=0.09254, beats_loss=0.01072, ecapa_loss=0.0001546, whisper_loss=0.08028, over 20773.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0105, ecapa_loss=0.0001455, whisper_loss=0.09008, over 3920707.30 frames. ], batch size: 83, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:28:28,498 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-18 07:28:35,053 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.082e-03 2024-08-18 07:28:36,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3769330.0, ans=0.125 2024-08-18 07:28:40,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.78 vs. limit=15.0 2024-08-18 07:28:41,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3769430.0, ans=0.125 2024-08-18 07:28:45,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3769430.0, ans=0.125 2024-08-18 07:28:51,638 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 20 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 07:28:59,408 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
34 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-18 07:29:00,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3769530.0, ans=0.05 2024-08-18 07:29:10,676 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-18 07:29:13,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3769630.0, ans=0.1 2024-08-18 07:29:19,652 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 4850, loss[loss=0.1108, beats_loss=0.009593, ecapa_loss=0.0001296, whisper_loss=0.09987, over 17724.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0106, ecapa_loss=0.0001453, whisper_loss=0.08968, over 3921973.30 frames. ], batch size: 70, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:29:20,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3769730.0, ans=0.125 2024-08-18 07:29:40,441 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
18 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-18 07:29:59,172 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.248e+01 2.515e+01 2.808e+01 3.681e+01, threshold=5.031e+01, percent-clipped=0.0 2024-08-18 07:30:03,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3770030.0, ans=0.125 2024-08-18 07:30:04,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3770030.0, ans=0.0 2024-08-18 07:30:10,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3770130.0, ans=0.0 2024-08-18 07:30:11,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.30 vs. limit=12.0 2024-08-18 07:30:22,831 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 07:30:23,889 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 4900, loss[loss=0.1092, beats_loss=0.01038, ecapa_loss=0.0001639, whisper_loss=0.09723, over 22781.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.0001462, whisper_loss=0.09025, over 3905707.21 frames. ], batch size: 92, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:30:27,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3770230.0, ans=0.1 2024-08-18 07:30:34,116 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-18 07:30:35,370 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-18 07:30:54,176 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.59 vs. 
limit=22.5 2024-08-18 07:31:00,833 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-18 07:31:03,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3770530.0, ans=0.125 2024-08-18 07:31:07,647 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 24 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-18 07:31:13,982 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 36 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-18 07:31:27,758 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 4950, loss[loss=0.09542, beats_loss=0.01034, ecapa_loss=0.000139, whisper_loss=0.08368, over 20566.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01053, ecapa_loss=0.0001455, whisper_loss=0.09038, over 3881237.67 frames. ], batch size: 81, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:31:29,655 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.48 vs. limit=22.5 2024-08-18 07:31:33,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3770730.0, ans=0.0 2024-08-18 07:31:43,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3770830.0, ans=0.125 2024-08-18 07:31:48,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3770830.0, ans=0.2 2024-08-18 07:31:51,682 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-18 07:31:56,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3770930.0, ans=0.125 2024-08-18 07:31:56,956 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.23 vs. 
limit=15.0 2024-08-18 07:31:59,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3770930.0, ans=0.2 2024-08-18 07:32:06,687 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.529e+01 2.313e+01 2.600e+01 2.856e+01 8.113e+01, threshold=5.200e+01, percent-clipped=1.0 2024-08-18 07:32:09,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3771030.0, ans=0.0 2024-08-18 07:32:20,776 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 07:32:23,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3771130.0, ans=0.05 2024-08-18 07:32:29,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3771130.0, ans=0.125 2024-08-18 07:32:31,250 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 5000, loss[loss=0.08154, beats_loss=0.01065, ecapa_loss=0.0001693, whisper_loss=0.06919, over 21399.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01052, ecapa_loss=0.0001465, whisper_loss=0.09042, over 3867371.88 frames. ], batch size: 90, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:32:34,040 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
33 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 07:32:35,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=3771230.0, ans=0.02 2024-08-18 07:33:03,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3771430.0, ans=0.125 2024-08-18 07:33:22,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3771630.0, ans=0.0 2024-08-18 07:33:35,099 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 5050, loss[loss=0.1238, beats_loss=0.009321, ecapa_loss=0.0001798, whisper_loss=0.1126, over 16311.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01051, ecapa_loss=0.0001469, whisper_loss=0.09158, over 3910112.97 frames. ], batch size: 66, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:33:44,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3771730.0, ans=0.125 2024-08-18 07:33:49,096 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.46 vs. limit=22.5 2024-08-18 07:33:56,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3771830.0, ans=0.0 2024-08-18 07:33:59,864 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-18 07:34:15,377 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.763e+01 2.257e+01 2.473e+01 2.747e+01 4.109e+01, threshold=4.946e+01, percent-clipped=0.0 2024-08-18 07:34:18,142 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 07:34:21,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3772030.0, ans=0.125 2024-08-18 07:34:27,112 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-18 07:34:32,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3772130.0, ans=0.125 2024-08-18 07:34:39,433 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 5100, loss[loss=0.09953, beats_loss=0.008858, ecapa_loss=0.0001425, whisper_loss=0.08925, over 23194.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01053, ecapa_loss=0.0001459, whisper_loss=0.09188, over 3933705.66 frames. ], batch size: 91, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:34:54,522 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.11 vs. limit=15.0 2024-08-18 07:35:02,410 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.13 vs. limit=22.5 2024-08-18 07:35:29,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3772530.0, ans=0.125 2024-08-18 07:35:31,991 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.50 vs. limit=22.5 2024-08-18 07:35:37,532 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-18 07:35:43,893 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 5150, loss[loss=0.08682, beats_loss=0.01069, ecapa_loss=0.0001413, whisper_loss=0.07472, over 17612.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01056, ecapa_loss=0.0001453, whisper_loss=0.09139, over 3924483.64 frames. 
], batch size: 71, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:36:01,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3772830.0, ans=0.125 2024-08-18 07:36:06,196 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 13 from LS+wenet, 33 from Vox, 24 fro AS 2024-08-18 07:36:12,517 WARNING [optim.py:496] (0/4) Scaling gradients by 0.06933695822954178, model_norm_threshold=49.455631256103516 2024-08-18 07:36:12,686 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.0.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.293e+04, grad_sumsq=7.293e+04, orig_rms_sq=1.000e+00 2024-08-18 07:36:24,069 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.306e+01 2.541e+01 2.832e+01 7.133e+02, threshold=5.082e+01, percent-clipped=3.0 2024-08-18 07:36:37,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3773130.0, ans=0.1 2024-08-18 07:36:44,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3773130.0, ans=0.125 2024-08-18 07:36:48,916 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 5200, loss[loss=0.1013, beats_loss=0.01182, ecapa_loss=0.0001108, whisper_loss=0.08838, over 19121.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001454, whisper_loss=0.09101, over 3884512.38 frames. ], batch size: 73, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:36:49,610 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.63 vs. 
limit=22.5 2024-08-18 07:36:52,169 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.93 vs. limit=15.0 2024-08-18 07:36:58,392 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 07:37:11,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3773330.0, ans=0.05 2024-08-18 07:37:14,068 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 14 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-18 07:37:35,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3773530.0, ans=0.125 2024-08-18 07:37:43,022 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-18 07:37:54,549 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 5250, loss[loss=0.117, beats_loss=0.00655, ecapa_loss=0.0001608, whisper_loss=0.1088, over 14762.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01067, ecapa_loss=0.0001452, whisper_loss=0.09077, over 3866511.28 frames. ], batch size: 58, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:38:07,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3773830.0, ans=0.1 2024-08-18 07:38:21,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3773830.0, ans=0.0 2024-08-18 07:38:24,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3773930.0, ans=0.95 2024-08-18 07:38:24,600 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.58 vs. 
limit=22.5 2024-08-18 07:38:37,081 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-18 07:38:39,602 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.417e+01 2.630e+01 2.894e+01 3.805e+02, threshold=5.260e+01, percent-clipped=2.0 2024-08-18 07:38:59,851 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 30 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-18 07:39:02,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3774130.0, ans=0.125 2024-08-18 07:39:03,144 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.88 vs. limit=22.5 2024-08-18 07:39:05,626 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 5300, loss[loss=0.09717, beats_loss=0.01075, ecapa_loss=0.0001384, whisper_loss=0.08504, over 20246.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01054, ecapa_loss=0.0001459, whisper_loss=0.0912, over 3855613.70 frames. ], batch size: 81, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:39:23,263 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.20 vs. limit=15.0 2024-08-18 07:39:35,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3774430.0, ans=0.1 2024-08-18 07:39:43,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3774430.0, ans=0.125 2024-08-18 07:39:50,408 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
18 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 07:39:57,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3774530.0, ans=0.0 2024-08-18 07:40:08,208 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.05 vs. limit=15.0 2024-08-18 07:40:13,022 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-18 07:40:15,250 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 5350, loss[loss=0.1249, beats_loss=0.008174, ecapa_loss=0.0001504, whisper_loss=0.1152, over 23289.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.0001446, whisper_loss=0.09052, over 3877941.39 frames. ], batch size: 90, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:40:16,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3774730.0, ans=0.125 2024-08-18 07:40:32,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3774830.0, ans=0.2 2024-08-18 07:40:37,865 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.16 vs. limit=15.0 2024-08-18 07:40:46,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3774930.0, ans=0.1 2024-08-18 07:40:55,421 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.637e+01 2.297e+01 2.541e+01 2.871e+01 2.091e+02, threshold=5.082e+01, percent-clipped=1.0 2024-08-18 07:40:56,157 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.55 vs. limit=6.0 2024-08-18 07:41:00,546 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
16 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-18 07:41:06,062 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.87 vs. limit=15.0 2024-08-18 07:41:06,904 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-18 07:41:09,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3775130.0, ans=0.125 2024-08-18 07:41:19,107 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 5400, loss[loss=0.0843, beats_loss=0.01162, ecapa_loss=0.0001326, whisper_loss=0.07135, over 16754.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001446, whisper_loss=0.09059, over 3862416.74 frames. ], batch size: 67, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:41:27,348 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.72 vs. limit=15.0 2024-08-18 07:41:34,987 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-18 07:41:36,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3775330.0, ans=0.0 2024-08-18 07:42:19,256 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.55 vs. limit=15.0 2024-08-18 07:42:21,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3775630.0, ans=0.1 2024-08-18 07:42:23,626 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 5450, loss[loss=0.1107, beats_loss=0.009044, ecapa_loss=0.00018, whisper_loss=0.09983, over 18090.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.0001454, whisper_loss=0.0909, over 3831516.36 frames. ], batch size: 78, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:42:25,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3775730.0, ans=0.0 2024-08-18 07:42:41,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3775830.0, ans=0.0 2024-08-18 07:43:03,288 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.317e+01 2.510e+01 2.748e+01 4.518e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-18 07:43:10,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3776030.0, ans=0.125 2024-08-18 07:43:15,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3776130.0, ans=0.125 2024-08-18 07:43:27,239 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 5500, loss[loss=0.1183, beats_loss=0.01028, ecapa_loss=0.0001515, whisper_loss=0.1065, over 22548.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.000145, whisper_loss=0.0907, over 3838255.94 frames. ], batch size: 89, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:43:28,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3776230.0, ans=0.04949747468305833 2024-08-18 07:43:33,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3776230.0, ans=0.125 2024-08-18 07:43:43,283 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 20 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-18 07:43:49,728 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
27 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 07:43:49,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3776330.0, ans=0.125 2024-08-18 07:43:51,187 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 07:43:55,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3776430.0, ans=0.0 2024-08-18 07:43:57,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3776430.0, ans=0.125 2024-08-18 07:44:29,293 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 5550, loss[loss=0.1007, beats_loss=0.01091, ecapa_loss=0.0001422, whisper_loss=0.08834, over 22924.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01063, ecapa_loss=0.0001456, whisper_loss=0.08957, over 3846149.29 frames. ], batch size: 94, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:44:38,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3776730.0, ans=0.125 2024-08-18 07:44:58,041 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 36 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-18 07:45:05,859 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-18 07:45:08,292 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.307e+01 2.560e+01 2.922e+01 7.758e+01, threshold=5.121e+01, percent-clipped=1.0 2024-08-18 07:45:18,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3777130.0, ans=0.0 2024-08-18 07:45:19,651 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
36 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 07:45:19,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3777130.0, ans=0.0 2024-08-18 07:45:31,911 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 5600, loss[loss=0.1114, beats_loss=0.00973, ecapa_loss=0.000141, whisper_loss=0.1002, over 20815.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001456, whisper_loss=0.09006, over 3857308.29 frames. ], batch size: 83, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:45:33,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3777230.0, ans=0.0 2024-08-18 07:45:38,184 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 07:45:55,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0 2024-08-18 07:45:56,797 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 07:46:03,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3777430.0, ans=0.035 2024-08-18 07:46:13,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3777530.0, ans=0.125 2024-08-18 07:46:29,049 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-18 07:46:29,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3777630.0, ans=0.125 2024-08-18 07:46:31,328 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-18 07:46:33,459 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 5650, loss[loss=0.1207, beats_loss=0.01133, ecapa_loss=0.0001265, whisper_loss=0.1081, over 22970.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0106, ecapa_loss=0.0001458, whisper_loss=0.08992, over 3895194.19 frames. ], batch size: 89, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:47:06,865 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.72 vs. limit=22.5 2024-08-18 07:47:07,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3777930.0, ans=0.95 2024-08-18 07:47:08,595 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-18 07:47:12,367 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.355e+01 2.678e+01 2.961e+01 4.241e+01, threshold=5.356e+01, percent-clipped=0.0 2024-08-18 07:47:17,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3778030.0, ans=0.125 2024-08-18 07:47:18,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=3778030.0, ans=0.2 2024-08-18 07:47:33,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3778130.0, ans=0.0 2024-08-18 07:47:35,841 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 5700, loss[loss=0.1084, beats_loss=0.01064, ecapa_loss=0.0001491, whisper_loss=0.09627, over 22830.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01064, ecapa_loss=0.000146, whisper_loss=0.08988, over 3910840.40 frames. 
], batch size: 91, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:47:51,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3778330.0, ans=0.1 2024-08-18 07:47:54,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3778330.0, ans=0.125 2024-08-18 07:47:59,085 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 07:48:32,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.13 vs. limit=22.5 2024-08-18 07:48:56,265 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 5750, loss[loss=0.09196, beats_loss=0.00973, ecapa_loss=0.0001692, whisper_loss=0.08054, over 16482.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0107, ecapa_loss=0.0001457, whisper_loss=0.08965, over 3937276.14 frames. ], batch size: 67, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:49:09,790 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-18 07:49:27,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3778830.0, ans=0.0 2024-08-18 07:49:31,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3778830.0, ans=0.0 2024-08-18 07:49:36,575 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.438e-02 2024-08-18 07:49:41,717 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=15.32 vs. 
limit=15.0 2024-08-18 07:49:49,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3778930.0, ans=0.2 2024-08-18 07:49:54,439 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.413e+01 2.637e+01 2.878e+01 7.764e+01, threshold=5.274e+01, percent-clipped=1.0 2024-08-18 07:50:28,920 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 5800, loss[loss=0.09965, beats_loss=0.01055, ecapa_loss=0.0001521, whisper_loss=0.08759, over 17420.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01066, ecapa_loss=0.0001449, whisper_loss=0.08984, over 3888258.46 frames. ], batch size: 70, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:50:32,396 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.37 vs. limit=15.0 2024-08-18 07:50:33,755 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.35 vs. limit=15.0 2024-08-18 07:51:38,370 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-18 07:51:40,325 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 07:51:45,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3779530.0, ans=0.125 2024-08-18 07:51:46,718 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-18 07:51:47,129 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.93 vs. 
limit=15.0 2024-08-18 07:52:04,369 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 5850, loss[loss=0.09359, beats_loss=0.01196, ecapa_loss=0.0001304, whisper_loss=0.08033, over 19994.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01065, ecapa_loss=0.0001449, whisper_loss=0.08985, over 3872460.55 frames. ], batch size: 79, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:52:07,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3779730.0, ans=0.2 2024-08-18 07:52:09,538 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.554e+05 2024-08-18 07:52:12,414 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 07:52:14,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3779730.0, ans=0.125 2024-08-18 07:52:29,499 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 07:52:32,582 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
18 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 07:52:35,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3779930.0, ans=0.125 2024-08-18 07:52:50,346 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.605e+01 2.256e+01 2.447e+01 2.728e+01 3.298e+01, threshold=4.893e+01, percent-clipped=0.0 2024-08-18 07:52:50,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3780030.0, ans=0.2 2024-08-18 07:52:54,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3780030.0, ans=0.0 2024-08-18 07:53:01,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3780130.0, ans=0.1 2024-08-18 07:53:17,948 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 5900, loss[loss=0.1017, beats_loss=0.01067, ecapa_loss=0.000142, whisper_loss=0.08962, over 22289.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0107, ecapa_loss=0.0001441, whisper_loss=0.08973, over 3868192.27 frames. ], batch size: 89, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:53:27,441 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-18 07:53:29,302 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.72 vs. limit=15.0 2024-08-18 07:53:35,616 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-18 07:53:42,618 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
21 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-18 07:54:11,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3780530.0, ans=0.1 2024-08-18 07:54:17,471 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 07:54:19,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3780630.0, ans=0.2 2024-08-18 07:54:27,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3780630.0, ans=15.0 2024-08-18 07:54:28,923 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-18 07:54:30,162 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 5950, loss[loss=0.1063, beats_loss=0.01079, ecapa_loss=0.0001417, whisper_loss=0.0941, over 19116.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01074, ecapa_loss=0.0001434, whisper_loss=0.0897, over 3856940.79 frames. ], batch size: 73, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:54:45,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3780830.0, ans=0.1 2024-08-18 07:55:03,750 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-18 07:55:05,213 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
18 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 07:55:10,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3780930.0, ans=0.125 2024-08-18 07:55:14,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3781030.0, ans=0.0 2024-08-18 07:55:15,692 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.332e+01 2.559e+01 2.973e+01 4.806e+01, threshold=5.118e+01, percent-clipped=0.0 2024-08-18 07:55:25,814 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-18 07:55:38,482 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 26 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-18 07:55:44,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3781230.0, ans=0.2 2024-08-18 07:55:45,370 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 6000, loss[loss=0.09403, beats_loss=0.009432, ecapa_loss=0.0001931, whisper_loss=0.08267, over 15534.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.0001437, whisper_loss=0.0909, over 3845107.33 frames. ], batch size: 66, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:55:45,372 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-18 07:56:22,520 INFO [train_multi_KD3.py:1149] (0/4) Epoch 26, validation on ASR_libri: loss=0.2523, beats_loss=0, ecapa_loss=0.0005188, whisper_loss=0.2471, over 922467.00 frames. 2024-08-18 07:56:37,668 INFO [train_multi_KD3.py:1149] (0/4) Epoch 26, validation on SV_voxceleb1: loss=0.004005, beats_loss=0, ecapa_loss=0.0004005, whisper_loss=0, over 939242.00 frames. 2024-08-18 07:58:26,837 INFO [train_multi_KD3.py:1149] (0/4) Epoch 26, validation on AT_audioset: loss=0.0231, beats_loss=0.0231, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-18 07:58:26,841 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-18 07:58:29,540 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 07:58:42,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3781330.0, ans=0.0 2024-08-18 07:58:44,388 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 07:58:51,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3781330.0, ans=0.125 2024-08-18 07:58:51,768 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.88 vs. limit=22.5 2024-08-18 07:58:55,660 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 07:58:57,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3781430.0, ans=0.2 2024-08-18 07:59:00,121 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 26 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-18 07:59:09,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3781530.0, ans=0.2 2024-08-18 07:59:24,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3781630.0, ans=0.125 2024-08-18 07:59:35,355 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-18 07:59:39,132 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 6050, loss[loss=0.1056, beats_loss=0.009925, ecapa_loss=0.0001974, whisper_loss=0.09369, over 13776.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01063, ecapa_loss=0.0001433, whisper_loss=0.09149, over 3843124.51 frames. 
], batch size: 60, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:59:42,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3781730.0, ans=0.125 2024-08-18 07:59:58,884 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-18 08:00:03,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3781830.0, ans=0.1 2024-08-18 08:00:17,767 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-18 08:00:21,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3782030.0, ans=0.2 2024-08-18 08:00:22,645 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.283e+01 2.534e+01 2.790e+01 4.412e+01, threshold=5.068e+01, percent-clipped=0.0 2024-08-18 08:00:25,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3782030.0, ans=0.125 2024-08-18 08:00:26,621 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 08:00:37,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3782130.0, ans=0.05 2024-08-18 08:00:49,567 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 6100, loss[loss=0.1084, beats_loss=0.009567, ecapa_loss=0.0001569, whisper_loss=0.0973, over 18482.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01067, ecapa_loss=0.000144, whisper_loss=0.09068, over 3851469.92 frames. ], batch size: 76, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:01:00,431 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
22 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-18 08:01:08,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3782330.0, ans=0.2 2024-08-18 08:01:20,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3782430.0, ans=0.125 2024-08-18 08:01:23,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3782430.0, ans=0.0 2024-08-18 08:01:31,378 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.31 vs. limit=22.5 2024-08-18 08:01:34,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=15.0 2024-08-18 08:01:35,043 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-18 08:01:42,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3782530.0, ans=0.125 2024-08-18 08:01:47,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3782630.0, ans=0.0 2024-08-18 08:02:02,596 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 6150, loss[loss=0.1194, beats_loss=0.01053, ecapa_loss=0.0001357, whisper_loss=0.1075, over 23779.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01062, ecapa_loss=0.0001442, whisper_loss=0.09142, over 3887977.28 frames. 
], batch size: 90, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:02:11,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3782730.0, ans=0.0 2024-08-18 08:02:14,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3782730.0, ans=0.0 2024-08-18 08:02:14,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3782730.0, ans=0.1 2024-08-18 08:02:25,569 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 32 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 08:02:33,289 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0 2024-08-18 08:02:37,867 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-18 08:02:46,184 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.364e+01 2.615e+01 3.104e+01 2.704e+02, threshold=5.230e+01, percent-clipped=2.0 2024-08-18 08:02:46,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3783030.0, ans=0.035 2024-08-18 08:02:53,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3783030.0, ans=0.2 2024-08-18 08:02:57,497 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 08:03:01,859 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-18 08:03:13,270 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.57 vs. 
limit=10.0 2024-08-18 08:03:13,875 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 6200, loss[loss=0.09418, beats_loss=0.01281, ecapa_loss=9.359e-05, whisper_loss=0.08044, over 23612.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01058, ecapa_loss=0.0001454, whisper_loss=0.09121, over 3899032.17 frames. ], batch size: 91, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:03:44,403 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-18 08:03:49,104 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2024-08-18 08:04:06,508 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-18 08:04:20,843 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-18 08:04:23,399 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 6250, loss[loss=0.09325, beats_loss=0.01154, ecapa_loss=9.477e-05, whisper_loss=0.08076, over 15084.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0106, ecapa_loss=0.0001442, whisper_loss=0.09051, over 3872190.74 frames. ], batch size: 56, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:04:27,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3783730.0, ans=0.07 2024-08-18 08:04:33,429 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.67 vs. limit=12.0 2024-08-18 08:04:44,067 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
18 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 08:04:50,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3783930.0, ans=0.0 2024-08-18 08:04:53,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3783930.0, ans=0.125 2024-08-18 08:05:02,741 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-18 08:05:04,954 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.253e+01 2.525e+01 2.867e+01 3.662e+01, threshold=5.049e+01, percent-clipped=0.0 2024-08-18 08:05:19,092 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 24 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-18 08:05:30,943 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 6300, loss[loss=0.1158, beats_loss=0.01104, ecapa_loss=0.0001417, whisper_loss=0.1033, over 23220.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01064, ecapa_loss=0.0001429, whisper_loss=0.09122, over 3863796.93 frames. ], batch size: 92, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:05:32,491 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 08:05:41,206 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-18 08:05:58,233 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-18 08:06:21,073 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 22 from LS+wenet, 8 from Vox, 23 fro AS 2024-08-18 08:06:23,887 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
16 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-18 08:06:30,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3784630.0, ans=0.125 2024-08-18 08:06:31,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3784630.0, ans=0.125 2024-08-18 08:06:36,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3784630.0, ans=0.125 2024-08-18 08:06:41,603 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 6350, loss[loss=0.1244, beats_loss=0.008655, ecapa_loss=0.0001255, whisper_loss=0.1144, over 18550.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01057, ecapa_loss=0.0001435, whisper_loss=0.09112, over 3813021.62 frames. ], batch size: 67, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:06:45,387 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 25 from LS+wenet, 8 from Vox, 22 fro AS 2024-08-18 08:06:49,721 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-18 08:07:02,575 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 08:07:03,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3784830.0, ans=0.0 2024-08-18 08:07:11,560 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.76 vs. 
limit=22.5 2024-08-18 08:07:19,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3784930.0, ans=0.125 2024-08-18 08:07:24,439 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.986e+01 2.408e+01 2.616e+01 2.938e+01 2.370e+02, threshold=5.231e+01, percent-clipped=3.0 2024-08-18 08:07:37,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3785130.0, ans=0.125 2024-08-18 08:07:51,361 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 6400, loss[loss=0.1056, beats_loss=0.009672, ecapa_loss=0.0001314, whisper_loss=0.09461, over 18299.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01072, ecapa_loss=0.0001429, whisper_loss=0.09053, over 3883858.00 frames. ], batch size: 68, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:08:27,254 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 08:08:39,116 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 34 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-18 08:08:41,843 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 08:09:02,637 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 6450, loss[loss=0.07814, beats_loss=0.01301, ecapa_loss=0.0001522, whisper_loss=0.0636, over 21931.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01072, ecapa_loss=0.0001445, whisper_loss=0.09031, over 3902017.84 frames. ], batch size: 93, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:09:14,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3785730.0, ans=0.125 2024-08-18 08:09:27,118 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.20 vs. 
limit=15.0 2024-08-18 08:09:32,099 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0 2024-08-18 08:09:42,195 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 19 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-18 08:09:43,855 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.533e+05 2024-08-18 08:09:43,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3786030.0, ans=0.125 2024-08-18 08:09:46,280 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.303e+01 2.568e+01 2.899e+01 1.758e+02, threshold=5.135e+01, percent-clipped=1.0 2024-08-18 08:09:56,346 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.231e+00 2024-08-18 08:09:56,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3786030.0, ans=0.1 2024-08-18 08:09:56,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3786030.0, ans=0.1 2024-08-18 08:10:11,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3786130.0, ans=0.125 2024-08-18 08:10:15,955 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 6500, loss[loss=0.0977, beats_loss=0.009816, ecapa_loss=0.0001166, whisper_loss=0.08672, over 16811.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01062, ecapa_loss=0.0001446, whisper_loss=0.09055, over 3918738.57 frames. 
], batch size: 62, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:10:24,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3786230.0, ans=0.1 2024-08-18 08:10:24,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3786230.0, ans=0.0 2024-08-18 08:10:52,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3786430.0, ans=0.2 2024-08-18 08:10:53,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3786430.0, ans=0.125 2024-08-18 08:11:08,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3786530.0, ans=0.125 2024-08-18 08:11:26,584 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 6550, loss[loss=0.08941, beats_loss=0.009284, ecapa_loss=0.000196, whisper_loss=0.07817, over 15631.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01062, ecapa_loss=0.0001446, whisper_loss=0.09024, over 3931523.73 frames. ], batch size: 63, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:11:31,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3786730.0, ans=0.09899494936611666 2024-08-18 08:11:42,259 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-18 08:11:45,087 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 08:11:47,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3786830.0, ans=0.0 2024-08-18 08:11:50,458 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.76 vs. 
limit=15.0 2024-08-18 08:11:58,092 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 08:12:09,853 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.653e+01 2.440e+01 2.660e+01 3.012e+01 4.611e+01, threshold=5.320e+01, percent-clipped=0.0 2024-08-18 08:12:36,488 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.31 vs. limit=15.0 2024-08-18 08:12:37,271 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 6600, loss[loss=0.09032, beats_loss=0.01286, ecapa_loss=0.0001596, whisper_loss=0.07587, over 14877.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.0001455, whisper_loss=0.09088, over 3928571.24 frames. ], batch size: 63, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:12:37,421 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-18 08:12:41,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3787230.0, ans=0.125 2024-08-18 08:12:50,334 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 16 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 08:13:32,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3787630.0, ans=0.125 2024-08-18 08:13:33,694 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 19 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-18 08:13:36,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3787630.0, ans=0.2 2024-08-18 08:13:48,036 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 6650, loss[loss=0.08167, beats_loss=0.01033, ecapa_loss=0.0002313, whisper_loss=0.06903, over 18635.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01049, ecapa_loss=0.0001456, whisper_loss=0.09108, over 3923328.86 frames. ], batch size: 84, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:13:54,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3787730.0, ans=0.125 2024-08-18 08:13:58,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3787730.0, ans=0.0 2024-08-18 08:13:58,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3787730.0, ans=0.125 2024-08-18 08:13:59,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3787730.0, ans=0.1 2024-08-18 08:14:02,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3787830.0, ans=0.125 2024-08-18 08:14:05,792 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.88 vs. limit=15.0 2024-08-18 08:14:20,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3787930.0, ans=0.125 2024-08-18 08:14:26,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3787930.0, ans=0.125 2024-08-18 08:14:27,689 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
23 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 08:14:31,532 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.309e+01 2.540e+01 2.799e+01 5.437e+01, threshold=5.081e+01, percent-clipped=1.0 2024-08-18 08:14:35,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3788030.0, ans=0.0 2024-08-18 08:14:50,751 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 24 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 08:14:58,179 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 37 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-18 08:14:58,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3788230.0, ans=0.125 2024-08-18 08:14:59,201 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 6700, loss[loss=0.1314, beats_loss=0.009034, ecapa_loss=0.000105, whisper_loss=0.1213, over 24542.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01037, ecapa_loss=0.0001459, whisper_loss=0.09213, over 3922662.20 frames. ], batch size: 89, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:15:30,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3788430.0, ans=0.1 2024-08-18 08:15:31,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3788430.0, ans=0.07 2024-08-18 08:15:37,922 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
18 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 08:15:59,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3788630.0, ans=0.0 2024-08-18 08:16:06,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3788630.0, ans=0.125 2024-08-18 08:16:08,899 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 6750, loss[loss=0.1273, beats_loss=0.006994, ecapa_loss=0.000143, whisper_loss=0.1188, over 18898.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01032, ecapa_loss=0.0001462, whisper_loss=0.09183, over 3896869.54 frames. ], batch size: 71, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:16:10,507 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 29 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 08:16:22,475 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.77 vs. limit=15.0 2024-08-18 08:16:24,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3788830.0, ans=0.0 2024-08-18 08:16:34,517 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 08:16:43,304 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.26 vs. limit=15.0 2024-08-18 08:16:52,020 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.278e+01 2.601e+01 3.008e+01 1.413e+02, threshold=5.202e+01, percent-clipped=1.0 2024-08-18 08:16:55,924 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.57 vs. 
limit=8.0 2024-08-18 08:17:00,528 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.01 vs. limit=15.0 2024-08-18 08:17:14,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3789130.0, ans=0.125 2024-08-18 08:17:17,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.80 vs. limit=6.0 2024-08-18 08:17:18,530 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 6800, loss[loss=0.1115, beats_loss=0.0089, ecapa_loss=0.0001725, whisper_loss=0.1009, over 18264.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01043, ecapa_loss=0.0001457, whisper_loss=0.09121, over 3900821.71 frames. ], batch size: 75, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:17:18,683 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 34 from LS+wenet, 10 from Vox, 39 fro AS 2024-08-18 08:17:21,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3789230.0, ans=0.125 2024-08-18 08:17:26,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3789230.0, ans=0.125 2024-08-18 08:17:34,845 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 22 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-18 08:18:03,565 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
22 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 08:18:12,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3789530.0, ans=0.0 2024-08-18 08:18:15,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3789630.0, ans=0.0 2024-08-18 08:18:17,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3789630.0, ans=0.1 2024-08-18 08:18:18,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3789630.0, ans=0.0 2024-08-18 08:18:20,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3789630.0, ans=0.125 2024-08-18 08:18:26,838 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.088e+01 2024-08-18 08:18:27,562 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 6850, loss[loss=0.09049, beats_loss=0.01214, ecapa_loss=0.0001404, whisper_loss=0.07694, over 22548.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01052, ecapa_loss=0.0001462, whisper_loss=0.09004, over 3865243.61 frames. ], batch size: 92, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:18:57,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3789930.0, ans=0.0 2024-08-18 08:19:02,802 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
17 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-18 08:19:09,387 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.277e+01 2.530e+01 2.768e+01 4.068e+01, threshold=5.060e+01, percent-clipped=0.0 2024-08-18 08:19:09,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3790030.0, ans=0.1 2024-08-18 08:19:19,906 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 08:19:26,494 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 21 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-18 08:19:36,691 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 6900, loss[loss=0.1009, beats_loss=0.01147, ecapa_loss=0.0001212, whisper_loss=0.08819, over 23179.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01053, ecapa_loss=0.0001461, whisper_loss=0.09, over 3862703.64 frames. ], batch size: 92, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:19:43,257 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.55 vs. limit=15.0 2024-08-18 08:20:06,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3790430.0, ans=0.1 2024-08-18 08:20:21,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3790530.0, ans=0.0 2024-08-18 08:20:24,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3790530.0, ans=0.0 2024-08-18 08:20:30,729 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 32 from Vox, 32 fro AS 2024-08-18 08:20:39,138 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
31 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-18 08:20:45,608 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 6950, loss[loss=0.09346, beats_loss=0.01276, ecapa_loss=0.0001218, whisper_loss=0.07948, over 18197.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01054, ecapa_loss=0.0001471, whisper_loss=0.09032, over 3858377.54 frames. ], batch size: 75, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:20:52,719 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 25 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-18 08:21:17,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3790930.0, ans=0.0 2024-08-18 08:21:17,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3790930.0, ans=0.125 2024-08-18 08:21:25,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3791030.0, ans=0.1 2024-08-18 08:21:28,902 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.281e+01 2.543e+01 2.749e+01 3.905e+01, threshold=5.086e+01, percent-clipped=0.0 2024-08-18 08:21:37,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.14 vs. limit=15.0 2024-08-18 08:21:44,857 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 17 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-18 08:21:50,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3791130.0, ans=0.125 2024-08-18 08:21:54,204 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 7000, loss[loss=0.1199, beats_loss=0.01119, ecapa_loss=0.0001131, whisper_loss=0.1076, over 23806.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.0105, ecapa_loss=0.0001471, whisper_loss=0.09044, over 3899144.20 frames. ], batch size: 90, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:22:02,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=12.0 2024-08-18 08:22:34,809 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 08:22:35,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3791530.0, ans=0.125 2024-08-18 08:22:41,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3791530.0, ans=0.035 2024-08-18 08:22:43,589 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-18 08:22:44,012 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0 2024-08-18 08:22:45,735 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 08:22:48,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3791530.0, ans=0.125 2024-08-18 08:22:53,452 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-18 08:23:04,181 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 7050, loss[loss=0.1017, beats_loss=0.01028, ecapa_loss=0.0001522, whisper_loss=0.08985, over 17620.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001475, whisper_loss=0.09047, over 3930497.96 frames. 
], batch size: 70, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:23:10,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3791730.0, ans=0.09899494936611666 2024-08-18 08:23:11,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3791730.0, ans=0.125 2024-08-18 08:23:25,121 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.80 vs. limit=12.0 2024-08-18 08:23:31,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3791930.0, ans=0.1 2024-08-18 08:23:37,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3791930.0, ans=0.0 2024-08-18 08:23:37,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3791930.0, ans=0.2 2024-08-18 08:23:40,264 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2024-08-18 08:23:47,629 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.689e+01 2.340e+01 2.598e+01 2.847e+01 9.151e+01, threshold=5.195e+01, percent-clipped=1.0 2024-08-18 08:23:52,511 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.27 vs. limit=12.0 2024-08-18 08:23:56,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3792030.0, ans=0.125 2024-08-18 08:23:58,852 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
16 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 08:24:13,624 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 7100, loss[loss=0.0907, beats_loss=0.01173, ecapa_loss=0.0001487, whisper_loss=0.07749, over 17287.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01057, ecapa_loss=0.0001464, whisper_loss=0.09018, over 3885928.90 frames. ], batch size: 71, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:24:28,389 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-18 08:24:43,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3792430.0, ans=0.0 2024-08-18 08:24:48,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3792430.0, ans=0.125 2024-08-18 08:25:03,953 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 08:25:29,643 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 7150, loss[loss=0.104, beats_loss=0.009328, ecapa_loss=0.0001891, whisper_loss=0.09278, over 16967.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01053, ecapa_loss=0.0001454, whisper_loss=0.09061, over 3884466.92 frames. 
], batch size: 71, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:25:40,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3792730.0, ans=0.0 2024-08-18 08:25:48,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3792830.0, ans=0.0 2024-08-18 08:25:55,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3792830.0, ans=0.125 2024-08-18 08:26:04,651 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.847e+00 2024-08-18 08:26:15,860 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.213e+01 2.415e+01 2.691e+01 1.069e+02, threshold=4.830e+01, percent-clipped=1.0 2024-08-18 08:26:20,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3793030.0, ans=0.125 2024-08-18 08:26:25,026 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 08:26:34,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3793130.0, ans=0.2 2024-08-18 08:26:39,916 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-18 08:26:40,484 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=15.0 2024-08-18 08:26:43,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3793230.0, ans=0.1 2024-08-18 08:26:44,265 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 7200, loss[loss=0.09482, beats_loss=0.01113, ecapa_loss=0.0001403, whisper_loss=0.08229, over 21891.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01047, ecapa_loss=0.0001447, whisper_loss=0.09139, over 3899334.04 frames. ], batch size: 89, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:26:55,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3793230.0, ans=0.1 2024-08-18 08:27:15,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3793430.0, ans=0.0 2024-08-18 08:27:23,660 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 32 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-18 08:27:27,207 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 08:27:37,131 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 08:28:03,222 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 7250, loss[loss=0.09002, beats_loss=0.01078, ecapa_loss=0.0001593, whisper_loss=0.07764, over 22377.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01051, ecapa_loss=0.0001439, whisper_loss=0.0907, over 3908880.55 frames. ], batch size: 93, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:28:10,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3793730.0, ans=0.0 2024-08-18 08:28:28,171 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
20 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-18 08:28:32,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3793830.0, ans=15.0 2024-08-18 08:28:37,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3793930.0, ans=0.125 2024-08-18 08:28:55,566 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.633e+01 2.291e+01 2.562e+01 2.924e+01 4.395e+01, threshold=5.124e+01, percent-clipped=0.0 2024-08-18 08:29:09,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3794130.0, ans=0.125 2024-08-18 08:29:16,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3794130.0, ans=0.0 2024-08-18 08:29:24,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0 2024-08-18 08:29:26,013 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 7300, loss[loss=0.08685, beats_loss=0.01358, ecapa_loss=0.0001303, whisper_loss=0.07197, over 22579.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01044, ecapa_loss=0.0001454, whisper_loss=0.09086, over 3917104.23 frames. ], batch size: 91, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:29:52,789 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 16 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-18 08:29:55,579 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
26 from LS+wenet, 22 from Vox, 10 fro AS 2024-08-18 08:30:22,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3794530.0, ans=0.125 2024-08-18 08:30:28,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.56 vs. limit=22.5 2024-08-18 08:30:38,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3794730.0, ans=0.125 2024-08-18 08:30:39,480 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 7350, loss[loss=0.1048, beats_loss=0.01104, ecapa_loss=0.0001158, whisper_loss=0.09257, over 22357.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01045, ecapa_loss=0.000145, whisper_loss=0.09076, over 3922102.23 frames. ], batch size: 88, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:31:02,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3794830.0, ans=0.125 2024-08-18 08:31:05,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3794930.0, ans=0.1 2024-08-18 08:31:08,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3794930.0, ans=0.125 2024-08-18 08:31:15,539 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 26 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-18 08:31:19,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3795030.0, ans=0.125 2024-08-18 08:31:21,626 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.256e+01 2.556e+01 3.026e+01 2.430e+02, threshold=5.112e+01, percent-clipped=3.0 2024-08-18 08:31:22,226 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
27 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-18 08:31:26,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3795030.0, ans=0.035 2024-08-18 08:31:42,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3795130.0, ans=0.125 2024-08-18 08:31:48,685 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 7400, loss[loss=0.08836, beats_loss=0.01047, ecapa_loss=0.0001293, whisper_loss=0.07659, over 14099.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01052, ecapa_loss=0.0001448, whisper_loss=0.09055, over 3907381.32 frames. ], batch size: 56, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:32:01,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3795330.0, ans=0.125 2024-08-18 08:32:03,557 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.55 vs. limit=22.5 2024-08-18 08:32:21,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3795430.0, ans=0.125 2024-08-18 08:32:26,735 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-18 08:32:35,654 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 23 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-18 08:32:40,485 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 08:32:42,834 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 08:32:43,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3795530.0, ans=0.1 2024-08-18 08:32:59,507 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 08:33:00,669 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 7450, loss[loss=0.09103, beats_loss=0.01185, ecapa_loss=0.0001296, whisper_loss=0.07788, over 18331.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01051, ecapa_loss=0.0001465, whisper_loss=0.09015, over 3872913.75 frames. ], batch size: 73, lr: 2.36e-03, grad_scale: 1.152921504606847e+18 2024-08-18 08:33:11,287 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-18 08:33:12,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3795730.0, ans=0.0 2024-08-18 08:33:14,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3795830.0, ans=0.125 2024-08-18 08:33:20,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3795830.0, ans=0.0 2024-08-18 08:33:22,363 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 31 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-18 08:33:26,890 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 19 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 08:33:45,459 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.298e+01 2.515e+01 2.831e+01 4.675e+01, threshold=5.030e+01, percent-clipped=0.0 2024-08-18 08:33:57,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3796130.0, ans=0.125 2024-08-18 08:34:09,207 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
24 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 08:34:10,470 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-18 08:34:12,728 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 7500, loss[loss=0.09549, beats_loss=0.01125, ecapa_loss=0.0001738, whisper_loss=0.0825, over 20704.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01045, ecapa_loss=0.0001465, whisper_loss=0.09084, over 3904569.17 frames. ], batch size: 89, lr: 2.35e-03, grad_scale: 1.152921504606847e+18 2024-08-18 08:34:13,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3796230.0, ans=0.125 2024-08-18 08:34:17,422 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 28 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-18 08:34:18,626 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 25 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 08:34:19,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3796230.0, ans=0.125 2024-08-18 08:34:25,667 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-18 08:34:27,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3796330.0, ans=0.125 2024-08-18 08:34:32,305 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
33 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 08:34:32,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3796330.0, ans=0.0 2024-08-18 08:34:50,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3796430.0, ans=0.125 2024-08-18 08:34:54,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3796530.0, ans=0.1 2024-08-18 08:34:58,830 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.47 vs. limit=15.0 2024-08-18 08:35:08,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3796630.0, ans=0.125 2024-08-18 08:35:10,614 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-18 08:35:15,014 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-18 08:35:20,075 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 21 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 08:35:21,329 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 7550, loss[loss=0.08892, beats_loss=0.01213, ecapa_loss=0.0001233, whisper_loss=0.07555, over 21148.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01043, ecapa_loss=0.0001459, whisper_loss=0.09075, over 3854743.27 frames. ], batch size: 84, lr: 2.35e-03, grad_scale: 1.152921504606847e+18 2024-08-18 08:35:26,351 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
19 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-18 08:35:28,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3796730.0, ans=0.0 2024-08-18 08:35:40,995 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 18 from LS+wenet, 21 from Vox, 51 fro AS 2024-08-18 08:35:44,701 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 08:36:00,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3797030.0, ans=0.125 2024-08-18 08:36:01,338 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.773e+01 2.289e+01 2.559e+01 2.836e+01 1.504e+02, threshold=5.118e+01, percent-clipped=1.0 2024-08-18 08:36:01,579 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 20 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-18 08:36:04,000 WARNING [optim.py:496] (0/4) Scaling gradients by 0.04845663160085678, model_norm_threshold=51.1837158203125 2024-08-18 08:36:04,173 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.631e+05, grad_sumsq=1.588e+07, orig_rms_sq=1.027e-02 2024-08-18 08:36:26,287 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 7600, loss[loss=0.1348, beats_loss=0.00812, ecapa_loss=0.0001787, whisper_loss=0.1249, over 18246.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01042, ecapa_loss=0.0001467, whisper_loss=0.09069, over 3872840.11 frames. ], batch size: 75, lr: 2.35e-03, grad_scale: 1.152921504606847e+18 2024-08-18 08:36:32,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3797230.0, ans=0.125 2024-08-18 08:36:36,557 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
30 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-18 08:36:45,747 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-18 08:36:49,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3797330.0, ans=0.125 2024-08-18 08:37:24,071 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-18 08:37:25,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3797630.0, ans=0.0 2024-08-18 08:37:35,313 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 7650, loss[loss=0.1124, beats_loss=0.01029, ecapa_loss=0.0001367, whisper_loss=0.1008, over 14838.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01043, ecapa_loss=0.0001464, whisper_loss=0.09104, over 3866237.01 frames. ], batch size: 57, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:37:47,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3797830.0, ans=0.0 2024-08-18 08:37:47,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3797830.0, ans=0.1 2024-08-18 08:37:51,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3797830.0, ans=0.125 2024-08-18 08:37:56,121 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-18 08:37:58,860 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-18 08:38:00,236 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 08:38:02,056 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.58 vs. 
limit=15.0 2024-08-18 08:38:18,420 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.705e+01 2.494e+01 2.701e+01 3.087e+01 1.056e+03, threshold=5.401e+01, percent-clipped=1.0 2024-08-18 08:38:21,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3798030.0, ans=0.125 2024-08-18 08:38:29,147 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 08:38:30,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3798130.0, ans=0.125 2024-08-18 08:38:41,295 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 7700, loss[loss=0.09696, beats_loss=0.008837, ecapa_loss=0.0001458, whisper_loss=0.08667, over 21928.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.000147, whisper_loss=0.09046, over 3872121.20 frames. ], batch size: 86, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:38:43,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3798230.0, ans=10.0 2024-08-18 08:38:51,239 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 08:38:53,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3798330.0, ans=0.125 2024-08-18 08:38:59,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3798330.0, ans=0.125 2024-08-18 08:38:59,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3798330.0, ans=0.125 2024-08-18 08:39:29,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3798530.0, ans=0.1 2024-08-18 08:39:37,854 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 08:39:38,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3798630.0, ans=0.125 2024-08-18 08:39:44,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3798730.0, ans=0.125 2024-08-18 08:39:45,425 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 7750, loss[loss=0.1157, beats_loss=0.01008, ecapa_loss=0.000175, whisper_loss=0.1039, over 15929.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01053, ecapa_loss=0.000145, whisper_loss=0.09052, over 3910412.57 frames. ], batch size: 64, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:39:57,348 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
24 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-18 08:40:24,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3799030.0, ans=0.125 2024-08-18 08:40:27,849 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.319e+01 2.612e+01 2.885e+01 4.256e+01, threshold=5.223e+01, percent-clipped=0.0 2024-08-18 08:40:36,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3799030.0, ans=0.0 2024-08-18 08:40:51,662 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 7800, loss[loss=0.1105, beats_loss=0.01009, ecapa_loss=0.0001346, whisper_loss=0.09908, over 22755.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01046, ecapa_loss=0.0001446, whisper_loss=0.09083, over 3902578.55 frames. ], batch size: 88, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:41:01,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.83 vs. limit=22.5 2024-08-18 08:41:20,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3799430.0, ans=0.0 2024-08-18 08:41:24,111 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.82 vs. limit=12.0 2024-08-18 08:41:35,357 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-18 08:41:39,364 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-18 08:41:58,030 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 7850, loss[loss=0.1108, beats_loss=0.009121, ecapa_loss=0.0001925, whisper_loss=0.09974, over 20075.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01045, ecapa_loss=0.0001452, whisper_loss=0.09104, over 3918229.25 frames. 
], batch size: 80, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:41:58,242 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-18 08:42:01,003 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-18 08:42:02,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3799730.0, ans=0.2 2024-08-18 08:42:04,287 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.17 vs. limit=12.0 2024-08-18 08:42:11,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3799830.0, ans=0.2 2024-08-18 08:42:14,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3799830.0, ans=0.125 2024-08-18 08:42:22,282 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-18 08:42:22,883 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.78 vs. 
limit=22.5 2024-08-18 08:42:32,210 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-380000.pt 2024-08-18 08:42:42,482 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.336e+01 2.580e+01 3.031e+01 8.251e+01, threshold=5.160e+01, percent-clipped=2.0 2024-08-18 08:42:49,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3800030.0, ans=0.125 2024-08-18 08:42:51,251 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-18 08:42:53,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3800130.0, ans=0.2 2024-08-18 08:42:56,467 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-18 08:43:01,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3800130.0, ans=0.125 2024-08-18 08:43:02,166 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.63 vs. limit=22.5 2024-08-18 08:43:04,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3800230.0, ans=0.1 2024-08-18 08:43:05,401 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 7900, loss[loss=0.08971, beats_loss=0.01293, ecapa_loss=0.0001747, whisper_loss=0.07503, over 21225.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01055, ecapa_loss=0.0001448, whisper_loss=0.09062, over 3860510.55 frames. 
], batch size: 93, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:43:09,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3800230.0, ans=0.125 2024-08-18 08:43:12,100 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 08:43:40,920 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-18 08:43:52,631 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.535e+05 2024-08-18 08:43:53,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3800530.0, ans=0.0 2024-08-18 08:44:04,838 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.54 vs. limit=22.5 2024-08-18 08:44:10,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3800730.0, ans=0.05 2024-08-18 08:44:11,686 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 7950, loss[loss=0.09555, beats_loss=0.01031, ecapa_loss=0.0001461, whisper_loss=0.08378, over 19741.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01056, ecapa_loss=0.0001451, whisper_loss=0.09044, over 3874869.44 frames. ], batch size: 81, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:44:18,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3800730.0, ans=0.125 2024-08-18 08:44:21,950 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
24 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-18 08:44:22,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3800730.0, ans=0.2 2024-08-18 08:44:26,139 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 20 from LS+wenet, 20 from Vox, 53 fro AS 2024-08-18 08:44:37,058 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-18 08:44:44,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3800930.0, ans=0.1 2024-08-18 08:44:56,043 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.349e+01 2.585e+01 3.062e+01 4.002e+02, threshold=5.169e+01, percent-clipped=3.0 2024-08-18 08:45:03,069 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-18 08:45:19,095 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 08:45:21,682 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 8000, loss[loss=0.1081, beats_loss=0.008406, ecapa_loss=0.0001977, whisper_loss=0.09772, over 19422.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01056, ecapa_loss=0.0001444, whisper_loss=0.09056, over 3898966.70 frames. ], batch size: 81, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:45:23,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3801230.0, ans=0.04949747468305833 2024-08-18 08:45:26,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.13 vs. limit=15.0 2024-08-18 08:45:33,194 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. 
limit=6.0 2024-08-18 08:45:33,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3801330.0, ans=0.125 2024-08-18 08:45:35,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3801330.0, ans=0.0 2024-08-18 08:45:53,029 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 08:46:10,057 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-18 08:46:26,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3801630.0, ans=0.2 2024-08-18 08:46:30,580 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 8050, loss[loss=0.08978, beats_loss=0.01229, ecapa_loss=0.0001194, whisper_loss=0.07629, over 20128.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01055, ecapa_loss=0.0001437, whisper_loss=0.09033, over 3855811.73 frames. ], batch size: 80, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:46:34,786 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 20 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 08:46:39,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3801730.0, ans=0.125 2024-08-18 08:47:02,873 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 24 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 08:47:04,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3801930.0, ans=0.0 2024-08-18 08:47:08,931 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.84 vs. 
limit=22.5 2024-08-18 08:47:14,952 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.216e+01 2.436e+01 2.750e+01 5.017e+01, threshold=4.873e+01, percent-clipped=0.0 2024-08-18 08:47:38,716 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 8100, loss[loss=0.101, beats_loss=0.01208, ecapa_loss=0.0001295, whisper_loss=0.08763, over 22985.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01063, ecapa_loss=0.0001432, whisper_loss=0.08966, over 3867410.72 frames. ], batch size: 90, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:47:56,859 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 29 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 08:47:59,929 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-18 08:48:08,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3802430.0, ans=0.125 2024-08-18 08:48:11,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3802430.0, ans=0.125 2024-08-18 08:48:15,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3802430.0, ans=0.125 2024-08-18 08:48:19,190 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-18 08:48:49,410 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 8150, loss[loss=0.09578, beats_loss=0.01128, ecapa_loss=0.0001558, whisper_loss=0.08294, over 20370.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01064, ecapa_loss=0.0001433, whisper_loss=0.08935, over 3861841.93 frames. ], batch size: 84, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:48:58,304 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
19 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 08:49:10,224 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-18 08:49:11,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3802830.0, ans=0.1 2024-08-18 08:49:17,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3802930.0, ans=0.125 2024-08-18 08:49:30,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3802930.0, ans=0.0 2024-08-18 08:49:34,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3803030.0, ans=0.1 2024-08-18 08:49:35,785 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.360e+01 2.577e+01 2.939e+01 1.297e+02, threshold=5.154e+01, percent-clipped=2.0 2024-08-18 08:49:39,460 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=15.0 2024-08-18 08:49:40,582 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-18 08:49:47,546 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-18 08:49:50,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3803130.0, ans=0.1 2024-08-18 08:49:52,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3803130.0, ans=0.0 2024-08-18 08:50:01,598 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 8200, loss[loss=0.111, beats_loss=0.01086, ecapa_loss=0.0001311, whisper_loss=0.09882, over 23289.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01063, ecapa_loss=0.0001442, whisper_loss=0.08973, over 3911371.33 frames. ], batch size: 90, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:50:02,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3803230.0, ans=0.125 2024-08-18 08:50:12,490 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-18 08:50:18,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3803330.0, ans=0.125 2024-08-18 08:51:15,038 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 8250, loss[loss=0.1079, beats_loss=0.01205, ecapa_loss=0.0001171, whisper_loss=0.09471, over 23035.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01062, ecapa_loss=0.0001436, whisper_loss=0.08962, over 3891245.93 frames. ], batch size: 91, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:51:28,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3803730.0, ans=0.2 2024-08-18 08:51:53,889 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. 
limit=6.0 2024-08-18 08:51:57,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3803930.0, ans=0.125 2024-08-18 08:52:02,881 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.328e+01 2.473e+01 2.837e+01 4.238e+01, threshold=4.947e+01, percent-clipped=0.0 2024-08-18 08:52:21,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3804130.0, ans=10.0 2024-08-18 08:52:28,161 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.12 vs. limit=10.0 2024-08-18 08:52:29,858 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 8300, loss[loss=0.09401, beats_loss=0.009705, ecapa_loss=0.0001365, whisper_loss=0.08294, over 15641.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01067, ecapa_loss=0.0001421, whisper_loss=0.08924, over 3910908.21 frames. ], batch size: 61, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:52:31,650 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 27 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-18 08:52:51,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3804330.0, ans=0.2 2024-08-18 08:52:58,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3804330.0, ans=0.125 2024-08-18 08:52:59,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3804330.0, ans=0.0 2024-08-18 08:53:02,616 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
26 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 08:53:05,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3804430.0, ans=0.125 2024-08-18 08:53:12,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3804430.0, ans=10.0 2024-08-18 08:53:14,423 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 08:53:21,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3804530.0, ans=0.1 2024-08-18 08:53:28,807 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-18 08:53:33,067 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-18 08:53:37,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3804630.0, ans=0.125 2024-08-18 08:53:48,470 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 8350, loss[loss=0.0996, beats_loss=0.01022, ecapa_loss=0.0001187, whisper_loss=0.08819, over 15806.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01059, ecapa_loss=0.000143, whisper_loss=0.08964, over 3926084.85 frames. ], batch size: 58, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:53:57,571 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 29 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-18 08:53:59,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3804730.0, ans=0.0 2024-08-18 08:54:05,026 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
27 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 08:54:07,152 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.85 vs. limit=6.0 2024-08-18 08:54:16,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3804930.0, ans=0.035 2024-08-18 08:54:34,890 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.322e+01 2.536e+01 2.820e+01 4.636e+01, threshold=5.072e+01, percent-clipped=0.0 2024-08-18 08:54:39,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3805030.0, ans=0.125 2024-08-18 08:54:40,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3805030.0, ans=0.125 2024-08-18 08:54:41,596 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 20 from LS+wenet, 12 from Vox, 47 fro AS 2024-08-18 08:54:42,137 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.22 vs. limit=15.0 2024-08-18 08:54:58,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3805130.0, ans=0.1 2024-08-18 08:55:03,555 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 8400, loss[loss=0.08312, beats_loss=0.01248, ecapa_loss=0.000157, whisper_loss=0.06907, over 20533.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01058, ecapa_loss=0.0001435, whisper_loss=0.0896, over 3905027.45 frames. ], batch size: 89, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:55:10,691 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 21 from LS+wenet, 21 from Vox, 50 fro AS 2024-08-18 08:55:12,704 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
28 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-18 08:55:30,467 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-18 08:55:32,455 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.25 vs. limit=15.0 2024-08-18 08:55:53,786 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-18 08:56:15,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3805630.0, ans=0.0 2024-08-18 08:56:23,614 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 8450, loss[loss=0.1153, beats_loss=0.00753, ecapa_loss=0.000172, whisper_loss=0.1061, over 21521.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01053, ecapa_loss=0.0001432, whisper_loss=0.09002, over 3901029.62 frames. ], batch size: 89, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:56:27,365 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 08:57:18,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3805830.0, ans=15.0 2024-08-18 08:57:22,301 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 08:57:30,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3805930.0, ans=0.0 2024-08-18 08:57:32,917 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.83 vs. 
limit=15.0 2024-08-18 08:57:45,555 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.360e+01 2.600e+01 3.017e+01 9.268e+01, threshold=5.200e+01, percent-clipped=1.0 2024-08-18 08:58:00,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3806130.0, ans=0.125 2024-08-18 08:58:02,747 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 08:58:10,411 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 15 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 08:58:16,434 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 8500, loss[loss=0.1156, beats_loss=0.01016, ecapa_loss=0.0001344, whisper_loss=0.1041, over 23851.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01051, ecapa_loss=0.0001431, whisper_loss=0.09008, over 3889750.31 frames. ], batch size: 92, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:58:18,121 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0 2024-08-18 08:58:24,627 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-18 08:58:29,489 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.44 vs. 
limit=15.0 2024-08-18 08:58:37,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3806330.0, ans=0.0 2024-08-18 08:58:58,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3806430.0, ans=0.09899494936611666 2024-08-18 08:59:07,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3806530.0, ans=0.0 2024-08-18 08:59:21,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3806630.0, ans=0.0 2024-08-18 08:59:28,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3806630.0, ans=0.1 2024-08-18 08:59:31,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3806630.0, ans=0.125 2024-08-18 08:59:33,480 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-18 08:59:36,632 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 08:59:37,899 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 8550, loss[loss=0.1074, beats_loss=0.01023, ecapa_loss=0.0001581, whisper_loss=0.09555, over 21962.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0105, ecapa_loss=0.0001439, whisper_loss=0.09009, over 3911009.28 frames. 
], batch size: 89, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:00:08,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3806930.0, ans=0.0 2024-08-18 09:00:08,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3806930.0, ans=0.0 2024-08-18 09:00:17,913 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-18 09:00:27,968 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.776e+01 2.270e+01 2.542e+01 2.954e+01 6.029e+01, threshold=5.084e+01, percent-clipped=1.0 2024-08-18 09:00:55,592 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 8600, loss[loss=0.1094, beats_loss=0.009356, ecapa_loss=0.0001073, whisper_loss=0.09896, over 17794.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0104, ecapa_loss=0.0001441, whisper_loss=0.09074, over 3889292.21 frames. ], batch size: 64, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:01:06,849 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-18 09:01:29,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3807430.0, ans=0.1 2024-08-18 09:01:34,387 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 16 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-18 09:01:47,193 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-18 09:02:09,284 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 8650, loss[loss=0.1179, beats_loss=0.009802, ecapa_loss=0.0001499, whisper_loss=0.1066, over 23795.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001445, whisper_loss=0.09043, over 3871114.27 frames. 
], batch size: 93, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:02:15,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3807730.0, ans=0.125 2024-08-18 09:02:19,935 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 26 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-18 09:02:20,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3807730.0, ans=0.1 2024-08-18 09:02:31,483 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-18 09:02:39,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3807930.0, ans=0.0 2024-08-18 09:02:41,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3807930.0, ans=0.125 2024-08-18 09:02:45,937 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-18 09:02:48,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3807930.0, ans=10.0 2024-08-18 09:02:49,269 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 18 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 09:02:54,938 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 18 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-18 09:02:56,378 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.256e+01 2.496e+01 2.847e+01 1.710e+02, threshold=4.992e+01, percent-clipped=2.0 2024-08-18 09:03:04,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3808030.0, ans=0.125 2024-08-18 09:03:08,085 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-18 09:03:09,692 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-18 09:03:13,478 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.34 vs. limit=22.5 2024-08-18 09:03:14,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3808130.0, ans=0.09899494936611666 2024-08-18 09:03:22,276 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 8700, loss[loss=0.111, beats_loss=0.00957, ecapa_loss=0.000169, whisper_loss=0.09978, over 15514.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01056, ecapa_loss=0.0001444, whisper_loss=0.0899, over 3881490.37 frames. ], batch size: 61, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:03:25,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3808230.0, ans=0.125 2024-08-18 09:03:31,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3808230.0, ans=0.0 2024-08-18 09:03:35,042 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-18 09:03:39,412 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 09:03:53,072 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 09:04:00,594 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.02 vs. limit=22.5 2024-08-18 09:04:32,230 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 8750, loss[loss=0.08736, beats_loss=0.01181, ecapa_loss=0.0001736, whisper_loss=0.07382, over 21544.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01056, ecapa_loss=0.000144, whisper_loss=0.08975, over 3892528.70 frames. ], batch size: 90, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:04:54,726 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.48 vs. limit=15.0 2024-08-18 09:05:13,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3809030.0, ans=0.0 2024-08-18 09:05:14,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3809030.0, ans=0.2 2024-08-18 09:05:15,048 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.311e+01 2.590e+01 2.877e+01 4.936e+01, threshold=5.179e+01, percent-clipped=0.0 2024-08-18 09:05:16,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3809030.0, ans=0.125 2024-08-18 09:05:30,992 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-18 09:05:33,531 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 09:05:38,476 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 8800, loss[loss=0.09159, beats_loss=0.01244, ecapa_loss=0.0001349, whisper_loss=0.0778, over 16335.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01063, ecapa_loss=0.0001443, whisper_loss=0.08964, over 3915910.09 frames. ], batch size: 69, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:06:00,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3809330.0, ans=0.125 2024-08-18 09:06:08,599 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. 
limit=6.0 2024-08-18 09:06:14,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3809430.0, ans=0.125 2024-08-18 09:06:16,625 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-18 09:06:22,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3809530.0, ans=0.125 2024-08-18 09:06:32,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=3809630.0, ans=15.0 2024-08-18 09:06:38,204 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.87 vs. limit=15.0 2024-08-18 09:06:41,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3809730.0, ans=0.2 2024-08-18 09:06:42,316 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 8850, loss[loss=0.106, beats_loss=0.009994, ecapa_loss=0.0001152, whisper_loss=0.09487, over 24595.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01067, ecapa_loss=0.0001441, whisper_loss=0.08929, over 3892372.36 frames. ], batch size: 94, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:06:43,744 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-18 09:06:49,978 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-18 09:06:59,365 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-18 09:07:10,274 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
21 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 09:07:10,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3809930.0, ans=0.2 2024-08-18 09:07:11,638 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-18 09:07:16,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0 2024-08-18 09:07:23,696 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.147e+01 2.440e+01 2.882e+01 4.203e+01, threshold=4.880e+01, percent-clipped=0.0 2024-08-18 09:07:26,244 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-18 09:07:30,853 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0 2024-08-18 09:07:43,214 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 09:07:47,352 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 8900, loss[loss=0.08454, beats_loss=0.01239, ecapa_loss=0.0001477, whisper_loss=0.07067, over 22458.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01068, ecapa_loss=0.0001445, whisper_loss=0.08918, over 3876019.12 frames. ], batch size: 94, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:07:51,720 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-18 09:07:58,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3810230.0, ans=0.1 2024-08-18 09:08:10,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3810330.0, ans=0.125 2024-08-18 09:08:14,022 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-18 09:08:28,722 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 09:08:32,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3810530.0, ans=0.0 2024-08-18 09:08:38,897 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 18 from LS+wenet, 31 from Vox, 26 fro AS 2024-08-18 09:08:40,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3810630.0, ans=0.025 2024-08-18 09:08:53,015 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 8950, loss[loss=0.1141, beats_loss=0.01055, ecapa_loss=0.0001532, whisper_loss=0.102, over 23017.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01054, ecapa_loss=0.000144, whisper_loss=0.09007, over 3855234.13 frames. ], batch size: 94, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:09:10,148 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. 
limit=6.0 2024-08-18 09:09:12,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3810830.0, ans=0.1 2024-08-18 09:09:26,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3810930.0, ans=0.125 2024-08-18 09:09:35,387 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.305e+01 2.570e+01 2.937e+01 4.370e+01, threshold=5.139e+01, percent-clipped=0.0 2024-08-18 09:09:40,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3811030.0, ans=0.125 2024-08-18 09:09:59,359 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 9000, loss[loss=0.08884, beats_loss=0.01289, ecapa_loss=0.0001139, whisper_loss=0.07481, over 22683.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01056, ecapa_loss=0.0001447, whisper_loss=0.08934, over 3866097.82 frames. ], batch size: 90, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:09:59,361 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-18 09:10:40,555 INFO [train_multi_KD3.py:1149] (0/4) Epoch 26, validation on ASR_libri: loss=0.2531, beats_loss=0, ecapa_loss=0.0005276, whisper_loss=0.2478, over 922467.00 frames. 2024-08-18 09:10:46,181 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.7990, 1.3565, 2.0033, 1.1108, 1.4127, 2.1644, 2.2679, 1.3275], device='cuda:0') 2024-08-18 09:10:56,606 INFO [train_multi_KD3.py:1149] (0/4) Epoch 26, validation on SV_voxceleb1: loss=0.004116, beats_loss=0, ecapa_loss=0.0004116, whisper_loss=0, over 939242.00 frames. 2024-08-18 09:12:49,818 INFO [train_multi_KD3.py:1149] (0/4) Epoch 26, validation on AT_audioset: loss=0.02315, beats_loss=0.02315, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-18 09:12:49,822 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-18 09:12:56,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3811230.0, ans=0.0 2024-08-18 09:13:13,026 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 13 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-18 09:13:15,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3811430.0, ans=0.125 2024-08-18 09:13:18,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3811430.0, ans=0.2 2024-08-18 09:13:22,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3811430.0, ans=0.0 2024-08-18 09:13:23,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3811430.0, ans=0.0 2024-08-18 09:13:27,688 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-18 09:13:56,144 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 9050, loss[loss=0.1033, beats_loss=0.01035, ecapa_loss=0.0001458, whisper_loss=0.09148, over 22733.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.000146, whisper_loss=0.08956, over 3849643.74 frames. 
], batch size: 92, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:14:09,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3811830.0, ans=0.125 2024-08-18 09:14:10,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3811830.0, ans=0.1 2024-08-18 09:14:37,994 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.711e+01 2.306e+01 2.516e+01 2.800e+01 4.042e+01, threshold=5.033e+01, percent-clipped=0.0 2024-08-18 09:14:51,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3812130.0, ans=0.1 2024-08-18 09:15:02,071 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 9100, loss[loss=0.101, beats_loss=0.0117, ecapa_loss=0.0001464, whisper_loss=0.08784, over 21988.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01045, ecapa_loss=0.0001463, whisper_loss=0.08991, over 3863461.97 frames. ], batch size: 92, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:15:03,178 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.98 vs. 
limit=15.0 2024-08-18 09:15:15,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3812330.0, ans=0.09899494936611666 2024-08-18 09:15:30,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3812430.0, ans=0.1 2024-08-18 09:15:30,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3812430.0, ans=0.0 2024-08-18 09:15:35,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3812430.0, ans=0.1 2024-08-18 09:15:47,742 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 09:15:48,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3812530.0, ans=0.2 2024-08-18 09:15:50,784 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.66 vs. limit=15.0 2024-08-18 09:15:55,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3812630.0, ans=0.1 2024-08-18 09:15:57,538 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-18 09:16:06,554 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 9150, loss[loss=0.09226, beats_loss=0.01082, ecapa_loss=0.0001978, whisper_loss=0.07946, over 20176.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.0001469, whisper_loss=0.09037, over 3899321.47 frames. 
], batch size: 88, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:16:10,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3812730.0, ans=0.0 2024-08-18 09:16:13,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3812730.0, ans=0.1 2024-08-18 09:16:47,031 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.602e+01 2.322e+01 2.560e+01 2.828e+01 4.789e+01, threshold=5.121e+01, percent-clipped=0.0 2024-08-18 09:16:48,836 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.502e-01 2024-08-18 09:16:53,038 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.86 vs. limit=10.0 2024-08-18 09:17:03,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3813130.0, ans=0.125 2024-08-18 09:17:09,809 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 09:17:10,771 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 9200, loss[loss=0.1088, beats_loss=0.01058, ecapa_loss=0.0001505, whisper_loss=0.09673, over 22487.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01046, ecapa_loss=0.0001458, whisper_loss=0.09053, over 3879401.58 frames. ], batch size: 92, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:17:14,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3813230.0, ans=0.125 2024-08-18 09:17:16,450 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
23 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-18 09:17:19,718 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.87 vs. limit=22.5 2024-08-18 09:17:20,371 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-18 09:17:28,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3813330.0, ans=0.125 2024-08-18 09:17:37,296 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-18 09:17:50,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3813530.0, ans=0.125 2024-08-18 09:17:58,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3813530.0, ans=0.035 2024-08-18 09:18:13,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3813730.0, ans=0.1 2024-08-18 09:18:14,898 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 9250, loss[loss=0.1077, beats_loss=0.00986, ecapa_loss=0.0001416, whisper_loss=0.09642, over 23057.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01048, ecapa_loss=0.0001476, whisper_loss=0.09015, over 3858461.05 frames. ], batch size: 90, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:18:22,470 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
20 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 09:18:23,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3813730.0, ans=0.0 2024-08-18 09:18:36,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3813830.0, ans=0.125 2024-08-18 09:18:55,774 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.301e+01 2.520e+01 2.839e+01 9.703e+01, threshold=5.041e+01, percent-clipped=1.0 2024-08-18 09:19:11,482 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0 2024-08-18 09:19:13,496 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 09:19:16,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3814130.0, ans=0.125 2024-08-18 09:19:18,372 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 9300, loss[loss=0.1058, beats_loss=0.01013, ecapa_loss=0.0001304, whisper_loss=0.09436, over 19439.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.0001473, whisper_loss=0.09002, over 3855720.07 frames. ], batch size: 76, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:19:31,053 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 09:19:32,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3814330.0, ans=0.125 2024-08-18 09:19:42,039 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 09:19:47,022 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
17 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-18 09:19:48,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3814430.0, ans=0.125 2024-08-18 09:19:53,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3814430.0, ans=0.125 2024-08-18 09:19:53,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3814430.0, ans=10.0 2024-08-18 09:20:12,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3814630.0, ans=0.0 2024-08-18 09:20:16,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3814630.0, ans=0.125 2024-08-18 09:20:19,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3814730.0, ans=0.0 2024-08-18 09:20:20,882 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 9350, loss[loss=0.08974, beats_loss=0.01117, ecapa_loss=0.0001538, whisper_loss=0.07704, over 16374.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01044, ecapa_loss=0.000147, whisper_loss=0.09083, over 3866332.66 frames. ], batch size: 69, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:20:23,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3814730.0, ans=0.125 2024-08-18 09:20:25,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3814730.0, ans=0.125 2024-08-18 09:20:31,589 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2024-08-18 09:20:33,631 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
18 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-18 09:20:35,504 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.81 vs. limit=15.0 2024-08-18 09:20:41,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3814830.0, ans=0.2 2024-08-18 09:20:41,631 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.97 vs. limit=15.0 2024-08-18 09:20:44,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3814930.0, ans=0.0 2024-08-18 09:21:00,570 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.210e+01 2.465e+01 2.743e+01 3.638e+02, threshold=4.930e+01, percent-clipped=1.0 2024-08-18 09:21:19,847 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 24 from LS+wenet, 11 from Vox, 21 fro AS 2024-08-18 09:21:23,439 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 9400, loss[loss=0.114, beats_loss=0.008507, ecapa_loss=0.0001653, whisper_loss=0.1038, over 22242.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01049, ecapa_loss=0.0001463, whisper_loss=0.09082, over 3878535.18 frames. ], batch size: 90, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:21:25,826 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
28 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-18 09:21:29,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3815230.0, ans=0.0 2024-08-18 09:21:33,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3815230.0, ans=0.1 2024-08-18 09:21:47,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3815430.0, ans=0.2 2024-08-18 09:21:48,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3815430.0, ans=0.125 2024-08-18 09:21:51,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3815430.0, ans=0.025 2024-08-18 09:21:56,018 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-18 09:21:58,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3815430.0, ans=0.0 2024-08-18 09:21:58,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3815430.0, ans=0.125 2024-08-18 09:22:01,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3815530.0, ans=0.05 2024-08-18 09:22:01,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3815530.0, ans=0.0 2024-08-18 09:22:02,617 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.96 vs. limit=12.0 2024-08-18 09:22:15,370 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
17 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 09:22:16,563 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 14 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 09:22:20,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3815630.0, ans=0.125 2024-08-18 09:22:26,091 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 9450, loss[loss=0.07422, beats_loss=0.01048, ecapa_loss=0.0001396, whisper_loss=0.06235, over 13826.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01047, ecapa_loss=0.0001473, whisper_loss=0.0898, over 3846709.20 frames. ], batch size: 56, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:22:37,235 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 19 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-18 09:22:51,444 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.17 vs. limit=12.0 2024-08-18 09:22:53,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3815930.0, ans=0.0 2024-08-18 09:23:05,761 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.291e+01 2.521e+01 2.767e+01 4.094e+02, threshold=5.042e+01, percent-clipped=1.0 2024-08-18 09:23:05,929 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 15 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-18 09:23:08,310 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-18 09:23:08,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3816030.0, ans=0.07 2024-08-18 09:23:27,403 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 9500, loss[loss=0.1081, beats_loss=0.009925, ecapa_loss=0.0001387, whisper_loss=0.09678, over 22088.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01054, ecapa_loss=0.0001469, whisper_loss=0.08961, over 3871138.11 frames. ], batch size: 89, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:23:28,797 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 39 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 09:23:33,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3816230.0, ans=0.0 2024-08-18 09:23:38,266 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 09:23:58,515 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2024-08-18 09:24:10,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.38 vs. limit=15.0 2024-08-18 09:24:21,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3816630.0, ans=0.125 2024-08-18 09:24:28,539 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 9550, loss[loss=0.09742, beats_loss=0.01099, ecapa_loss=0.0001427, whisper_loss=0.085, over 16342.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01052, ecapa_loss=0.0001469, whisper_loss=0.08983, over 3858723.63 frames. 
], batch size: 66, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:24:31,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3816730.0, ans=0.0 2024-08-18 09:24:32,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3816730.0, ans=0.125 2024-08-18 09:24:32,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3816730.0, ans=0.1 2024-08-18 09:25:08,130 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.361e+01 2.629e+01 2.923e+01 5.081e+01, threshold=5.257e+01, percent-clipped=1.0 2024-08-18 09:25:23,034 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-18 09:25:26,671 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-18 09:25:30,064 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 9600, loss[loss=0.07456, beats_loss=0.01355, ecapa_loss=0.0001119, whisper_loss=0.05988, over 17499.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0105, ecapa_loss=0.0001463, whisper_loss=0.08966, over 3823656.98 frames. ], batch size: 68, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:25:45,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3817330.0, ans=0.1 2024-08-18 09:25:50,789 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0 2024-08-18 09:25:56,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3817430.0, ans=0.1 2024-08-18 09:26:12,813 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
21 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 09:26:20,362 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-18 09:26:31,393 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 09:26:32,470 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 9650, loss[loss=0.1116, beats_loss=0.01011, ecapa_loss=0.0001324, whisper_loss=0.1002, over 15718.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01042, ecapa_loss=0.0001448, whisper_loss=0.09024, over 3814491.08 frames. ], batch size: 60, lr: 2.35e-03, grad_scale: 1.152921504606847e+18 2024-08-18 09:26:42,731 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-18 09:27:03,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3817930.0, ans=0.125 2024-08-18 09:27:11,846 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.349e+01 2.605e+01 2.994e+01 4.918e+01, threshold=5.210e+01, percent-clipped=0.0 2024-08-18 09:27:18,364 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 09:27:28,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3818130.0, ans=0.0 2024-08-18 09:27:30,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3818130.0, ans=0.1 2024-08-18 09:27:33,996 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 9700, loss[loss=0.07149, beats_loss=0.01047, ecapa_loss=0.0001423, whisper_loss=0.0596, over 17587.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01047, ecapa_loss=0.0001453, whisper_loss=0.08936, over 3807772.72 frames. 
], batch size: 74, lr: 2.35e-03, grad_scale: 1.152921504606847e+18 2024-08-18 09:27:39,621 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.39 vs. limit=15.0 2024-08-18 09:27:41,576 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 19 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-18 09:27:43,987 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-18 09:27:44,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3818230.0, ans=0.0 2024-08-18 09:28:01,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3818430.0, ans=0.0 2024-08-18 09:28:05,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3818430.0, ans=0.125 2024-08-18 09:28:12,255 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.03 vs. limit=15.0 2024-08-18 09:28:27,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3818630.0, ans=0.125 2024-08-18 09:28:27,842 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=19.12 vs. 
limit=15.0 2024-08-18 09:28:28,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3818630.0, ans=0.0 2024-08-18 09:28:28,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3818630.0, ans=0.0 2024-08-18 09:28:36,215 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 9750, loss[loss=0.09266, beats_loss=0.01011, ecapa_loss=0.000153, whisper_loss=0.08102, over 16961.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01053, ecapa_loss=0.0001446, whisper_loss=0.08877, over 3812018.36 frames. ], batch size: 69, lr: 2.35e-03, grad_scale: 1.152921504606847e+18 2024-08-18 09:28:39,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3818730.0, ans=0.125 2024-08-18 09:28:42,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3818730.0, ans=0.0 2024-08-18 09:28:44,176 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.67 vs. limit=15.0 2024-08-18 09:29:12,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3819030.0, ans=0.0 2024-08-18 09:29:15,744 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.255e+01 2.464e+01 2.832e+01 2.481e+02, threshold=4.927e+01, percent-clipped=2.0 2024-08-18 09:29:16,750 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. 
limit=15.0 2024-08-18 09:29:23,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3819030.0, ans=0.1 2024-08-18 09:29:25,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3819130.0, ans=0.025 2024-08-18 09:29:26,600 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 17 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-18 09:29:31,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3819130.0, ans=0.125 2024-08-18 09:29:31,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3819130.0, ans=0.0 2024-08-18 09:29:31,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3819130.0, ans=0.0 2024-08-18 09:29:37,751 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 9800, loss[loss=0.09611, beats_loss=0.01243, ecapa_loss=0.0001286, whisper_loss=0.0824, over 21924.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01052, ecapa_loss=0.0001453, whisper_loss=0.0893, over 3818523.09 frames. 
], batch size: 89, lr: 2.35e-03, grad_scale: 1.152921504606847e+18 2024-08-18 09:29:48,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3819230.0, ans=0.0 2024-08-18 09:30:00,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3819330.0, ans=0.1 2024-08-18 09:30:08,892 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.079e+01 2024-08-18 09:30:10,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3819430.0, ans=0.0 2024-08-18 09:30:26,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3819630.0, ans=0.125 2024-08-18 09:30:30,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3819630.0, ans=0.125 2024-08-18 09:30:38,656 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 9850, loss[loss=0.09738, beats_loss=0.009306, ecapa_loss=0.0001729, whisper_loss=0.08634, over 14364.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01046, ecapa_loss=0.0001462, whisper_loss=0.08986, over 3819815.39 frames. ], batch size: 57, lr: 2.35e-03, grad_scale: 1.152921504606847e+18 2024-08-18 09:30:51,929 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 23 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-18 09:31:06,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3819930.0, ans=0.0 2024-08-18 09:31:12,911 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
19 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-18 09:31:13,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3819930.0, ans=0.035 2024-08-18 09:31:18,665 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.410e+01 2.708e+01 2.991e+01 3.936e+01, threshold=5.416e+01, percent-clipped=0.0 2024-08-18 09:31:26,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3820130.0, ans=0.125 2024-08-18 09:31:27,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3820130.0, ans=0.2 2024-08-18 09:31:30,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3820130.0, ans=0.125 2024-08-18 09:31:37,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3820130.0, ans=0.1 2024-08-18 09:31:38,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3820230.0, ans=0.0 2024-08-18 09:31:39,608 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 9900, loss[loss=0.09916, beats_loss=0.01339, ecapa_loss=9.644e-05, whisper_loss=0.08481, over 23052.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.0001452, whisper_loss=0.09015, over 3803322.00 frames. ], batch size: 88, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:31:43,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3820230.0, ans=0.0 2024-08-18 09:31:46,965 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 09:31:53,971 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
21 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 09:31:55,366 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-18 09:31:56,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3820330.0, ans=0.125 2024-08-18 09:32:02,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3820330.0, ans=0.0 2024-08-18 09:32:10,615 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=16.61 vs. limit=15.0 2024-08-18 09:32:11,140 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 13 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-18 09:32:24,885 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 9 from Vox, 29 fro AS 2024-08-18 09:32:36,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3820630.0, ans=0.125 2024-08-18 09:32:40,142 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 30 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-18 09:32:40,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3820630.0, ans=0.07 2024-08-18 09:32:42,364 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 9950, loss[loss=0.1141, beats_loss=0.01086, ecapa_loss=0.0001622, whisper_loss=0.1016, over 21506.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01057, ecapa_loss=0.0001448, whisper_loss=0.09008, over 3813574.50 frames. ], batch size: 87, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:32:43,572 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
23 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 09:32:47,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3820730.0, ans=0.1 2024-08-18 09:33:00,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3820830.0, ans=0.2 2024-08-18 09:33:09,624 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.93 vs. limit=15.0 2024-08-18 09:33:16,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3820930.0, ans=0.1 2024-08-18 09:33:22,698 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.257e+01 2.517e+01 2.867e+01 4.376e+01, threshold=5.034e+01, percent-clipped=0.0 2024-08-18 09:33:29,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3821030.0, ans=0.1 2024-08-18 09:33:31,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3821130.0, ans=0.125 2024-08-18 09:33:32,975 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-18 09:33:43,922 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 10000, loss[loss=0.1161, beats_loss=0.008396, ecapa_loss=0.0001431, whisper_loss=0.1063, over 22372.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01059, ecapa_loss=0.0001443, whisper_loss=0.09011, over 3843222.74 frames. 
], batch size: 90, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:33:44,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3821230.0, ans=0.0 2024-08-18 09:34:20,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3821530.0, ans=0.0 2024-08-18 09:34:22,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3821530.0, ans=0.09899494936611666 2024-08-18 09:34:31,162 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 18 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-18 09:34:32,322 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 29 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-18 09:34:33,516 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-18 09:34:38,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3821630.0, ans=0.0 2024-08-18 09:34:45,185 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 10050, loss[loss=0.08091, beats_loss=0.01142, ecapa_loss=0.0001514, whisper_loss=0.06798, over 18203.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001448, whisper_loss=0.0901, over 3849991.11 frames. ], batch size: 77, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:34:55,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.34 vs. 
limit=15.0 2024-08-18 09:35:12,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3821930.0, ans=0.0 2024-08-18 09:35:25,376 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.531e+01 2.231e+01 2.440e+01 2.652e+01 3.423e+01, threshold=4.880e+01, percent-clipped=0.0 2024-08-18 09:35:29,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3822030.0, ans=0.125 2024-08-18 09:35:43,258 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.64 vs. limit=15.0 2024-08-18 09:35:45,977 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 10100, loss[loss=0.1029, beats_loss=0.01354, ecapa_loss=0.0001354, whisper_loss=0.08799, over 22279.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01063, ecapa_loss=0.0001449, whisper_loss=0.08951, over 3877579.71 frames. ], batch size: 92, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:35:55,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3822230.0, ans=0.0 2024-08-18 09:36:06,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3822330.0, ans=0.1 2024-08-18 09:36:08,485 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-18 09:36:16,630 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.15 vs. 
limit=10.0 2024-08-18 09:36:23,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3822530.0, ans=0.125 2024-08-18 09:36:23,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3822530.0, ans=0.125 2024-08-18 09:36:24,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3822530.0, ans=0.125 2024-08-18 09:36:33,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3822530.0, ans=0.0 2024-08-18 09:36:38,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3822630.0, ans=0.125 2024-08-18 09:36:41,630 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 15 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-18 09:36:42,251 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.57 vs. limit=12.0 2024-08-18 09:36:45,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3822630.0, ans=0.125 2024-08-18 09:36:46,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3822730.0, ans=0.125 2024-08-18 09:36:47,543 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 10150, loss[loss=0.08437, beats_loss=0.01426, ecapa_loss=0.000136, whisper_loss=0.06875, over 21171.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01063, ecapa_loss=0.0001457, whisper_loss=0.08916, over 3879267.21 frames. 
], batch size: 90, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:36:59,424 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.01 vs. limit=15.0 2024-08-18 09:37:17,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3822930.0, ans=0.2 2024-08-18 09:37:28,037 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.258e+01 2.544e+01 2.982e+01 1.019e+02, threshold=5.088e+01, percent-clipped=1.0 2024-08-18 09:37:31,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3823030.0, ans=0.125 2024-08-18 09:37:35,694 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-18 09:37:35,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3823130.0, ans=0.0 2024-08-18 09:37:48,908 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 10200, loss[loss=0.1226, beats_loss=0.00961, ecapa_loss=0.0001581, whisper_loss=0.1114, over 22711.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01061, ecapa_loss=0.0001459, whisper_loss=0.08918, over 3895960.24 frames. 
], batch size: 91, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:37:58,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3823230.0, ans=0.125 2024-08-18 09:38:17,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3823430.0, ans=0.125 2024-08-18 09:38:21,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3823430.0, ans=0.0 2024-08-18 09:38:39,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3823630.0, ans=0.2 2024-08-18 09:38:51,510 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 10250, loss[loss=0.1082, beats_loss=0.01115, ecapa_loss=0.0001216, whisper_loss=0.09588, over 22675.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01059, ecapa_loss=0.0001463, whisper_loss=0.08922, over 3885713.55 frames. ], batch size: 88, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:39:05,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3823830.0, ans=0.0 2024-08-18 09:39:14,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3823830.0, ans=0.0 2024-08-18 09:39:29,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3824030.0, ans=0.1 2024-08-18 09:39:34,724 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.298e+01 2.473e+01 2.719e+01 4.293e+01, threshold=4.946e+01, percent-clipped=0.0 2024-08-18 09:39:56,902 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 10300, loss[loss=0.09878, beats_loss=0.01254, ecapa_loss=0.0001106, whisper_loss=0.08514, over 23515.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.0106, ecapa_loss=0.0001457, whisper_loss=0.08933, over 3896326.61 frames. ], batch size: 93, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:39:58,230 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 27 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-18 09:40:10,734 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-18 09:40:23,948 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 09:40:29,637 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=12.0 2024-08-18 09:40:54,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3824630.0, ans=0.2 2024-08-18 09:41:01,205 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 10350, loss[loss=0.08076, beats_loss=0.01292, ecapa_loss=0.0001137, whisper_loss=0.06671, over 24230.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001461, whisper_loss=0.09014, over 3939256.51 frames. 
], batch size: 96, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:41:01,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3824730.0, ans=0.125 2024-08-18 09:41:20,610 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.518e-03 2024-08-18 09:41:27,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3824930.0, ans=0.125 2024-08-18 09:41:35,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3824930.0, ans=0.125 2024-08-18 09:41:38,338 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=15.0 2024-08-18 09:41:40,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3825030.0, ans=0.125 2024-08-18 09:41:41,553 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 09:41:41,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3825030.0, ans=0.125 2024-08-18 09:41:42,573 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.620e+01 2.342e+01 2.610e+01 2.810e+01 3.800e+01, threshold=5.220e+01, percent-clipped=0.0 2024-08-18 09:41:43,883 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-18 09:41:50,406 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
32 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-18 09:41:55,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3825130.0, ans=0.125 2024-08-18 09:42:04,100 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 10400, loss[loss=0.0826, beats_loss=0.01147, ecapa_loss=0.0001285, whisper_loss=0.06985, over 17179.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01062, ecapa_loss=0.0001435, whisper_loss=0.09044, over 3939529.64 frames. ], batch size: 66, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:42:09,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3825230.0, ans=0.0 2024-08-18 09:42:14,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3825230.0, ans=0.125 2024-08-18 09:42:17,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3825330.0, ans=0.125 2024-08-18 09:42:19,530 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.27 vs. limit=10.0 2024-08-18 09:42:37,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3825430.0, ans=0.125 2024-08-18 09:42:38,256 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-18 09:42:52,180 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 14 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-18 09:42:59,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.82 vs. 
limit=22.5 2024-08-18 09:42:59,468 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.58 vs. limit=15.0 2024-08-18 09:43:07,502 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 10450, loss[loss=0.1103, beats_loss=0.01289, ecapa_loss=0.0001277, whisper_loss=0.09614, over 22958.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01061, ecapa_loss=0.0001446, whisper_loss=0.08995, over 3949339.04 frames. ], batch size: 92, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:43:09,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3825730.0, ans=0.1 2024-08-18 09:43:10,070 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 09:43:49,660 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.276e+01 2.479e+01 2.678e+01 3.882e+01, threshold=4.957e+01, percent-clipped=0.0 2024-08-18 09:44:01,907 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0 2024-08-18 09:44:11,442 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 10500, loss[loss=0.09493, beats_loss=0.01076, ecapa_loss=0.0001531, whisper_loss=0.08264, over 15737.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01053, ecapa_loss=0.0001448, whisper_loss=0.08997, over 3918426.06 frames. ], batch size: 62, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:44:11,810 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 20 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-18 09:44:14,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3826230.0, ans=0.0 2024-08-18 09:44:39,469 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
23 from LS+wenet, 18 from Vox, 36 from AS
2024-08-18 09:44:45,942 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 from AS
2024-08-18 09:44:47,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3826430.0, ans=0.125
2024-08-18 09:45:05,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=3826630.0, ans=0.02
2024-08-18 09:45:09,287 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 from AS
2024-08-18 09:45:20,459 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 10550, loss[loss=0.09108, beats_loss=0.009644, ecapa_loss=0.0001494, whisper_loss=0.07994, over 20372.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01058, ecapa_loss=0.0001445, whisper_loss=0.0898, over 3895444.04 frames. ], batch size: 83, lr: 2.35e-03, grad_scale: 5.764607523034235e+17
2024-08-18 09:45:20,616 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 from AS
2024-08-18 09:45:22,893 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 17 from Vox, 33 from AS
2024-08-18 09:45:27,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3826730.0, ans=0.0
2024-08-18 09:45:28,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3826730.0, ans=0.125
2024-08-18 09:45:47,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3826930.0, ans=0.125
2024-08-18 09:45:54,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3826930.0, ans=0.125
2024-08-18 09:46:03,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3827030.0, ans=0.2
2024-08-18 09:46:07,955 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.737e+01 2.328e+01 2.590e+01 2.836e+01 3.998e+01, threshold=5.181e+01, percent-clipped=0.0
2024-08-18 09:46:12,741 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.26 vs. limit=15.0
2024-08-18 09:46:13,828 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 17 from Vox, 37 from AS
2024-08-18 09:46:23,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=3827130.0, ans=10.0
2024-08-18 09:46:27,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3827130.0, ans=15.0
2024-08-18 09:46:30,838 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 10600, loss[loss=0.07659, beats_loss=0.01005, ecapa_loss=0.0001714, whisper_loss=0.06482, over 21653.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01058, ecapa_loss=0.0001444, whisper_loss=0.08983, over 3921833.18 frames. ], batch size: 94, lr: 2.35e-03, grad_scale: 5.764607523034235e+17
2024-08-18 09:46:34,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3827230.0, ans=0.0
2024-08-18 09:46:37,911 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 24 from LS+wenet, 14 from Vox, 17 from AS
2024-08-18 09:46:39,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3827230.0, ans=0.125
2024-08-18 09:46:42,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3827230.0, ans=0.125
2024-08-18 09:46:47,644 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 17 from Vox, 35 from AS
2024-08-18 09:46:48,347 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.24 vs. limit=6.0
2024-08-18 09:46:55,546 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 26 from Vox, 25 from AS
2024-08-18 09:47:04,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3827430.0, ans=15.0
2024-08-18 09:47:10,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3827430.0, ans=0.1
2024-08-18 09:47:11,019 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 20 from Vox, 36 from AS
2024-08-18 09:47:27,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3827630.0, ans=0.1
2024-08-18 09:47:29,343 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.12 vs. limit=10.0
2024-08-18 09:47:36,405 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 26 from Vox, 34 from AS
2024-08-18 09:47:40,356 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 10650, loss[loss=0.09934, beats_loss=0.01124, ecapa_loss=0.0001284, whisper_loss=0.08682, over 19089.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01049, ecapa_loss=0.0001438, whisper_loss=0.09043, over 3895888.85 frames. ], batch size: 74, lr: 2.35e-03, grad_scale: 5.764607523034235e+17
2024-08-18 09:47:53,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3827830.0, ans=0.0
2024-08-18 09:47:53,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3827830.0, ans=0.125
2024-08-18 09:48:07,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3827930.0, ans=0.125
2024-08-18 09:48:13,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3827930.0, ans=0.0
2024-08-18 09:48:25,980 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.286e+01 2.479e+01 2.836e+01 4.249e+01, threshold=4.958e+01, percent-clipped=0.0
2024-08-18 09:48:40,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.96 vs. limit=22.5
2024-08-18 09:48:48,987 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 10700, loss[loss=0.09361, beats_loss=0.01054, ecapa_loss=0.0001632, whisper_loss=0.08144, over 14339.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.0001428, whisper_loss=0.09027, over 3902192.23 frames. ], batch size: 57, lr: 2.35e-03, grad_scale: 5.764607523034235e+17
2024-08-18 09:48:50,372 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 from AS
2024-08-18 09:48:57,406 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 21 from LS+wenet, 21 from Vox, 42 from AS
2024-08-18 09:48:59,848 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 from AS
2024-08-18 09:49:18,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3828430.0, ans=0.125
2024-08-18 09:49:24,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3828430.0, ans=0.0
2024-08-18 09:49:31,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3828530.0, ans=0.0
2024-08-18 09:49:33,195 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0
2024-08-18 09:49:44,543 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 28 from Vox, 38 from AS
2024-08-18 09:49:58,724 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 10750, loss[loss=0.1065, beats_loss=0.009744, ecapa_loss=0.0001716, whisper_loss=0.09507, over 22707.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001429, whisper_loss=0.09055, over 3931192.82 frames. ], batch size: 88, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 09:50:00,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3828730.0, ans=0.125
2024-08-18 09:50:04,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3828730.0, ans=0.125
2024-08-18 09:50:07,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3828730.0, ans=0.125
2024-08-18 09:50:11,909 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 from AS
2024-08-18 09:50:15,004 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.64 vs. limit=15.0
2024-08-18 09:50:27,370 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 from AS
2024-08-18 09:50:43,103 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.343e+01 2.580e+01 2.880e+01 1.020e+02, threshold=5.160e+01, percent-clipped=2.0
2024-08-18 09:50:49,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3829030.0, ans=0.125
2024-08-18 09:50:51,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3829130.0, ans=0.2
2024-08-18 09:51:05,062 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 10800, loss[loss=0.1053, beats_loss=0.009732, ecapa_loss=0.0001554, whisper_loss=0.09398, over 21056.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01058, ecapa_loss=0.0001448, whisper_loss=0.09047, over 3912641.04 frames. ], batch size: 84, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 09:51:46,401 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 16 from LS+wenet, 25 from Vox, 26 from AS
2024-08-18 09:51:54,785 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 28 from Vox, 38 from AS
2024-08-18 09:52:08,566 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 10850, loss[loss=0.1043, beats_loss=0.01167, ecapa_loss=0.0001307, whisper_loss=0.09134, over 21485.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0105, ecapa_loss=0.0001453, whisper_loss=0.09102, over 3911880.37 frames. ], batch size: 86, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 09:52:14,888 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-18 09:52:39,714 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0
2024-08-18 09:52:46,154 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.99 vs. limit=12.0
2024-08-18 09:52:49,080 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.278e+01 2.571e+01 2.919e+01 2.090e+02, threshold=5.141e+01, percent-clipped=1.0
2024-08-18 09:53:05,355 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 29 from Vox, 37 from AS
2024-08-18 09:53:10,067 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 10900, loss[loss=0.1033, beats_loss=0.009967, ecapa_loss=0.0001244, whisper_loss=0.09213, over 22932.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01051, ecapa_loss=0.0001457, whisper_loss=0.09114, over 3924252.48 frames. ], batch size: 88, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 09:53:26,834 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0
2024-08-18 09:53:35,618 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.20 vs. limit=15.0
2024-08-18 09:53:36,458 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 20 from Vox, 35 from AS
2024-08-18 09:53:58,309 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0
2024-08-18 09:54:12,680 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 10950, loss[loss=0.1232, beats_loss=0.007417, ecapa_loss=0.0001743, whisper_loss=0.114, over 20518.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0105, ecapa_loss=0.0001463, whisper_loss=0.09135, over 3910969.88 frames. ], batch size: 86, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 09:54:15,667 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 from AS
2024-08-18 09:54:20,494 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 18 from LS+wenet, 27 from Vox, 34 from AS
2024-08-18 09:54:26,831 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.69 vs. limit=15.0
2024-08-18 09:54:36,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3830930.0, ans=0.125
2024-08-18 09:54:53,572 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.701e+01 2.311e+01 2.560e+01 2.813e+01 5.362e+01, threshold=5.120e+01, percent-clipped=1.0
2024-08-18 09:54:58,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3831030.0, ans=0.0
2024-08-18 09:54:59,656 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 20 from LS+wenet, 14 from Vox, 20 from AS
2024-08-18 09:55:01,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3831130.0, ans=0.05
2024-08-18 09:55:14,495 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 11000, loss[loss=0.09933, beats_loss=0.008304, ecapa_loss=0.00016, whisper_loss=0.08943, over 16752.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01046, ecapa_loss=0.0001462, whisper_loss=0.09151, over 3908310.47 frames. ], batch size: 67, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 09:55:22,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3831230.0, ans=0.1
2024-08-18 09:55:37,181 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.72 vs. limit=22.5
2024-08-18 09:55:41,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3831430.0, ans=0.1
2024-08-18 09:55:54,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3831530.0, ans=0.0
2024-08-18 09:55:56,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3831530.0, ans=0.125
2024-08-18 09:55:57,104 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 18 from Vox, 38 from AS
2024-08-18 09:56:15,758 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 11050, loss[loss=0.0971, beats_loss=0.01257, ecapa_loss=0.0001167, whisper_loss=0.08337, over 22473.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01051, ecapa_loss=0.0001463, whisper_loss=0.0906, over 3903376.14 frames. ], batch size: 90, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 09:56:30,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3831830.0, ans=0.125
2024-08-18 09:56:45,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3831930.0, ans=0.125
2024-08-18 09:56:48,766 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 26 from Vox, 37 from AS
2024-08-18 09:56:53,505 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 from AS
2024-08-18 09:56:56,742 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.378e+01 2.551e+01 2.783e+01 4.873e+01, threshold=5.103e+01, percent-clipped=0.0
2024-08-18 09:57:00,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3832030.0, ans=0.0
2024-08-18 09:57:07,422 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.639e-02
2024-08-18 09:57:09,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3832130.0, ans=0.125
2024-08-18 09:57:17,710 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 11100, loss[loss=0.09808, beats_loss=0.01202, ecapa_loss=0.0001382, whisper_loss=0.08467, over 21604.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0105, ecapa_loss=0.0001463, whisper_loss=0.09021, over 3896925.35 frames. ], batch size: 88, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 09:57:19,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3832230.0, ans=0.1
2024-08-18 09:57:24,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3832230.0, ans=0.09899494936611666
2024-08-18 09:57:27,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3832230.0, ans=0.2
2024-08-18 09:57:37,727 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 38 from LS+wenet, 16 from Vox, 41 from AS
2024-08-18 09:57:38,879 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 13 from Vox, 33 from AS
2024-08-18 09:57:43,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3832430.0, ans=0.2
2024-08-18 09:57:44,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3832430.0, ans=0.0
2024-08-18 09:57:47,875 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 22 from LS+wenet, 16 from Vox, 45 from AS
2024-08-18 09:57:50,715 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0
2024-08-18 09:57:51,350 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 from AS
2024-08-18 09:57:55,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3832530.0, ans=0.1
2024-08-18 09:57:56,615 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.88 vs. limit=22.5
2024-08-18 09:58:01,324 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 from AS
2024-08-18 09:58:11,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3832630.0, ans=0.1
2024-08-18 09:58:19,931 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 11150, loss[loss=0.06351, beats_loss=0.01382, ecapa_loss=0.0001396, whisper_loss=0.0483, over 17562.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0105, ecapa_loss=0.0001453, whisper_loss=0.08977, over 3875195.47 frames. ], batch size: 73, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 09:58:21,399 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 from AS
2024-08-18 09:58:23,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3832730.0, ans=0.09899494936611666
2024-08-18 09:58:27,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3832730.0, ans=0.125
2024-08-18 09:58:43,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3832930.0, ans=0.0
2024-08-18 09:58:46,454 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.07 vs. limit=15.0
2024-08-18 09:58:48,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3832930.0, ans=0.125
2024-08-18 09:58:54,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3832930.0, ans=0.0
2024-08-18 09:59:00,324 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.313e+01 2.531e+01 2.915e+01 1.663e+02, threshold=5.062e+01, percent-clipped=1.0
2024-08-18 09:59:06,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3833030.0, ans=0.0
2024-08-18 09:59:17,827 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 from AS
2024-08-18 09:59:19,068 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 20 from Vox, 31 from AS
2024-08-18 09:59:21,583 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 11200, loss[loss=0.1047, beats_loss=0.01115, ecapa_loss=0.0001529, whisper_loss=0.09199, over 17661.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01043, ecapa_loss=0.0001459, whisper_loss=0.09014, over 3864075.77 frames. ], batch size: 71, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 09:59:45,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3833430.0, ans=0.125
2024-08-18 09:59:45,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3833430.0, ans=0.05
2024-08-18 09:59:47,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3833430.0, ans=0.125
2024-08-18 10:00:03,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3833530.0, ans=0.09899494936611666
2024-08-18 10:00:07,225 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 from AS
2024-08-18 10:00:14,131 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.29 vs. limit=15.0
2024-08-18 10:00:23,228 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 11250, loss[loss=0.1021, beats_loss=0.007891, ecapa_loss=0.0001444, whisper_loss=0.09279, over 14743.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.000145, whisper_loss=0.08978, over 3877317.72 frames. ], batch size: 56, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 10:00:23,392 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 31 from Vox, 35 from AS
2024-08-18 10:00:30,585 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 18 from Vox, 22 from AS
2024-08-18 10:00:33,302 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 from AS
2024-08-18 10:00:42,304 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=15.0
2024-08-18 10:00:45,433 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 12 from LS+wenet, 15 from Vox, 34 from AS
2024-08-18 10:00:52,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3833930.0, ans=0.125
2024-08-18 10:01:04,437 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.304e+01 2.562e+01 2.921e+01 1.559e+02, threshold=5.123e+01, percent-clipped=1.0
2024-08-18 10:01:15,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3834130.0, ans=0.0
2024-08-18 10:01:23,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3834130.0, ans=0.0
2024-08-18 10:01:25,734 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 11300, loss[loss=0.1054, beats_loss=0.01199, ecapa_loss=0.0001506, whisper_loss=0.09193, over 21598.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01036, ecapa_loss=0.0001455, whisper_loss=0.09033, over 3889611.56 frames. ], batch size: 86, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 10:01:32,189 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 9 from Vox, 36 from AS
2024-08-18 10:01:56,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3834430.0, ans=0.125
2024-08-18 10:02:00,938 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 25 from Vox, 36 from AS
2024-08-18 10:02:01,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3834430.0, ans=0.125
2024-08-18 10:02:07,155 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 30 from LS+wenet, 18 from Vox, 30 from AS
2024-08-18 10:02:15,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3834630.0, ans=0.1
2024-08-18 10:02:28,505 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 11350, loss[loss=0.1117, beats_loss=0.01038, ecapa_loss=0.0001638, whisper_loss=0.09969, over 22062.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0103, ecapa_loss=0.0001463, whisper_loss=0.09113, over 3904365.19 frames. ], batch size: 90, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 10:02:29,851 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 from AS
2024-08-18 10:02:37,033 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 from AS
2024-08-18 10:02:42,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3834830.0, ans=0.0
2024-08-18 10:02:47,397 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 from AS
2024-08-18 10:03:09,867 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.308e+01 2.551e+01 2.797e+01 2.772e+02, threshold=5.101e+01, percent-clipped=2.0
2024-08-18 10:03:20,838 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 24 from Vox, 39 from AS
2024-08-18 10:03:30,661 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 11400, loss[loss=0.1035, beats_loss=0.009406, ecapa_loss=0.0001639, whisper_loss=0.09247, over 23325.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01037, ecapa_loss=0.0001456, whisper_loss=0.09067, over 3869293.55 frames. ], batch size: 94, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 10:03:35,471 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 15 from Vox, 36 from AS
2024-08-18 10:03:36,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3835230.0, ans=0.2
2024-08-18 10:03:41,710 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 30 from Vox, 38 from AS
2024-08-18 10:03:51,044 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=12.0
2024-08-18 10:03:52,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3835330.0, ans=0.0
2024-08-18 10:03:58,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3835430.0, ans=0.125
2024-08-18 10:03:59,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3835430.0, ans=0.125
2024-08-18 10:04:17,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3835530.0, ans=0.125
2024-08-18 10:04:18,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3835630.0, ans=0.125
2024-08-18 10:04:28,528 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.29 vs. limit=15.0
2024-08-18 10:04:32,502 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 11450, loss[loss=0.09549, beats_loss=0.01233, ecapa_loss=0.0001556, whisper_loss=0.0816, over 20302.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0104, ecapa_loss=0.0001449, whisper_loss=0.09084, over 3866620.17 frames. ], batch size: 88, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 10:04:38,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3835730.0, ans=0.0
2024-08-18 10:04:53,485 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 31 from Vox, 27 from AS
2024-08-18 10:05:06,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3835930.0, ans=0.125
2024-08-18 10:05:13,161 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.297e+01 2.481e+01 2.767e+01 4.172e+01, threshold=4.962e+01, percent-clipped=0.0
2024-08-18 10:05:17,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3836030.0, ans=0.125
2024-08-18 10:05:20,242 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0
2024-08-18 10:05:31,746 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 13 from Vox, 30 from AS
2024-08-18 10:05:33,521 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.74 vs. limit=12.0
2024-08-18 10:05:33,990 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 11500, loss[loss=0.101, beats_loss=0.01081, ecapa_loss=0.0001502, whisper_loss=0.08867, over 19296.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01041, ecapa_loss=0.0001449, whisper_loss=0.09062, over 3847520.65 frames. ], batch size: 78, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 10:05:50,130 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.08 vs. limit=15.0
2024-08-18 10:05:53,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3836330.0, ans=0.1
2024-08-18 10:05:56,069 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 16 from Vox, 39 from AS
2024-08-18 10:05:59,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3836430.0, ans=0.2
2024-08-18 10:06:02,427 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 18 from Vox, 22 from AS
2024-08-18 10:06:21,434 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.92 vs. limit=12.0
2024-08-18 10:06:23,348 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 16 from Vox, 27 from AS
2024-08-18 10:06:39,907 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 11550, loss[loss=0.1034, beats_loss=0.008466, ecapa_loss=0.0001483, whisper_loss=0.09346, over 15557.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01034, ecapa_loss=0.0001459, whisper_loss=0.09107, over 3857828.83 frames. ], batch size: 59, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 10:06:52,402 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.70 vs. limit=12.0
2024-08-18 10:07:00,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3836830.0, ans=0.125
2024-08-18 10:07:26,132 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.646e+01 2.337e+01 2.538e+01 2.781e+01 3.732e+01, threshold=5.076e+01, percent-clipped=0.0
2024-08-18 10:07:40,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3837130.0, ans=0.95
2024-08-18 10:07:52,150 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 11600, loss[loss=0.104, beats_loss=0.0108, ecapa_loss=0.0001126, whisper_loss=0.09204, over 16670.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01028, ecapa_loss=0.0001465, whisper_loss=0.09167, over 3865673.59 frames. ], batch size: 62, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 10:08:08,959 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 15 from Vox, 25 from AS
2024-08-18 10:08:25,788 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 13 from LS+wenet, 19 from Vox, 22 from AS
2024-08-18 10:08:26,296 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.80 vs. limit=15.0
2024-08-18 10:08:48,013 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 17 from LS+wenet, 16 from Vox, 35 from AS
2024-08-18 10:09:06,943 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 11650, loss[loss=0.1119, beats_loss=0.009066, ecapa_loss=0.0001641, whisper_loss=0.1012, over 14401.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01035, ecapa_loss=0.0001457, whisper_loss=0.09144, over 3854646.88 frames. ], batch size: 57, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 10:09:08,570 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 15 from Vox, 32 from AS
2024-08-18 10:09:16,729 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.67 vs. limit=12.0
2024-08-18 10:09:34,726 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 14 from Vox, 28 from AS
2024-08-18 10:09:34,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3837830.0, ans=0.0
2024-08-18 10:09:36,317 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 15 from Vox, 26 from AS
2024-08-18 10:09:39,174 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 21 from Vox, 35 from AS
2024-08-18 10:09:40,733 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 13 from Vox, 30 from AS
2024-08-18 10:09:56,752 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.353e+01 2.626e+01 3.025e+01 7.544e+01, threshold=5.251e+01, percent-clipped=1.0
2024-08-18 10:09:57,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3838030.0, ans=0.125
2024-08-18 10:10:14,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3838130.0, ans=0.0
2024-08-18 10:10:15,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3838130.0, ans=0.07
2024-08-18 10:10:21,908 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 11700, loss[loss=0.1171, beats_loss=0.01051, ecapa_loss=0.0001516, whisper_loss=0.105, over 21717.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01039, ecapa_loss=0.0001457, whisper_loss=0.09159, over 3881975.07 frames. ], batch size: 84, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 10:10:35,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3838330.0, ans=0.0
2024-08-18 10:10:43,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3838330.0, ans=0.0
2024-08-18 10:11:01,937 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 35 from LS+wenet, 21 from Vox, 38 from AS
2024-08-18 10:11:02,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3838430.0, ans=0.0
2024-08-18 10:11:19,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3838630.0, ans=0.0
2024-08-18 10:11:23,645 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 26 from LS+wenet, 18 from Vox, 25 from AS
2024-08-18 10:11:28,333 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 16 from Vox, 30 from AS
2024-08-18 10:11:31,497 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 from AS
2024-08-18 10:11:32,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3838630.0, ans=0.125
2024-08-18 10:11:34,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3838730.0, ans=0.125
2024-08-18 10:11:35,208 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 11750, loss[loss=0.09268, beats_loss=0.01215, ecapa_loss=0.0001252, whisper_loss=0.07929, over 22088.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01048, ecapa_loss=0.0001441, whisper_loss=0.09135, over 3910687.51 frames. ], batch size: 89, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 10:11:35,936 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=15.0
2024-08-18 10:11:37,174 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 from AS
2024-08-18 10:11:38,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3838730.0, ans=0.125
2024-08-18 10:11:43,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3838730.0, ans=0.07
2024-08-18 10:11:58,265 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 24 from LS+wenet, 15 from Vox, 15 from AS
2024-08-18 10:12:09,822 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 from AS
2024-08-18 10:12:24,819 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.258e+01 2.471e+01 2.735e+01 3.598e+01, threshold=4.941e+01, percent-clipped=0.0
2024-08-18 10:12:34,092 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 37 from LS+wenet, 26 from Vox, 30 from AS
2024-08-18 10:12:42,354 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 15 from Vox, 28 from AS
2024-08-18 10:12:51,153 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 11800, loss[loss=0.1119, beats_loss=0.01013, ecapa_loss=0.0001418, whisper_loss=0.1003, over 21479.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01047, ecapa_loss=0.0001441, whisper_loss=0.09158, over 3922557.69 frames.
], batch size: 88, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:12:59,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3839230.0, ans=0.05 2024-08-18 10:13:00,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3839230.0, ans=0.125 2024-08-18 10:13:14,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3839330.0, ans=0.0 2024-08-18 10:13:17,307 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 10:13:33,185 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=15.0 2024-08-18 10:13:37,468 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 11 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-18 10:13:47,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3839630.0, ans=0.05 2024-08-18 10:13:48,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3839630.0, ans=0.1 2024-08-18 10:14:04,417 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 11850, loss[loss=0.08323, beats_loss=0.01335, ecapa_loss=0.0001124, whisper_loss=0.06876, over 17100.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01049, ecapa_loss=0.0001438, whisper_loss=0.09105, over 3905165.17 frames. ], batch size: 66, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:14:04,613 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
23 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-18 10:14:06,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3839730.0, ans=0.0 2024-08-18 10:14:21,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3839830.0, ans=0.95 2024-08-18 10:14:42,710 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-384000.pt 2024-08-18 10:14:52,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3840030.0, ans=0.125 2024-08-18 10:14:56,041 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.711e+01 2.290e+01 2.585e+01 2.979e+01 4.833e+01, threshold=5.171e+01, percent-clipped=0.0 2024-08-18 10:15:05,848 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-18 10:15:19,709 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 11900, loss[loss=0.1227, beats_loss=0.007482, ecapa_loss=0.0001781, whisper_loss=0.1135, over 18095.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01047, ecapa_loss=0.0001445, whisper_loss=0.09178, over 3898155.88 frames. 
], batch size: 73, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:15:22,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3840230.0, ans=0.2 2024-08-18 10:15:24,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3840230.0, ans=0.0 2024-08-18 10:15:28,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3840230.0, ans=0.025 2024-08-18 10:15:34,097 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-18 10:15:41,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3840330.0, ans=0.125 2024-08-18 10:15:49,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3840430.0, ans=0.0 2024-08-18 10:15:50,649 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-18 10:16:07,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3840530.0, ans=0.2 2024-08-18 10:16:13,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3840530.0, ans=0.2 2024-08-18 10:16:20,215 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 18 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-18 10:16:30,002 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 11950, loss[loss=0.1079, beats_loss=0.008379, ecapa_loss=0.0001426, whisper_loss=0.09811, over 14628.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01039, ecapa_loss=0.0001463, whisper_loss=0.09155, over 3899743.44 frames. 
], batch size: 57, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:16:33,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3840730.0, ans=0.04949747468305833 2024-08-18 10:16:47,227 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-18 10:16:53,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3840830.0, ans=0.125 2024-08-18 10:17:05,331 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2024-08-18 10:17:15,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3841030.0, ans=0.125 2024-08-18 10:17:20,861 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.256e+01 2.516e+01 2.795e+01 5.453e+01, threshold=5.033e+01, percent-clipped=1.0 2024-08-18 10:17:22,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3841030.0, ans=0.0 2024-08-18 10:17:28,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3841130.0, ans=0.0 2024-08-18 10:17:30,407 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.80 vs. limit=15.0 2024-08-18 10:17:45,302 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 12000, loss[loss=0.1067, beats_loss=0.007855, ecapa_loss=0.0001651, whisper_loss=0.09719, over 17892.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01039, ecapa_loss=0.0001466, whisper_loss=0.09102, over 3870085.31 frames. 
], batch size: 67, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:17:45,303 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-18 10:18:21,599 INFO [train_multi_KD3.py:1149] (0/4) Epoch 26, validation on ASR_libri: loss=0.2531, beats_loss=0, ecapa_loss=0.0005311, whisper_loss=0.2478, over 922467.00 frames. 2024-08-18 10:18:40,344 INFO [train_multi_KD3.py:1149] (0/4) Epoch 26, validation on SV_voxceleb1: loss=0.004077, beats_loss=0, ecapa_loss=0.0004077, whisper_loss=0, over 939242.00 frames. 2024-08-18 10:20:17,321 INFO [train_multi_KD3.py:1149] (0/4) Epoch 26, validation on AT_audioset: loss=0.02316, beats_loss=0.02316, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 10:20:17,325 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-18 10:20:18,800 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-18 10:21:30,109 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 12050, loss[loss=0.1026, beats_loss=0.01015, ecapa_loss=0.000116, whisper_loss=0.09128, over 15360.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01048, ecapa_loss=0.0001445, whisper_loss=0.0901, over 3837287.27 frames. ], batch size: 59, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:21:37,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3841730.0, ans=0.125 2024-08-18 10:21:56,335 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.86 vs. limit=12.0 2024-08-18 10:21:59,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3841830.0, ans=0.125 2024-08-18 10:21:59,933 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.37 vs. 
limit=6.0 2024-08-18 10:22:05,559 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 36 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 10:22:05,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3841930.0, ans=0.05 2024-08-18 10:22:21,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3842030.0, ans=0.125 2024-08-18 10:22:24,312 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.275e+01 2.552e+01 2.886e+01 4.482e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-18 10:22:30,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3842030.0, ans=0.125 2024-08-18 10:22:33,226 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=18.36 vs. limit=15.0 2024-08-18 10:22:38,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3842130.0, ans=0.0 2024-08-18 10:22:40,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3842130.0, ans=0.2 2024-08-18 10:22:44,544 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 10:22:48,989 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 12100, loss[loss=0.09291, beats_loss=0.01246, ecapa_loss=0.0001225, whisper_loss=0.07923, over 23327.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01046, ecapa_loss=0.0001449, whisper_loss=0.09056, over 3865761.56 frames. 
], batch size: 92, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:22:51,998 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.15 vs. limit=15.0 2024-08-18 10:23:10,587 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-18 10:23:17,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3842330.0, ans=0.125 2024-08-18 10:23:21,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3842430.0, ans=0.0 2024-08-18 10:23:22,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3842430.0, ans=0.1 2024-08-18 10:23:29,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3842430.0, ans=0.125 2024-08-18 10:23:45,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3842530.0, ans=0.0 2024-08-18 10:23:52,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3842630.0, ans=0.1 2024-08-18 10:23:57,369 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.86 vs. limit=15.0 2024-08-18 10:24:03,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3842730.0, ans=0.125 2024-08-18 10:24:04,678 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 12150, loss[loss=0.08583, beats_loss=0.01185, ecapa_loss=0.0001706, whisper_loss=0.07228, over 20580.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01055, ecapa_loss=0.0001452, whisper_loss=0.08968, over 3873121.30 frames. ], batch size: 90, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:24:15,527 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0 2024-08-18 10:24:21,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3842830.0, ans=0.04949747468305833 2024-08-18 10:24:21,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.35 vs. limit=15.0 2024-08-18 10:24:23,034 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.06 vs. limit=22.5 2024-08-18 10:24:31,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3842830.0, ans=0.04949747468305833 2024-08-18 10:24:36,988 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 10:24:37,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3842930.0, ans=0.2 2024-08-18 10:24:39,657 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 17 from Vox, 50 fro AS 2024-08-18 10:24:43,249 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.77 vs. 
limit=15.0 2024-08-18 10:24:54,706 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.341e+01 2.544e+01 2.867e+01 3.722e+01, threshold=5.088e+01, percent-clipped=0.0 2024-08-18 10:25:06,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3843130.0, ans=0.125 2024-08-18 10:25:10,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3843130.0, ans=0.125 2024-08-18 10:25:12,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3843130.0, ans=0.125 2024-08-18 10:25:13,915 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 21 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-18 10:25:19,765 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 12200, loss[loss=0.08765, beats_loss=0.01309, ecapa_loss=0.0001034, whisper_loss=0.07352, over 20890.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0106, ecapa_loss=0.0001443, whisper_loss=0.08923, over 3877223.46 frames. ], batch size: 79, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:25:31,347 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 37 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-18 10:25:33,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3843230.0, ans=0.1 2024-08-18 10:25:49,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3843330.0, ans=0.125 2024-08-18 10:26:13,025 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
15 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 10:26:19,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3843530.0, ans=0.125 2024-08-18 10:26:28,761 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-18 10:26:36,339 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-18 10:26:41,635 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-18 10:26:42,593 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 12250, loss[loss=0.1081, beats_loss=0.01222, ecapa_loss=0.0001304, whisper_loss=0.09461, over 22294.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01055, ecapa_loss=0.000144, whisper_loss=0.09003, over 3902054.02 frames. ], batch size: 89, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:26:43,223 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 10:26:50,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3843730.0, ans=0.05 2024-08-18 10:26:52,042 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-18 10:26:54,764 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-18 10:26:55,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3843730.0, ans=0.2 2024-08-18 10:27:02,749 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-18 10:27:11,221 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
17 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-18 10:27:35,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3844030.0, ans=0.125 2024-08-18 10:27:36,177 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.268e+01 2.517e+01 2.839e+01 6.711e+01, threshold=5.035e+01, percent-clipped=1.0 2024-08-18 10:27:45,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3844130.0, ans=22.5 2024-08-18 10:28:01,906 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 12300, loss[loss=0.08359, beats_loss=0.01423, ecapa_loss=0.0001359, whisper_loss=0.068, over 17928.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01055, ecapa_loss=0.0001445, whisper_loss=0.08998, over 3865414.89 frames. ], batch size: 75, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:28:05,465 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.45 vs. limit=15.0 2024-08-18 10:28:26,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3844330.0, ans=0.2 2024-08-18 10:28:39,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3844430.0, ans=0.125 2024-08-18 10:28:42,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3844430.0, ans=0.0 2024-08-18 10:29:20,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3844630.0, ans=0.125 2024-08-18 10:29:23,532 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 12350, loss[loss=0.1263, beats_loss=0.008153, ecapa_loss=0.0001331, whisper_loss=0.1168, over 18301.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.000144, whisper_loss=0.09047, over 3852468.39 frames. ], batch size: 70, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:29:24,467 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 19 from LS+wenet, 6 from Vox, 28 fro AS 2024-08-18 10:29:33,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3844730.0, ans=0.0 2024-08-18 10:29:37,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3844730.0, ans=0.125 2024-08-18 10:29:39,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3844830.0, ans=0.0 2024-08-18 10:29:49,690 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 10:29:55,227 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-18 10:30:00,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3844930.0, ans=0.0 2024-08-18 10:30:05,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3844930.0, ans=0.125 2024-08-18 10:30:18,886 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 20 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 10:30:19,589 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.00 vs. limit=15.0 2024-08-18 10:30:19,974 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.330e+01 2.553e+01 2.743e+01 3.939e+01, threshold=5.106e+01, percent-clipped=0.0 2024-08-18 10:30:41,353 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
23 from LS+wenet, 32 from Vox, 36 fro AS 2024-08-18 10:30:41,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3845130.0, ans=0.1 2024-08-18 10:30:46,894 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 12400, loss[loss=0.09427, beats_loss=0.01207, ecapa_loss=0.0001288, whisper_loss=0.08091, over 22872.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0105, ecapa_loss=0.0001444, whisper_loss=0.08962, over 3852729.10 frames. ], batch size: 93, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:30:51,607 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-18 10:30:54,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3845230.0, ans=0.125 2024-08-18 10:31:25,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3845430.0, ans=0.125 2024-08-18 10:31:26,472 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 29 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-18 10:31:35,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3845530.0, ans=0.125 2024-08-18 10:31:40,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=3845530.0, ans=12.0 2024-08-18 10:31:44,329 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
22 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-18 10:31:50,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3845630.0, ans=0.125 2024-08-18 10:31:54,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3845630.0, ans=0.0 2024-08-18 10:32:05,936 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 10:32:07,471 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 12450, loss[loss=0.1005, beats_loss=0.009615, ecapa_loss=0.0001313, whisper_loss=0.08958, over 15555.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.000144, whisper_loss=0.09003, over 3861135.58 frames. ], batch size: 61, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:32:15,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3845730.0, ans=0.125 2024-08-18 10:32:19,128 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.75 vs. limit=10.0 2024-08-18 10:32:25,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3845830.0, ans=0.125 2024-08-18 10:32:48,949 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.77 vs. limit=15.0 2024-08-18 10:32:54,983 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
21 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 10:33:04,944 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.360e+01 2.669e+01 3.014e+01 4.345e+01, threshold=5.338e+01, percent-clipped=0.0 2024-08-18 10:33:05,717 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.03 vs. limit=15.0 2024-08-18 10:33:10,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3846030.0, ans=0.125 2024-08-18 10:33:18,470 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.75 vs. limit=12.0 2024-08-18 10:33:31,216 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 12500, loss[loss=0.1204, beats_loss=0.009115, ecapa_loss=0.0001473, whisper_loss=0.1098, over 21932.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01043, ecapa_loss=0.0001437, whisper_loss=0.09017, over 3899283.85 frames. ], batch size: 87, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:33:49,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3846330.0, ans=0.0 2024-08-18 10:33:55,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3846330.0, ans=0.1 2024-08-18 10:34:16,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3846530.0, ans=0.125 2024-08-18 10:34:19,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3846530.0, ans=0.0 2024-08-18 10:34:25,555 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.20 vs. 
limit=15.0 2024-08-18 10:34:33,196 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-18 10:34:33,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3846630.0, ans=0.125 2024-08-18 10:34:46,845 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 12550, loss[loss=0.1119, beats_loss=0.008481, ecapa_loss=0.0001551, whisper_loss=0.1019, over 15831.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001433, whisper_loss=0.08972, over 3904642.36 frames. ], batch size: 61, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:34:59,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3846730.0, ans=0.1 2024-08-18 10:35:05,835 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-18 10:35:09,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3846830.0, ans=0.125 2024-08-18 10:35:11,031 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.92 vs. limit=15.0 2024-08-18 10:35:18,477 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.50 vs. limit=12.0 2024-08-18 10:35:26,641 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 35 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-18 10:35:41,240 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.783e+01 2.372e+01 2.623e+01 3.122e+01 3.895e+01, threshold=5.245e+01, percent-clipped=0.0 2024-08-18 10:35:57,693 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
23 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-18 10:35:57,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3847130.0, ans=0.0 2024-08-18 10:36:05,619 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 12600, loss[loss=0.1063, beats_loss=0.01265, ecapa_loss=0.0001219, whisper_loss=0.09244, over 23054.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0106, ecapa_loss=0.0001425, whisper_loss=0.08943, over 3915987.91 frames. ], batch size: 94, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:36:23,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3847330.0, ans=0.0 2024-08-18 10:37:11,781 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 24 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-18 10:37:27,602 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 12650, loss[loss=0.1153, beats_loss=0.01095, ecapa_loss=0.0001293, whisper_loss=0.103, over 22384.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01063, ecapa_loss=0.0001424, whisper_loss=0.08983, over 3903359.80 frames. 
], batch size: 91, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:37:29,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3847730.0, ans=0.1 2024-08-18 10:37:42,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3847830.0, ans=0.125 2024-08-18 10:38:07,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3847930.0, ans=0.0 2024-08-18 10:38:20,865 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.252e+01 2.537e+01 2.887e+01 4.368e+01, threshold=5.074e+01, percent-clipped=0.0 2024-08-18 10:38:23,000 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 10:38:48,076 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 12700, loss[loss=0.09933, beats_loss=0.01136, ecapa_loss=0.0001699, whisper_loss=0.08627, over 22057.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01059, ecapa_loss=0.0001433, whisper_loss=0.09031, over 3886714.84 frames. ], batch size: 93, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:38:48,831 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-08-18 10:39:30,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3848430.0, ans=0.0 2024-08-18 10:39:43,168 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-18 10:40:09,658 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 12750, loss[loss=0.09573, beats_loss=0.0127, ecapa_loss=0.0001134, whisper_loss=0.0819, over 20710.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01067, ecapa_loss=0.0001441, whisper_loss=0.08973, over 3908368.59 frames. ], batch size: 82, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:40:16,225 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-18 10:40:23,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3848730.0, ans=0.125 2024-08-18 10:40:28,109 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-18 10:40:37,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3848830.0, ans=0.125 2024-08-18 10:40:49,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3848930.0, ans=0.0 2024-08-18 10:41:04,852 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.297e+01 2.524e+01 2.809e+01 4.213e+01, threshold=5.048e+01, percent-clipped=0.0 2024-08-18 10:41:28,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3849230.0, ans=0.0 2024-08-18 10:41:29,563 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 12800, loss[loss=0.1223, beats_loss=0.00832, ecapa_loss=0.0001197, whisper_loss=0.1128, over 16818.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01061, ecapa_loss=0.0001455, whisper_loss=0.09009, over 3907579.92 frames. ], batch size: 63, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:41:34,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3849230.0, ans=0.125 2024-08-18 10:41:41,917 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-18 10:41:43,206 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
17 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-18 10:41:43,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3849230.0, ans=0.125 2024-08-18 10:42:08,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3849430.0, ans=0.125 2024-08-18 10:42:15,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3849430.0, ans=0.125 2024-08-18 10:42:43,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3849630.0, ans=0.2 2024-08-18 10:42:49,784 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 12850, loss[loss=0.09307, beats_loss=0.009882, ecapa_loss=0.0001494, whisper_loss=0.08169, over 19270.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01069, ecapa_loss=0.0001451, whisper_loss=0.08964, over 3890672.53 frames. ], batch size: 77, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:42:56,001 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.32 vs. limit=10.0 2024-08-18 10:43:22,529 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
20 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 10:43:39,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3850030.0, ans=0.125 2024-08-18 10:43:41,725 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.317e+01 2.573e+01 2.906e+01 6.087e+01, threshold=5.147e+01, percent-clipped=1.0 2024-08-18 10:43:50,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3850130.0, ans=0.1 2024-08-18 10:44:04,124 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 12900, loss[loss=0.08232, beats_loss=0.0131, ecapa_loss=0.0001269, whisper_loss=0.06795, over 21487.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01064, ecapa_loss=0.0001451, whisper_loss=0.08973, over 3886142.93 frames. ], batch size: 87, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:44:04,285 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 27 from Vox, 23 fro AS 2024-08-18 10:44:14,742 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 33 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-18 10:45:10,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3850630.0, ans=0.0 2024-08-18 10:45:14,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3850630.0, ans=0.0 2024-08-18 10:45:23,095 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 12950, loss[loss=0.1089, beats_loss=0.009626, ecapa_loss=0.000171, whisper_loss=0.09754, over 19566.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01064, ecapa_loss=0.0001441, whisper_loss=0.09004, over 3887350.57 frames. ], batch size: 80, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:45:31,486 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
17 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 10:45:44,650 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.41 vs. limit=22.5 2024-08-18 10:46:02,708 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 10:46:07,930 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 25 from LS+wenet, 16 from Vox, 13 fro AS 2024-08-18 10:46:08,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3851030.0, ans=0.125 2024-08-18 10:46:09,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3851030.0, ans=0.0 2024-08-18 10:46:12,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3851030.0, ans=0.125 2024-08-18 10:46:12,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3851030.0, ans=0.0 2024-08-18 10:46:13,011 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.206e+01 2.482e+01 2.888e+01 5.100e+01, threshold=4.963e+01, percent-clipped=0.0 2024-08-18 10:46:19,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3851030.0, ans=0.0 2024-08-18 10:46:20,548 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 10:46:26,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3851130.0, ans=0.2 2024-08-18 10:46:35,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3851230.0, ans=0.125 2024-08-18 10:46:36,862 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 13000, loss[loss=0.09869, beats_loss=0.0112, ecapa_loss=0.0001314, whisper_loss=0.08618, over 17981.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01064, ecapa_loss=0.0001443, whisper_loss=0.08992, over 3902355.51 frames. ], batch size: 72, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:46:37,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3851230.0, ans=0.2 2024-08-18 10:46:38,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3851230.0, ans=0.125 2024-08-18 10:46:57,481 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 10:47:10,665 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 10:47:12,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3851430.0, ans=0.0 2024-08-18 10:47:18,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3851430.0, ans=0.0 2024-08-18 10:47:21,141 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.029e+01 2024-08-18 10:47:24,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3851530.0, ans=0.2 2024-08-18 10:47:27,521 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-18 10:47:31,120 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.41 vs. limit=12.0 2024-08-18 10:47:44,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3851630.0, ans=0.125 2024-08-18 10:47:48,918 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 20 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-18 10:47:50,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3851630.0, ans=0.0 2024-08-18 10:47:53,381 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 13050, loss[loss=0.1056, beats_loss=0.009886, ecapa_loss=0.0001396, whisper_loss=0.09432, over 20496.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.0001446, whisper_loss=0.0902, over 3887602.41 frames. ], batch size: 80, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:47:53,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3851730.0, ans=0.125 2024-08-18 10:48:03,121 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2024-08-18 10:48:07,421 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-18 10:48:47,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3852030.0, ans=0.125 2024-08-18 10:48:48,286 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.555e+01 2.212e+01 2.477e+01 2.743e+01 3.903e+01, threshold=4.954e+01, percent-clipped=0.0 2024-08-18 10:48:50,531 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
27 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-18 10:48:54,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3852030.0, ans=0.1 2024-08-18 10:48:59,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3852130.0, ans=0.125 2024-08-18 10:49:15,856 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 13100, loss[loss=0.08315, beats_loss=0.01078, ecapa_loss=0.0001321, whisper_loss=0.07105, over 17376.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0106, ecapa_loss=0.0001444, whisper_loss=0.08987, over 3878012.92 frames. ], batch size: 70, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:49:22,927 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0 2024-08-18 10:49:26,813 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 21 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-18 10:50:03,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3852530.0, ans=0.0 2024-08-18 10:50:05,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3852530.0, ans=0.125 2024-08-18 10:50:28,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3852630.0, ans=0.05 2024-08-18 10:50:29,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3852730.0, ans=10.0 2024-08-18 10:50:30,864 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 13150, loss[loss=0.1048, beats_loss=0.008548, ecapa_loss=0.0001402, whisper_loss=0.09484, over 17994.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01059, ecapa_loss=0.000143, whisper_loss=0.08999, over 3863117.73 frames. ], batch size: 71, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:50:36,341 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 8 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-18 10:50:38,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3852730.0, ans=0.0 2024-08-18 10:50:44,016 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 10:50:50,724 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 19 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 10:51:12,538 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0 2024-08-18 10:51:15,591 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.22 vs. limit=12.0 2024-08-18 10:51:19,062 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.362e+01 2.528e+01 2.819e+01 1.566e+02, threshold=5.056e+01, percent-clipped=2.0 2024-08-18 10:51:29,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3853130.0, ans=0.0 2024-08-18 10:51:32,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3853130.0, ans=0.2 2024-08-18 10:51:37,920 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 21 from LS+wenet, 23 from Vox, 49 fro AS 2024-08-18 10:51:42,065 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 13200, loss[loss=0.1078, beats_loss=0.009844, ecapa_loss=0.00014, whisper_loss=0.09657, over 20012.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01066, ecapa_loss=0.0001424, whisper_loss=0.08916, over 3839672.37 frames. 
], batch size: 80, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:51:45,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3853230.0, ans=0.125 2024-08-18 10:52:07,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3853330.0, ans=0.125 2024-08-18 10:52:17,074 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=22.5 2024-08-18 10:52:50,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.18 vs. limit=15.0 2024-08-18 10:52:55,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3853630.0, ans=0.125 2024-08-18 10:52:59,116 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 13250, loss[loss=0.1174, beats_loss=0.01218, ecapa_loss=0.0001019, whisper_loss=0.1042, over 24241.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01066, ecapa_loss=0.0001432, whisper_loss=0.08971, over 3869542.81 frames. ], batch size: 92, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:53:38,975 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-18 10:53:39,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3853930.0, ans=0.0 2024-08-18 10:53:40,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3854030.0, ans=0.125 2024-08-18 10:53:43,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3854030.0, ans=0.0 2024-08-18 10:53:47,023 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.315e+01 2.671e+01 3.107e+01 9.539e+01, threshold=5.342e+01, percent-clipped=1.0 2024-08-18 10:53:47,451 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 10:53:47,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3854030.0, ans=0.1 2024-08-18 10:53:47,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3854030.0, ans=0.1 2024-08-18 10:54:09,446 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 13300, loss[loss=0.1436, beats_loss=0.007165, ecapa_loss=0.0001481, whisper_loss=0.1349, over 23700.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01063, ecapa_loss=0.000143, whisper_loss=0.08937, over 3865331.42 frames. 
], batch size: 89, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:54:14,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3854230.0, ans=0.125 2024-08-18 10:54:16,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3854230.0, ans=0.125 2024-08-18 10:54:16,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3854230.0, ans=0.1 2024-08-18 10:54:28,909 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 17 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 10:54:43,182 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-18 10:54:48,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3854430.0, ans=0.0 2024-08-18 10:54:50,921 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-18 10:54:53,713 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 22 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-18 10:55:19,830 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 13350, loss[loss=0.09194, beats_loss=0.01275, ecapa_loss=0.0001561, whisper_loss=0.07763, over 22290.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01062, ecapa_loss=0.0001426, whisper_loss=0.08992, over 3903451.39 frames. ], batch size: 91, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:55:29,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3854730.0, ans=0.0 2024-08-18 10:55:39,810 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-18 10:55:41,584 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
21 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-18 10:55:53,467 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0 2024-08-18 10:55:55,311 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-18 10:56:05,991 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.282e+01 2.499e+01 2.710e+01 4.906e+01, threshold=4.999e+01, percent-clipped=0.0 2024-08-18 10:56:14,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3855130.0, ans=0.0 2024-08-18 10:56:19,957 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 32 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-18 10:56:28,233 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 13400, loss[loss=0.1034, beats_loss=0.01085, ecapa_loss=0.0001253, whisper_loss=0.09132, over 18304.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0106, ecapa_loss=0.0001428, whisper_loss=0.09015, over 3896540.04 frames. ], batch size: 70, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:56:30,308 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.42 vs. limit=15.0 2024-08-18 10:56:47,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3855330.0, ans=0.0 2024-08-18 10:56:57,987 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 27 from LS+wenet, 12 from Vox, 41 fro AS 2024-08-18 10:57:01,838 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 10:57:05,030 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
25 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-18 10:57:16,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3855530.0, ans=0.07 2024-08-18 10:57:18,068 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-18 10:57:21,928 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-18 10:57:23,818 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.78 vs. limit=15.0 2024-08-18 10:57:34,854 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 10:57:36,965 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 13450, loss[loss=0.1128, beats_loss=0.009407, ecapa_loss=0.0001595, whisper_loss=0.1018, over 18449.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01057, ecapa_loss=0.0001425, whisper_loss=0.08974, over 3857773.95 frames. ], batch size: 69, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:57:40,345 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.44 vs. limit=15.0 2024-08-18 10:57:44,753 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
22 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-18 10:57:54,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=3855830.0, ans=0.02 2024-08-18 10:58:19,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3856030.0, ans=0.1 2024-08-18 10:58:20,732 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.386e+01 2.619e+01 2.833e+01 4.407e+01, threshold=5.238e+01, percent-clipped=0.0 2024-08-18 10:58:20,896 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-18 10:58:22,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3856030.0, ans=0.07 2024-08-18 10:58:40,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3856230.0, ans=0.0 2024-08-18 10:58:41,441 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 13500, loss[loss=0.08391, beats_loss=0.01155, ecapa_loss=0.0001629, whisper_loss=0.07073, over 17515.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01052, ecapa_loss=0.0001439, whisper_loss=0.08964, over 3853649.72 frames. ], batch size: 71, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:58:54,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3856330.0, ans=0.0 2024-08-18 10:58:54,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3856330.0, ans=0.125 2024-08-18 10:59:02,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3856330.0, ans=10.0 2024-08-18 10:59:14,182 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
19 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-18 10:59:30,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3856530.0, ans=0.125 2024-08-18 10:59:31,809 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.42 vs. limit=15.0 2024-08-18 10:59:36,204 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 26 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-18 10:59:36,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3856630.0, ans=0.05 2024-08-18 10:59:47,086 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 13550, loss[loss=0.09122, beats_loss=0.01222, ecapa_loss=0.0001272, whisper_loss=0.07772, over 15955.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01053, ecapa_loss=0.0001432, whisper_loss=0.09012, over 3851050.39 frames. ], batch size: 62, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:59:53,666 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-18 10:59:59,133 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-18 11:00:02,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3856830.0, ans=0.125 2024-08-18 11:00:03,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3856830.0, ans=0.125 2024-08-18 11:00:16,821 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 26 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-18 11:00:18,032 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
14 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-18 11:00:25,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3857030.0, ans=0.125 2024-08-18 11:00:30,648 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.267e+01 2.460e+01 2.738e+01 4.250e+01, threshold=4.920e+01, percent-clipped=0.0 2024-08-18 11:00:51,473 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 13600, loss[loss=0.1026, beats_loss=0.01222, ecapa_loss=0.0001122, whisper_loss=0.08928, over 19626.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.0001455, whisper_loss=0.09033, over 3846834.88 frames. ], batch size: 74, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:00:59,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3857230.0, ans=0.0 2024-08-18 11:01:05,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3857330.0, ans=0.2 2024-08-18 11:01:12,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3857330.0, ans=0.125 2024-08-18 11:01:14,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3857330.0, ans=0.125 2024-08-18 11:01:20,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3857430.0, ans=0.125 2024-08-18 11:01:57,866 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 13650, loss[loss=0.102, beats_loss=0.01042, ecapa_loss=0.0001683, whisper_loss=0.08986, over 17978.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01054, ecapa_loss=0.0001453, whisper_loss=0.09068, over 3875363.65 frames. 
], batch size: 75, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:02:04,647 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 22 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-18 11:02:15,570 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-18 11:02:18,556 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.30 vs. limit=15.0 2024-08-18 11:02:32,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3857930.0, ans=0.0 2024-08-18 11:02:39,912 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.71 vs. limit=12.0 2024-08-18 11:02:43,483 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.717e+01 2.270e+01 2.557e+01 2.779e+01 4.099e+01, threshold=5.114e+01, percent-clipped=0.0 2024-08-18 11:02:57,733 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-18 11:03:01,128 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.87 vs. limit=15.0 2024-08-18 11:03:03,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3858130.0, ans=0.1 2024-08-18 11:03:05,600 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 13700, loss[loss=0.08198, beats_loss=0.01452, ecapa_loss=0.0001105, whisper_loss=0.06636, over 21390.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01055, ecapa_loss=0.0001451, whisper_loss=0.09004, over 3849995.59 frames. 
], batch size: 86, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:03:10,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3858230.0, ans=0.125 2024-08-18 11:03:18,208 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.17 vs. limit=12.0 2024-08-18 11:03:28,803 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 24 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 11:03:29,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3858330.0, ans=0.0 2024-08-18 11:03:34,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3858430.0, ans=0.125 2024-08-18 11:03:35,126 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 11:03:39,973 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-18 11:03:44,344 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-18 11:04:07,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3858630.0, ans=0.2 2024-08-18 11:04:15,447 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 13750, loss[loss=0.1212, beats_loss=0.009591, ecapa_loss=0.0001425, whisper_loss=0.1102, over 23750.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01048, ecapa_loss=0.0001459, whisper_loss=0.09054, over 3848675.96 frames. 
], batch size: 91, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:04:22,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3858730.0, ans=0.125 2024-08-18 11:04:28,518 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.19 vs. limit=15.0 2024-08-18 11:04:29,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3858830.0, ans=0.0 2024-08-18 11:04:29,875 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.08 vs. limit=22.5 2024-08-18 11:04:35,570 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 11:04:55,239 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 14 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-18 11:04:58,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3859030.0, ans=0.1 2024-08-18 11:05:01,849 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.684e+01 2.313e+01 2.596e+01 3.019e+01 1.808e+02, threshold=5.192e+01, percent-clipped=2.0 2024-08-18 11:05:04,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3859030.0, ans=0.0 2024-08-18 11:05:07,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3859030.0, ans=0.125 2024-08-18 11:05:08,759 WARNING [optim.py:496] (0/4) Scaling gradients by 0.014583197422325611, model_norm_threshold=51.91682815551758 2024-08-18 11:05:08,927 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.15, where 
dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.923e+06, grad_sumsq=1.923e+06, orig_rms_sq=1.000e+00 2024-08-18 11:05:09,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3859130.0, ans=10.0 2024-08-18 11:05:12,370 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 11:05:15,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3859130.0, ans=0.125 2024-08-18 11:05:26,893 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 13800, loss[loss=0.1142, beats_loss=0.01244, ecapa_loss=0.0001106, whisper_loss=0.1007, over 16786.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0105, ecapa_loss=0.0001455, whisper_loss=0.0905, over 3835009.20 frames. ], batch size: 66, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:05:28,250 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-18 11:05:45,475 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.93 vs. limit=15.0 2024-08-18 11:05:47,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3859330.0, ans=0.1 2024-08-18 11:05:50,991 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-18 11:05:51,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3859330.0, ans=0.125 2024-08-18 11:05:58,019 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-18 11:05:59,113 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
36 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-18 11:05:59,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3859430.0, ans=0.1 2024-08-18 11:06:04,142 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 11:06:06,972 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 27 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-18 11:06:09,709 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-18 11:06:10,869 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-18 11:06:15,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3859530.0, ans=0.2 2024-08-18 11:06:15,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3859530.0, ans=0.2 2024-08-18 11:06:28,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3859630.0, ans=0.1 2024-08-18 11:06:39,816 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 13850, loss[loss=0.1036, beats_loss=0.01256, ecapa_loss=0.000121, whisper_loss=0.08987, over 23972.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0105, ecapa_loss=0.0001438, whisper_loss=0.09131, over 3863036.75 frames. ], batch size: 94, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:06:52,622 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.46 vs. limit=15.0 2024-08-18 11:07:15,578 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.79 vs. 
limit=15.0 2024-08-18 11:07:15,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.00 vs. limit=10.0 2024-08-18 11:07:18,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3859930.0, ans=0.0 2024-08-18 11:07:21,823 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 19 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-18 11:07:24,329 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.13 vs. limit=15.0 2024-08-18 11:07:25,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3859930.0, ans=0.125 2024-08-18 11:07:39,489 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.397e+01 2.640e+01 3.060e+01 3.560e+03, threshold=5.281e+01, percent-clipped=3.0 2024-08-18 11:08:10,222 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 13900, loss[loss=0.08781, beats_loss=0.008466, ecapa_loss=0.0001137, whisper_loss=0.0782, over 15772.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01048, ecapa_loss=0.000144, whisper_loss=0.09109, over 3864990.33 frames. ], batch size: 57, lr: 2.34e-03, grad_scale: 1.152921504606847e+18 2024-08-18 11:08:27,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3860330.0, ans=0.125 2024-08-18 11:08:41,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3860330.0, ans=0.0 2024-08-18 11:08:49,072 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
28 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-18 11:08:51,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3860430.0, ans=0.125 2024-08-18 11:08:58,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3860430.0, ans=0.0 2024-08-18 11:09:11,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3860530.0, ans=0.0 2024-08-18 11:09:13,764 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.23 vs. limit=15.0 2024-08-18 11:09:45,245 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0 2024-08-18 11:09:51,888 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 13950, loss[loss=0.07641, beats_loss=0.01248, ecapa_loss=0.0001127, whisper_loss=0.06281, over 20535.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01043, ecapa_loss=0.0001438, whisper_loss=0.0917, over 3891839.11 frames. ], batch size: 81, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:10:01,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3860730.0, ans=0.125 2024-08-18 11:10:13,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3860830.0, ans=0.125 2024-08-18 11:10:21,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3860830.0, ans=0.125 2024-08-18 11:10:54,833 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.61 vs. 
limit=15.0 2024-08-18 11:11:08,962 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.328e+01 2.570e+01 2.839e+01 4.020e+01, threshold=5.139e+01, percent-clipped=0.0 2024-08-18 11:11:09,436 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-18 11:11:25,280 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 20 from LS+wenet, 17 from Vox, 57 fro AS 2024-08-18 11:11:25,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3861130.0, ans=0.125 2024-08-18 11:11:41,659 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 14000, loss[loss=0.1261, beats_loss=0.008811, ecapa_loss=0.0001376, whisper_loss=0.1159, over 23982.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01057, ecapa_loss=0.0001434, whisper_loss=0.09038, over 3879139.39 frames. ], batch size: 94, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:12:05,485 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0 2024-08-18 11:12:22,733 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 25 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-18 11:12:24,898 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.90 vs. limit=12.0 2024-08-18 11:12:43,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3861430.0, ans=0.0 2024-08-18 11:12:45,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3861530.0, ans=0.0 2024-08-18 11:13:04,259 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-18 11:13:06,114 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 11:13:06,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3861630.0, ans=0.125 2024-08-18 11:13:15,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3861630.0, ans=0.0 2024-08-18 11:13:28,664 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 14050, loss[loss=0.1157, beats_loss=0.01026, ecapa_loss=0.0001549, whisper_loss=0.1039, over 21616.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01055, ecapa_loss=0.0001428, whisper_loss=0.09098, over 3878075.18 frames. ], batch size: 89, lr: 2.33e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:13:53,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3861830.0, ans=0.2 2024-08-18 11:14:03,463 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-18 11:14:12,907 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.18 vs. limit=10.0 2024-08-18 11:14:25,803 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.314e+01 2.539e+01 2.739e+01 6.784e+01, threshold=5.079e+01, percent-clipped=1.0 2024-08-18 11:14:31,841 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2024-08-18 11:14:40,417 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.66 vs. limit=15.0 2024-08-18 11:14:45,962 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 14100, loss[loss=0.107, beats_loss=0.009928, ecapa_loss=0.0001522, whisper_loss=0.09554, over 13589.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01059, ecapa_loss=0.0001438, whisper_loss=0.09019, over 3861182.92 frames. ], batch size: 55, lr: 2.33e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:14:49,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3862230.0, ans=0.2 2024-08-18 11:14:56,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3862230.0, ans=0.125 2024-08-18 11:14:57,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3862230.0, ans=0.125 2024-08-18 11:15:01,402 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-08-18 11:15:25,110 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 27 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-18 11:15:27,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3862530.0, ans=0.125 2024-08-18 11:15:39,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3862530.0, ans=0.125 2024-08-18 11:15:45,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3862630.0, ans=0.125 2024-08-18 11:15:49,292 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-18 11:15:50,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3862630.0, ans=0.125 2024-08-18 11:15:53,734 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
29 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-18 11:15:56,420 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 14150, loss[loss=0.1102, beats_loss=0.0111, ecapa_loss=0.00014, whisper_loss=0.09769, over 19965.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01064, ecapa_loss=0.0001438, whisper_loss=0.09013, over 3847990.65 frames. ], batch size: 79, lr: 2.33e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:15:57,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3862730.0, ans=0.1 2024-08-18 11:15:57,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3862730.0, ans=0.125 2024-08-18 11:16:08,155 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-18 11:16:09,678 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 21 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-18 11:16:19,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3862830.0, ans=0.125 2024-08-18 11:16:30,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3862930.0, ans=0.125 2024-08-18 11:16:37,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3862930.0, ans=0.125 2024-08-18 11:16:42,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3863030.0, ans=0.125 2024-08-18 11:16:47,828 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.320e+01 2.571e+01 2.843e+01 2.344e+02, threshold=5.141e+01, percent-clipped=2.0 2024-08-18 11:16:54,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, 
batch_count=3863130.0, ans=0.125 2024-08-18 11:17:00,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3863130.0, ans=0.1 2024-08-18 11:17:11,325 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 14200, loss[loss=0.1057, beats_loss=0.008929, ecapa_loss=0.0001402, whisper_loss=0.09535, over 18440.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01071, ecapa_loss=0.0001427, whisper_loss=0.09015, over 3889280.32 frames. ], batch size: 70, lr: 2.33e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:17:13,962 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0 2024-08-18 11:17:28,386 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-18 11:17:30,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3863330.0, ans=0.125 2024-08-18 11:17:57,377 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-18 11:18:05,019 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 30 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 11:18:22,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3863630.0, ans=0.2 2024-08-18 11:18:25,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3863630.0, ans=0.125 2024-08-18 11:18:27,931 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 14250, loss[loss=0.1155, beats_loss=0.008423, ecapa_loss=0.0001328, whisper_loss=0.1058, over 20158.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01071, ecapa_loss=0.0001425, whisper_loss=0.08979, over 3903451.34 frames. 
], batch size: 75, lr: 2.33e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:18:28,087 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-18 11:18:49,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3863830.0, ans=0.125 2024-08-18 11:18:56,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3863830.0, ans=0.0 2024-08-18 11:18:57,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3863830.0, ans=0.125 2024-08-18 11:19:06,009 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 11:19:23,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3864030.0, ans=0.2 2024-08-18 11:19:23,869 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.369e+01 2.559e+01 2.841e+01 3.701e+01, threshold=5.118e+01, percent-clipped=0.0 2024-08-18 11:19:45,167 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.40 vs. limit=12.0 2024-08-18 11:19:45,666 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 14300, loss[loss=0.09947, beats_loss=0.01218, ecapa_loss=0.0001217, whisper_loss=0.08607, over 23092.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01067, ecapa_loss=0.0001423, whisper_loss=0.0895, over 3911709.06 frames. ], batch size: 93, lr: 2.33e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:19:54,223 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-18 11:20:01,742 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.56 vs. 
limit=15.0 2024-08-18 11:20:04,669 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2024-08-18 11:20:09,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3864330.0, ans=0.125 2024-08-18 11:20:16,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3864330.0, ans=0.125 2024-08-18 11:20:20,807 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.91 vs. limit=22.5 2024-08-18 11:20:23,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3864430.0, ans=0.0 2024-08-18 11:20:50,482 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-18 11:20:53,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3864630.0, ans=0.125 2024-08-18 11:21:04,463 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 14350, loss[loss=0.1128, beats_loss=0.007643, ecapa_loss=0.0001655, whisper_loss=0.1035, over 16717.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01059, ecapa_loss=0.000144, whisper_loss=0.08972, over 3895724.95 frames. ], batch size: 67, lr: 2.33e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:21:04,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3864730.0, ans=0.0 2024-08-18 11:21:09,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3864730.0, ans=0.125 2024-08-18 11:21:34,131 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
24 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-18 11:21:37,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3864930.0, ans=0.125 2024-08-18 11:21:56,644 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.72 vs. limit=15.0 2024-08-18 11:21:57,157 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.333e+01 2.599e+01 2.804e+01 6.490e+01, threshold=5.198e+01, percent-clipped=1.0 2024-08-18 11:21:57,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3865030.0, ans=0.05 2024-08-18 11:22:11,083 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 31 from LS+wenet, 11 from Vox, 35 fro AS 2024-08-18 11:22:15,287 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 32 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 11:22:18,686 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 14400, loss[loss=0.1185, beats_loss=0.008535, ecapa_loss=0.0001731, whisper_loss=0.1082, over 18834.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01059, ecapa_loss=0.0001448, whisper_loss=0.08955, over 3887143.22 frames. ], batch size: 77, lr: 2.33e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:22:19,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3865230.0, ans=0.2 2024-08-18 11:22:24,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3865230.0, ans=0.125 2024-08-18 11:22:28,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3865230.0, ans=0.1 2024-08-18 11:22:46,278 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
20 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-18 11:22:54,224 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.15 vs. limit=15.0 2024-08-18 11:23:11,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3865530.0, ans=0.04949747468305833 2024-08-18 11:23:31,516 INFO [train_multi_KD3.py:1116] (0/4) Epoch 26, batch 14450, loss[loss=0.103, beats_loss=0.01278, ecapa_loss=0.000114, whisper_loss=0.0891, over 23120.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0106, ecapa_loss=0.0001445, whisper_loss=0.08963, over 3857731.64 frames. ], batch size: 91, lr: 2.33e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:24:15,979 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 11:24:19,450 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.265e+01 2.482e+01 2.875e+01 2.050e+02, threshold=4.963e+01, percent-clipped=1.0 2024-08-18 11:24:30,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.31 vs. limit=15.0 2024-08-18 11:24:35,798 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-26.pt 2024-08-18 11:25:14,271 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 0, loss[loss=0.06438, beats_loss=0.01225, ecapa_loss=0.000156, whisper_loss=0.05057, over 14322.00 frames. ], tot_loss[loss=0.06438, beats_loss=0.01225, ecapa_loss=0.000156, whisper_loss=0.05057, over 14322.00 frames. 
], batch size: 63, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:25:14,273 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-18 11:25:51,139 INFO [train_multi_KD3.py:1149] (0/4) Epoch 27, validation on ASR_libri: loss=0.2537, beats_loss=0, ecapa_loss=0.0005188, whisper_loss=0.2485, over 922467.00 frames. 2024-08-18 11:26:06,011 INFO [train_multi_KD3.py:1149] (0/4) Epoch 27, validation on SV_voxceleb1: loss=0.004147, beats_loss=0, ecapa_loss=0.0004147, whisper_loss=0, over 939242.00 frames. 2024-08-18 11:27:48,077 INFO [train_multi_KD3.py:1149] (0/4) Epoch 27, validation on AT_audioset: loss=0.0231, beats_loss=0.0231, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 11:27:48,081 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-18 11:28:31,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3866290.0, ans=0.125 2024-08-18 11:28:44,150 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-18 11:28:44,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3866390.0, ans=0.2 2024-08-18 11:29:06,814 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 11:29:11,541 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-18 11:29:24,474 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-18 11:29:46,758 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 50, loss[loss=0.09667, beats_loss=0.009296, ecapa_loss=0.0001342, whisper_loss=0.08603, over 20629.00 frames. ], tot_loss[loss=0.0962, beats_loss=0.009778, ecapa_loss=0.000155, whisper_loss=0.08487, over 854140.35 frames. 
], batch size: 81, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:30:13,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3866790.0, ans=0.1 2024-08-18 11:30:27,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3866790.0, ans=0.05 2024-08-18 11:30:36,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3866890.0, ans=0.0 2024-08-18 11:30:57,940 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.412e+01 2024-08-18 11:31:08,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3866990.0, ans=0.1 2024-08-18 11:31:08,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3866990.0, ans=0.2 2024-08-18 11:31:11,995 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.562e+01 2.806e+01 3.203e+01 5.774e+01, threshold=5.612e+01, percent-clipped=2.0 2024-08-18 11:31:15,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3867090.0, ans=0.125 2024-08-18 11:31:15,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3867090.0, ans=0.125 2024-08-18 11:31:21,162 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.47 vs. 
limit=22.5 2024-08-18 11:31:27,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3867090.0, ans=0.125 2024-08-18 11:31:28,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3867090.0, ans=0.035 2024-08-18 11:31:35,751 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 100, loss[loss=0.0933, beats_loss=0.01135, ecapa_loss=0.000118, whisper_loss=0.08077, over 22240.00 frames. ], tot_loss[loss=0.09934, beats_loss=0.009406, ecapa_loss=0.0001487, whisper_loss=0.08844, over 1512373.99 frames. ], batch size: 86, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:32:02,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3867290.0, ans=0.0 2024-08-18 11:32:13,688 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 17 from Vox, 39 from AS 2024-08-18 11:32:15,915 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 17 from Vox, 28 from AS 2024-08-18 11:32:20,302 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 16 from Vox, 32 from AS 2024-08-18 11:32:42,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3867490.0, ans=0.0 2024-08-18 11:32:44,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3867490.0, ans=0.125 2024-08-18 11:33:16,686 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 150, loss[loss=0.1146, beats_loss=0.01094, ecapa_loss=0.0001281, whisper_loss=0.1024, over 22208.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.009364, ecapa_loss=0.0001458, whisper_loss=0.08982, over 2004170.91 frames. ], batch size: 84, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:33:20,063 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
31 from LS+wenet, 21 from Vox, 34 from AS 2024-08-18 11:33:23,484 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 15 from LS+wenet, 20 from Vox, 27 from AS 2024-08-18 11:33:39,015 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 13 from Vox, 25 from AS 2024-08-18 11:33:43,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3867790.0, ans=0.1 2024-08-18 11:34:04,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=15.0 2024-08-18 11:34:16,876 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.470e+01 2.705e+01 3.027e+01 2.809e+02, threshold=5.410e+01, percent-clipped=1.0 2024-08-18 11:34:20,041 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.42 vs. limit=6.0 2024-08-18 11:34:33,635 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 200, loss[loss=0.09404, beats_loss=0.01022, ecapa_loss=0.0001648, whisper_loss=0.08217, over 14148.00 frames. ], tot_loss[loss=0.101, beats_loss=0.00957, ecapa_loss=0.0001473, whisper_loss=0.08993, over 2372814.88 frames. ], batch size: 56, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:34:39,989 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.67 vs. 
limit=22.5 2024-08-18 11:34:46,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3868290.0, ans=0.0 2024-08-18 11:34:53,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3868290.0, ans=0.5 2024-08-18 11:35:18,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3868490.0, ans=0.05 2024-08-18 11:35:32,443 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 28 from LS+wenet, 17 from Vox, 28 from AS 2024-08-18 11:35:38,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3868590.0, ans=0.0 2024-08-18 11:35:43,958 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 250, loss[loss=0.1224, beats_loss=0.00771, ecapa_loss=0.0001619, whisper_loss=0.113, over 14807.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.009788, ecapa_loss=0.0001467, whisper_loss=0.09049, over 2682828.27 frames. ], batch size: 57, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:36:03,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3868790.0, ans=0.125 2024-08-18 11:36:08,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3868790.0, ans=0.0 2024-08-18 11:36:18,135 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.21 vs. limit=15.0 2024-08-18 11:36:19,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3868890.0, ans=0.0 2024-08-18 11:36:21,297 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
26 from LS+wenet, 19 from Vox, 27 from AS 2024-08-18 11:36:36,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3868990.0, ans=0.0 2024-08-18 11:36:37,655 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.252e+01 2.480e+01 2.798e+01 3.781e+01, threshold=4.961e+01, percent-clipped=0.0 2024-08-18 11:36:52,184 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 300, loss[loss=0.1177, beats_loss=0.007468, ecapa_loss=0.0001799, whisper_loss=0.1084, over 19115.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01002, ecapa_loss=0.0001462, whisper_loss=0.09002, over 2961282.88 frames. ], batch size: 76, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:36:57,526 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.32 vs. limit=15.0 2024-08-18 11:37:03,068 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 17 from Vox, 31 from AS 2024-08-18 11:37:11,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.42 vs. limit=12.0 2024-08-18 11:37:30,042 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. 
limit=15.0 2024-08-18 11:37:36,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3869490.0, ans=0.1 2024-08-18 11:37:41,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3869490.0, ans=0.125 2024-08-18 11:37:43,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3869490.0, ans=0.1 2024-08-18 11:37:45,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3869590.0, ans=0.125 2024-08-18 11:37:48,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3869590.0, ans=0.1 2024-08-18 11:37:54,236 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0 2024-08-18 11:38:00,743 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 350, loss[loss=0.1272, beats_loss=0.01076, ecapa_loss=0.0001104, whisper_loss=0.1154, over 18879.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01013, ecapa_loss=0.0001447, whisper_loss=0.09054, over 3186753.70 frames. ], batch size: 71, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:38:05,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3869690.0, ans=0.125 2024-08-18 11:38:11,050 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
21 from LS+wenet, 24 from Vox, 40 from AS 2024-08-18 11:38:11,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3869690.0, ans=0.0 2024-08-18 11:38:12,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3869690.0, ans=0.125 2024-08-18 11:38:14,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3869790.0, ans=0.0 2024-08-18 11:38:33,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3869890.0, ans=0.0 2024-08-18 11:38:40,420 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 22 from Vox, 24 from AS 2024-08-18 11:38:44,401 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 25 from Vox, 41 from AS 2024-08-18 11:38:49,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3869990.0, ans=0.125 2024-08-18 11:38:53,104 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.198e+01 2.411e+01 2.717e+01 4.096e+01, threshold=4.822e+01, percent-clipped=0.0 2024-08-18 11:38:55,780 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 16 from Vox, 41 from AS 2024-08-18 11:39:04,169 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.76 vs. limit=22.5 2024-08-18 11:39:07,795 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 400, loss[loss=0.09069, beats_loss=0.01099, ecapa_loss=0.0001737, whisper_loss=0.07796, over 22129.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01026, ecapa_loss=0.0001446, whisper_loss=0.08972, over 3344740.28 frames. ], batch size: 94, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:39:17,275 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
25 from LS+wenet, 25 from Vox, 44 from AS 2024-08-18 11:39:24,005 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 17 from Vox, 22 from AS 2024-08-18 11:39:28,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3870290.0, ans=0.125 2024-08-18 11:39:53,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3870490.0, ans=0.025 2024-08-18 11:40:02,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3870590.0, ans=0.1 2024-08-18 11:40:15,997 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 450, loss[loss=0.08489, beats_loss=0.0084, ecapa_loss=0.0001705, whisper_loss=0.07478, over 13047.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01029, ecapa_loss=0.0001463, whisper_loss=0.08902, over 3464014.37 frames. ], batch size: 53, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:40:17,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3870690.0, ans=0.0 2024-08-18 11:40:24,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3870690.0, ans=0.0 2024-08-18 11:40:44,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3870890.0, ans=0.0 2024-08-18 11:40:54,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3870890.0, ans=0.125 2024-08-18 11:40:57,983 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
24 from LS+wenet, 13 from Vox, 20 from AS 2024-08-18 11:41:08,577 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.334e+01 2.651e+01 3.139e+01 3.582e+02, threshold=5.301e+01, percent-clipped=3.0 2024-08-18 11:41:23,189 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 500, loss[loss=0.07693, beats_loss=0.01214, ecapa_loss=0.0001361, whisper_loss=0.06343, over 18723.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01034, ecapa_loss=0.0001458, whisper_loss=0.08825, over 3525508.47 frames. ], batch size: 77, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:41:24,512 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 13 from Vox, 30 from AS 2024-08-18 11:41:26,918 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 18 from Vox, 37 from AS 2024-08-18 11:41:28,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3871190.0, ans=0.125 2024-08-18 11:41:29,842 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.88 vs. limit=22.5 2024-08-18 11:41:33,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3871190.0, ans=0.1 2024-08-18 11:41:50,671 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 23 from Vox, 20 from AS 2024-08-18 11:41:57,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3871390.0, ans=0.1 2024-08-18 11:42:13,211 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. 
limit=15.0 2024-08-18 11:42:17,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3871590.0, ans=0.0 2024-08-18 11:42:20,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3871590.0, ans=0.125 2024-08-18 11:42:28,766 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 550, loss[loss=0.1036, beats_loss=0.008978, ecapa_loss=0.0001532, whisper_loss=0.09312, over 20319.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01029, ecapa_loss=0.0001451, whisper_loss=0.089, over 3604867.20 frames. ], batch size: 76, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:42:35,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.72 vs. limit=22.5 2024-08-18 11:42:44,702 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 28 from Vox, 37 from AS 2024-08-18 11:42:56,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3871890.0, ans=0.2 2024-08-18 11:43:20,153 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.708e+01 2.358e+01 2.658e+01 2.869e+01 1.652e+02, threshold=5.315e+01, percent-clipped=4.0 2024-08-18 11:43:21,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3872090.0, ans=0.125 2024-08-18 11:43:26,732 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
21 from LS+wenet, 21 from Vox, 27 from AS 2024-08-18 11:43:27,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3872090.0, ans=0.0 2024-08-18 11:43:34,852 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 600, loss[loss=0.1136, beats_loss=0.01146, ecapa_loss=0.0001043, whisper_loss=0.1011, over 23840.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01033, ecapa_loss=0.0001435, whisper_loss=0.08927, over 3657022.04 frames. ], batch size: 89, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:43:38,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3872190.0, ans=0.0 2024-08-18 11:43:41,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3872190.0, ans=0.125 2024-08-18 11:43:44,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3872190.0, ans=0.125 2024-08-18 11:43:47,563 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 28 from LS+wenet, 25 from Vox, 29 from AS 2024-08-18 11:43:47,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3872290.0, ans=0.125 2024-08-18 11:43:57,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3872290.0, ans=0.125 2024-08-18 11:44:04,858 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.25 vs. limit=15.0 2024-08-18 11:44:18,845 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.76 vs. 
limit=15.0 2024-08-18 11:44:21,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3872490.0, ans=0.0 2024-08-18 11:44:29,178 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 26 from Vox, 32 from AS 2024-08-18 11:44:33,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3872590.0, ans=0.125 2024-08-18 11:44:43,873 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 650, loss[loss=0.08468, beats_loss=0.01225, ecapa_loss=0.000116, whisper_loss=0.07127, over 22230.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01034, ecapa_loss=0.0001435, whisper_loss=0.08944, over 3704439.56 frames. ], batch size: 88, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:44:49,924 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.94 vs. limit=15.0 2024-08-18 11:44:53,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3872690.0, ans=0.1 2024-08-18 11:44:55,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3872690.0, ans=0.0 2024-08-18 11:44:56,063 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
18 from LS+wenet, 21 from Vox, 18 from AS 2024-08-18 11:44:56,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3872790.0, ans=0.125 2024-08-18 11:45:16,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3872890.0, ans=0.125 2024-08-18 11:45:37,151 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.251e+01 2.525e+01 2.780e+01 3.539e+01, threshold=5.050e+01, percent-clipped=0.0 2024-08-18 11:45:51,728 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 700, loss[loss=0.09322, beats_loss=0.01118, ecapa_loss=0.0001174, whisper_loss=0.08087, over 16795.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01035, ecapa_loss=0.0001434, whisper_loss=0.09027, over 3724229.49 frames. ], batch size: 64, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:45:56,326 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.35 vs. 
limit=15.0 2024-08-18 11:46:23,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3873390.0, ans=0.04949747468305833 2024-08-18 11:46:26,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3873390.0, ans=0.0 2024-08-18 11:46:36,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3873490.0, ans=0.5 2024-08-18 11:46:52,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3873590.0, ans=0.125 2024-08-18 11:46:52,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3873590.0, ans=0.125 2024-08-18 11:46:52,332 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=15.0 2024-08-18 11:46:59,020 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.465e+01 2024-08-18 11:46:59,749 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 750, loss[loss=0.0906, beats_loss=0.01227, ecapa_loss=0.0001151, whisper_loss=0.07719, over 15138.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01032, ecapa_loss=0.0001439, whisper_loss=0.09061, over 3733222.18 frames. ], batch size: 59, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:46:59,923 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
26 from LS+wenet, 30 from Vox, 36 from AS 2024-08-18 11:47:00,203 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.139e+01 2024-08-18 11:47:07,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3873690.0, ans=0.0 2024-08-18 11:47:13,665 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.809e+01 2024-08-18 11:47:15,768 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 13 from LS+wenet, 22 from Vox, 21 from AS 2024-08-18 11:47:28,470 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 25 from Vox, 28 from AS 2024-08-18 11:47:31,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3873890.0, ans=0.125 2024-08-18 11:47:40,701 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.310e+00 2024-08-18 11:47:49,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3873990.0, ans=0.1 2024-08-18 11:47:53,146 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.637e+01 2.240e+01 2.488e+01 2.790e+01 6.250e+01, threshold=4.975e+01, percent-clipped=1.0 2024-08-18 11:48:07,632 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 800, loss[loss=0.1089, beats_loss=0.009603, ecapa_loss=0.0001226, whisper_loss=0.09804, over 18323.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01035, ecapa_loss=0.0001441, whisper_loss=0.08977, over 3726933.52 frames. 
], batch size: 68, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:48:26,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3874290.0, ans=0.125 2024-08-18 11:48:39,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3874390.0, ans=0.2 2024-08-18 11:48:49,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3874490.0, ans=0.1 2024-08-18 11:48:58,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3874490.0, ans=0.1 2024-08-18 11:49:00,058 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 18 from Vox, 24 from AS 2024-08-18 11:49:06,847 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 26 from LS+wenet, 16 from Vox, 28 from AS 2024-08-18 11:49:12,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3874590.0, ans=0.125 2024-08-18 11:49:17,197 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 850, loss[loss=0.09531, beats_loss=0.01254, ecapa_loss=0.0001003, whisper_loss=0.08176, over 20751.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01037, ecapa_loss=0.0001433, whisper_loss=0.08926, over 3746162.38 frames. ], batch size: 81, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:49:25,608 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
23 from LS+wenet, 19 from Vox, 26 from AS 2024-08-18 11:49:27,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3874690.0, ans=0.125 2024-08-18 11:49:32,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3874790.0, ans=0.1 2024-08-18 11:49:42,595 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 25 from Vox, 43 from AS 2024-08-18 11:49:43,279 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.55 vs. limit=12.0 2024-08-18 11:49:57,661 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08740431070327759, model_norm_threshold=49.75291061401367 2024-08-18 11:49:57,865 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.3.norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.157e+04, grad_sumsq=5.157e+04, orig_rms_sq=1.000e+00 2024-08-18 11:49:59,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3874990.0, ans=0.2 2024-08-18 11:49:59,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3874990.0, ans=0.0 2024-08-18 11:50:11,651 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.302e+01 2.593e+01 2.885e+01 5.692e+02, threshold=5.187e+01, percent-clipped=2.0 2024-08-18 11:50:27,162 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 900, loss[loss=0.08628, beats_loss=0.01033, ecapa_loss=0.0001686, whisper_loss=0.07427, over 18827.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0104, ecapa_loss=0.0001428, whisper_loss=0.08942, over 3736820.11 frames. 
], batch size: 77, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:50:28,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3875190.0, ans=0.2 2024-08-18 11:50:50,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3875290.0, ans=0.125 2024-08-18 11:51:06,387 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 18 from LS+wenet, 21 from Vox, 41 from AS 2024-08-18 11:51:19,265 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2024-08-18 11:51:26,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.30 vs. limit=12.0 2024-08-18 11:51:28,195 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0 2024-08-18 11:51:35,071 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.02 vs. limit=15.0 2024-08-18 11:51:36,850 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 950, loss[loss=0.07378, beats_loss=0.01257, ecapa_loss=0.0001588, whisper_loss=0.05962, over 22424.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01046, ecapa_loss=0.0001419, whisper_loss=0.08864, over 3770742.06 frames. 
], batch size: 91, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:51:37,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3875690.0, ans=0.05 2024-08-18 11:51:37,379 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.142e-03 2024-08-18 11:51:57,131 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.31 vs. limit=15.0 2024-08-18 11:52:05,153 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.04 vs. limit=10.0 2024-08-18 11:52:05,991 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 11 from LS+wenet, 16 from Vox, 29 from AS 2024-08-18 11:52:16,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3875890.0, ans=0.2 2024-08-18 11:52:20,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3875990.0, ans=0.1 2024-08-18 11:52:21,222 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 from AS 2024-08-18 11:52:30,840 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.712e+01 2.296e+01 2.523e+01 2.746e+01 1.713e+02, threshold=5.046e+01, percent-clipped=1.0 2024-08-18 11:52:44,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3876090.0, ans=0.125 2024-08-18 11:52:46,850 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 1000, loss[loss=0.1149, beats_loss=0.009256, ecapa_loss=0.0001454, whisper_loss=0.1042, over 16114.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01046, ecapa_loss=0.0001414, whisper_loss=0.08887, over 3758179.53 frames. 
], batch size: 63, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:53:10,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3876290.0, ans=0.04949747468305833 2024-08-18 11:53:34,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3876490.0, ans=0.1 2024-08-18 11:53:37,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3876490.0, ans=0.125 2024-08-18 11:53:39,365 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.92 vs. limit=15.0 2024-08-18 11:53:57,817 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 1050, loss[loss=0.1132, beats_loss=0.009464, ecapa_loss=0.0001753, whisper_loss=0.102, over 22747.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01039, ecapa_loss=0.0001419, whisper_loss=0.08921, over 3761641.59 frames. ], batch size: 90, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:54:11,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3876790.0, ans=0.0 2024-08-18 11:54:24,312 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 15 from Vox, 28 from AS 2024-08-18 11:54:25,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3876890.0, ans=0.0 2024-08-18 11:54:35,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3876890.0, ans=0.125 2024-08-18 11:54:48,464 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
22 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-18 11:54:54,094 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.342e+01 2.695e+01 2.899e+01 4.255e+01, threshold=5.389e+01, percent-clipped=0.0 2024-08-18 11:54:56,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3877090.0, ans=0.125 2024-08-18 11:55:10,013 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 1100, loss[loss=0.09935, beats_loss=0.009386, ecapa_loss=0.000145, whisper_loss=0.08851, over 16647.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01046, ecapa_loss=0.0001412, whisper_loss=0.08854, over 3745777.23 frames. ], batch size: 65, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:55:24,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3877290.0, ans=0.0 2024-08-18 11:55:27,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3877290.0, ans=0.07 2024-08-18 11:55:31,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3877290.0, ans=0.125 2024-08-18 11:55:53,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3877490.0, ans=0.125 2024-08-18 11:56:00,389 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.36 vs. limit=6.0 2024-08-18 11:56:15,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3877590.0, ans=0.1 2024-08-18 11:56:23,637 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 1150, loss[loss=0.05372, beats_loss=0.008938, ecapa_loss=0.0001458, whisper_loss=0.04332, over 14965.00 frames. 
], tot_loss[loss=0.1007, beats_loss=0.01051, ecapa_loss=0.00014, whisper_loss=0.08876, over 3783250.40 frames. ], batch size: 57, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:56:28,667 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.74 vs. limit=22.5 2024-08-18 11:56:29,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3877690.0, ans=0.0 2024-08-18 11:56:32,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3877690.0, ans=0.0 2024-08-18 11:56:40,731 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=15.0 2024-08-18 11:57:10,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3877990.0, ans=0.125 2024-08-18 11:57:13,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3877990.0, ans=0.125 2024-08-18 11:57:18,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3877990.0, ans=0.125 2024-08-18 11:57:19,249 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.346e+01 2.566e+01 2.901e+01 4.362e+01, threshold=5.132e+01, percent-clipped=0.0 2024-08-18 11:57:23,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3878090.0, ans=0.125 2024-08-18 11:57:29,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3878090.0, ans=0.0 2024-08-18 11:57:32,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, 
batch_count=3878090.0, ans=0.1 2024-08-18 11:57:34,926 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 1200, loss[loss=0.09437, beats_loss=0.01203, ecapa_loss=0.0001031, whisper_loss=0.08131, over 24565.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01061, ecapa_loss=0.0001399, whisper_loss=0.08833, over 3783485.45 frames. ], batch size: 95, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:57:51,049 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 11:57:58,156 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-18 11:58:00,195 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.02 vs. limit=15.0 2024-08-18 11:58:13,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3878390.0, ans=0.125 2024-08-18 11:58:13,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3878390.0, ans=0.125 2024-08-18 11:58:23,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3878490.0, ans=0.0 2024-08-18 11:58:46,587 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 1250, loss[loss=0.1112, beats_loss=0.009881, ecapa_loss=0.0001407, whisper_loss=0.09993, over 18845.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01061, ecapa_loss=0.0001403, whisper_loss=0.08804, over 3826938.35 frames. 
], batch size: 77, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:58:50,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3878690.0, ans=0.125 2024-08-18 11:59:02,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3878790.0, ans=0.0 2024-08-18 11:59:20,228 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-18 11:59:35,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3878990.0, ans=0.0 2024-08-18 11:59:40,453 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-18 11:59:43,006 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.277e+01 2.561e+01 2.801e+01 1.202e+02, threshold=5.122e+01, percent-clipped=2.0 2024-08-18 11:59:59,272 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 1300, loss[loss=0.1111, beats_loss=0.00988, ecapa_loss=0.0001249, whisper_loss=0.09994, over 19997.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01059, ecapa_loss=0.0001412, whisper_loss=0.08848, over 3828643.37 frames. 
], batch size: 78, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:00:05,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=3879190.0, ans=22.5 2024-08-18 12:00:33,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3879390.0, ans=0.1 2024-08-18 12:00:36,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3879390.0, ans=0.025 2024-08-18 12:00:36,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=3879390.0, ans=6.0 2024-08-18 12:00:53,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3879490.0, ans=0.025 2024-08-18 12:00:58,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3879590.0, ans=0.025 2024-08-18 12:01:12,029 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 1350, loss[loss=0.09896, beats_loss=0.01175, ecapa_loss=8.817e-05, whisper_loss=0.08634, over 20859.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001426, whisper_loss=0.08929, over 3829651.56 frames. ], batch size: 78, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:01:25,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3879690.0, ans=0.125 2024-08-18 12:01:53,271 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.57 vs. 
limit=22.5 2024-08-18 12:01:54,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3879890.0, ans=0.125 2024-08-18 12:01:58,568 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-388000.pt 2024-08-18 12:02:15,304 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.267e+01 2.490e+01 2.849e+01 7.961e+01, threshold=4.979e+01, percent-clipped=1.0 2024-08-18 12:02:31,455 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 1400, loss[loss=0.09075, beats_loss=0.008199, ecapa_loss=0.0001334, whisper_loss=0.08122, over 15773.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01053, ecapa_loss=0.0001421, whisper_loss=0.08849, over 3803691.38 frames. ], batch size: 57, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:02:38,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3880190.0, ans=0.2 2024-08-18 12:03:14,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=3880390.0, ans=0.02 2024-08-18 12:03:18,794 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-18 12:03:20,265 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-18 12:03:24,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3880490.0, ans=0.125 2024-08-18 12:03:50,382 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
22 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-18 12:04:14,357 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 1450, loss[loss=0.1107, beats_loss=0.01083, ecapa_loss=0.0001373, whisper_loss=0.09848, over 18506.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.0105, ecapa_loss=0.0001425, whisper_loss=0.08886, over 3839443.53 frames. ], batch size: 73, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:04:30,560 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-18 12:04:33,529 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.434e+00 2024-08-18 12:04:56,456 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-18 12:05:07,527 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=15.0 2024-08-18 12:05:14,247 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.279e+01 2.479e+01 2.751e+01 4.183e+01, threshold=4.959e+01, percent-clipped=0.0 2024-08-18 12:05:16,592 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-18 12:05:21,637 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. 
limit=15.0 2024-08-18 12:05:27,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3881090.0, ans=0.125 2024-08-18 12:05:29,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3881190.0, ans=0.125 2024-08-18 12:05:30,744 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 1500, loss[loss=0.1069, beats_loss=0.01029, ecapa_loss=9.99e-05, whisper_loss=0.09558, over 15938.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01047, ecapa_loss=0.0001424, whisper_loss=0.08849, over 3814380.69 frames. ], batch size: 57, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:05:30,909 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 12:05:46,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3881290.0, ans=0.0 2024-08-18 12:05:56,277 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.30 vs. limit=15.0 2024-08-18 12:06:08,674 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 12:06:28,343 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-18 12:06:34,400 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.24 vs. limit=12.0 2024-08-18 12:06:35,713 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0 2024-08-18 12:06:44,518 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 1550, loss[loss=0.09704, beats_loss=0.01139, ecapa_loss=0.0001412, whisper_loss=0.08424, over 23463.00 frames. 
], tot_loss[loss=0.1001, beats_loss=0.01044, ecapa_loss=0.0001417, whisper_loss=0.08821, over 3802134.49 frames. ], batch size: 92, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:06:50,052 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-18 12:07:05,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3881790.0, ans=0.125 2024-08-18 12:07:41,067 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-18 12:07:44,483 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 24 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-18 12:07:44,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3882090.0, ans=0.1 2024-08-18 12:07:45,730 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.226e+01 2.364e+01 2.655e+01 3.408e+01, threshold=4.729e+01, percent-clipped=0.0 2024-08-18 12:08:00,514 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 1600, loss[loss=0.1034, beats_loss=0.01126, ecapa_loss=0.0001701, whisper_loss=0.09047, over 21091.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01042, ecapa_loss=0.0001412, whisper_loss=0.08832, over 3813450.08 frames. ], batch size: 89, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:08:04,161 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=8.087e-03 2024-08-18 12:08:16,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3882290.0, ans=0.1 2024-08-18 12:08:18,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.95 vs. 
limit=15.0 2024-08-18 12:08:26,853 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-18 12:08:35,173 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-18 12:08:46,261 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 12:08:52,389 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.88 vs. limit=22.5 2024-08-18 12:08:55,132 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-18 12:08:56,613 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-18 12:09:10,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3882590.0, ans=0.1 2024-08-18 12:09:16,005 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 1650, loss[loss=0.1282, beats_loss=0.008927, ecapa_loss=0.0001298, whisper_loss=0.1179, over 17374.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01035, ecapa_loss=0.0001418, whisper_loss=0.08893, over 3814108.35 frames. 
], batch size: 67, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:09:24,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3882690.0, ans=0.0 2024-08-18 12:09:24,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3882690.0, ans=0.125 2024-08-18 12:09:32,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3882790.0, ans=0.1 2024-08-18 12:09:39,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3882790.0, ans=0.125 2024-08-18 12:10:06,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3882990.0, ans=0.0 2024-08-18 12:10:06,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3882990.0, ans=0.0 2024-08-18 12:10:12,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3883090.0, ans=0.125 2024-08-18 12:10:13,364 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.497e+01 2.361e+01 2.617e+01 2.894e+01 3.984e+01, threshold=5.235e+01, percent-clipped=0.0 2024-08-18 12:10:19,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3883090.0, ans=0.125 2024-08-18 12:10:22,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3883090.0, ans=0.1 2024-08-18 12:10:27,970 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 1700, loss[loss=0.1105, beats_loss=0.00798, ecapa_loss=0.000165, whisper_loss=0.1009, over 20254.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01038, ecapa_loss=0.0001412, whisper_loss=0.08897, over 3803210.38 frames. 
], batch size: 79, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:10:29,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3883190.0, ans=0.125 2024-08-18 12:10:38,912 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-18 12:10:46,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3883290.0, ans=0.125 2024-08-18 12:10:48,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3883290.0, ans=0.125 2024-08-18 12:10:49,440 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 12:11:01,220 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.43 vs. limit=22.5 2024-08-18 12:11:02,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3883390.0, ans=0.125 2024-08-18 12:11:12,147 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 17 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 12:11:16,002 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
34 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-18 12:11:17,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3883490.0, ans=0.1 2024-08-18 12:11:20,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3883490.0, ans=0.125 2024-08-18 12:11:23,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3883590.0, ans=0.0 2024-08-18 12:11:31,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3883590.0, ans=0.0 2024-08-18 12:11:38,434 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 1750, loss[loss=0.0901, beats_loss=0.008706, ecapa_loss=0.0001559, whisper_loss=0.07983, over 16325.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01039, ecapa_loss=0.0001423, whisper_loss=0.08821, over 3788454.78 frames. ], batch size: 66, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:11:47,501 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=15.0 2024-08-18 12:11:48,381 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 12:11:48,821 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.01 vs. limit=22.5 2024-08-18 12:11:49,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3883690.0, ans=0.0 2024-08-18 12:12:16,048 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 12:12:20,074 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
23 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-18 12:12:20,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3883990.0, ans=0.1 2024-08-18 12:12:20,610 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.45 vs. limit=15.0 2024-08-18 12:12:31,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3883990.0, ans=0.125 2024-08-18 12:12:34,932 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.272e+01 2.549e+01 2.826e+01 1.079e+02, threshold=5.098e+01, percent-clipped=1.0 2024-08-18 12:12:44,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3884090.0, ans=0.125 2024-08-18 12:12:47,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3884190.0, ans=0.0 2024-08-18 12:12:48,430 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 1800, loss[loss=0.1088, beats_loss=0.01057, ecapa_loss=0.0001271, whisper_loss=0.09694, over 24576.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01039, ecapa_loss=0.000141, whisper_loss=0.08867, over 3810800.28 frames. ], batch size: 94, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:12:51,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3884190.0, ans=0.0 2024-08-18 12:12:54,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3884190.0, ans=0.0 2024-08-18 12:13:07,895 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-18 12:13:18,351 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
20 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-18 12:13:30,483 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-18 12:13:48,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3884590.0, ans=0.0 2024-08-18 12:13:50,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3884590.0, ans=0.125 2024-08-18 12:13:53,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3884590.0, ans=0.125 2024-08-18 12:13:58,574 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 1850, loss[loss=0.1192, beats_loss=0.009718, ecapa_loss=0.0001397, whisper_loss=0.108, over 15737.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01039, ecapa_loss=0.000141, whisper_loss=0.08882, over 3784418.33 frames. ], batch size: 63, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:14:08,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3884690.0, ans=0.125 2024-08-18 12:14:25,685 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.85 vs. limit=15.0 2024-08-18 12:14:33,062 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-18 12:14:34,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3884890.0, ans=0.0 2024-08-18 12:14:36,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3884890.0, ans=0.1 2024-08-18 12:14:39,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3884990.0, ans=0.0 2024-08-18 12:14:47,098 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.46 vs. limit=8.0 2024-08-18 12:14:48,832 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-18 12:14:50,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3884990.0, ans=0.0 2024-08-18 12:14:51,705 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 12:14:54,283 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.251e+01 2.486e+01 2.812e+01 3.198e+02, threshold=4.971e+01, percent-clipped=2.0 2024-08-18 12:15:08,234 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 1900, loss[loss=0.1023, beats_loss=0.009169, ecapa_loss=0.0001388, whisper_loss=0.09177, over 14328.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01038, ecapa_loss=0.0001398, whisper_loss=0.08869, over 3792990.83 frames. 
], batch size: 55, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:15:28,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3885290.0, ans=0.1 2024-08-18 12:15:32,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3885290.0, ans=0.125 2024-08-18 12:15:33,262 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-18 12:15:54,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3885490.0, ans=0.0 2024-08-18 12:15:59,683 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-18 12:16:13,476 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 23 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 12:16:17,363 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 1950, loss[loss=0.07332, beats_loss=0.009235, ecapa_loss=0.0001872, whisper_loss=0.06221, over 14446.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01041, ecapa_loss=0.0001402, whisper_loss=0.0886, over 3798083.21 frames. ], batch size: 61, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:16:30,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3885790.0, ans=0.0 2024-08-18 12:16:39,030 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-18 12:16:47,308 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. 
limit=6.0 2024-08-18 12:16:49,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3885890.0, ans=0.125 2024-08-18 12:16:55,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3885890.0, ans=0.125 2024-08-18 12:17:12,596 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.75 vs. limit=12.0 2024-08-18 12:17:14,426 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.298e+01 2.562e+01 2.933e+01 2.107e+02, threshold=5.124e+01, percent-clipped=3.0 2024-08-18 12:17:20,532 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 17 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-18 12:17:29,253 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 2000, loss[loss=0.1111, beats_loss=0.01098, ecapa_loss=0.0001455, whisper_loss=0.09867, over 20272.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01039, ecapa_loss=0.0001399, whisper_loss=0.0893, over 3796027.29 frames. ], batch size: 80, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:17:43,731 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.81 vs. 
limit=5.0 2024-08-18 12:17:48,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3886290.0, ans=0.0 2024-08-18 12:17:49,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3886290.0, ans=0.2 2024-08-18 12:18:03,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3886390.0, ans=0.1 2024-08-18 12:18:40,183 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 2050, loss[loss=0.1026, beats_loss=0.007824, ecapa_loss=0.0001554, whisper_loss=0.09322, over 15267.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01041, ecapa_loss=0.0001395, whisper_loss=0.08963, over 3812897.99 frames. ], batch size: 60, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:18:42,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3886690.0, ans=0.0 2024-08-18 12:18:44,823 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 31 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-18 12:18:46,112 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 12:18:59,777 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 30 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-18 12:19:05,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3886790.0, ans=0.0 2024-08-18 12:19:10,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3886890.0, ans=0.125 2024-08-18 12:19:23,673 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-18 12:19:27,211 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.79 vs. 
limit=15.0 2024-08-18 12:19:30,803 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 30 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-18 12:19:33,727 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.13 vs. limit=15.0 2024-08-18 12:19:36,188 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.270e+01 2.543e+01 2.879e+01 5.540e+01, threshold=5.086e+01, percent-clipped=1.0 2024-08-18 12:19:43,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3887090.0, ans=0.125 2024-08-18 12:19:44,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3887090.0, ans=22.5 2024-08-18 12:19:50,517 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 2100, loss[loss=0.09519, beats_loss=0.008326, ecapa_loss=0.0001198, whisper_loss=0.08567, over 15082.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01044, ecapa_loss=0.0001394, whisper_loss=0.08977, over 3833563.85 frames. ], batch size: 53, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:19:57,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3887190.0, ans=0.0 2024-08-18 12:20:12,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3887290.0, ans=0.125 2024-08-18 12:20:13,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3887290.0, ans=0.5 2024-08-18 12:20:19,480 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
18 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-18 12:20:32,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3887490.0, ans=0.125 2024-08-18 12:20:38,617 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-18 12:20:42,624 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-18 12:20:42,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3887490.0, ans=0.125 2024-08-18 12:20:59,990 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 2150, loss[loss=0.09977, beats_loss=0.01136, ecapa_loss=0.0001288, whisper_loss=0.08712, over 22480.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0106, ecapa_loss=0.0001389, whisper_loss=0.08929, over 3848825.05 frames. ], batch size: 89, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:21:20,140 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 26 from LS+wenet, 9 from Vox, 28 fro AS 2024-08-18 12:21:21,338 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 29 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 12:21:58,473 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.293e+01 2.488e+01 2.854e+01 6.681e+01, threshold=4.977e+01, percent-clipped=1.0 2024-08-18 12:22:07,299 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-18 12:22:12,552 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 2200, loss[loss=0.08656, beats_loss=0.01126, ecapa_loss=0.0001713, whisper_loss=0.07359, over 18412.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01058, ecapa_loss=0.000139, whisper_loss=0.08991, over 3825819.40 frames. ], batch size: 78, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:22:25,010 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
31 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-18 12:22:42,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3888390.0, ans=0.125 2024-08-18 12:22:50,185 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 12:22:51,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3888390.0, ans=0.1 2024-08-18 12:22:54,533 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.18 vs. limit=12.0 2024-08-18 12:22:58,909 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.77 vs. limit=22.5 2024-08-18 12:23:13,986 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 16 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-18 12:23:16,754 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 12:23:20,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.25 vs. limit=15.0 2024-08-18 12:23:23,885 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 2250, loss[loss=0.07932, beats_loss=0.01137, ecapa_loss=0.0001382, whisper_loss=0.06656, over 15861.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01064, ecapa_loss=0.0001405, whisper_loss=0.08975, over 3844147.38 frames. ], batch size: 67, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:23:28,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3888690.0, ans=0.125 2024-08-18 12:23:38,316 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
30 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-18 12:23:44,462 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-18 12:23:50,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3888790.0, ans=0.0 2024-08-18 12:23:57,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3888890.0, ans=0.125 2024-08-18 12:24:10,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=3888990.0, ans=22.5 2024-08-18 12:24:26,488 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.345e+01 2.551e+01 2.862e+01 1.228e+02, threshold=5.102e+01, percent-clipped=1.0 2024-08-18 12:24:42,544 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 2300, loss[loss=0.09705, beats_loss=0.009266, ecapa_loss=0.0001317, whisper_loss=0.08647, over 13915.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001412, whisper_loss=0.09025, over 3856521.92 frames. ], batch size: 53, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:24:46,867 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 12:24:56,134 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.58 vs. limit=22.5 2024-08-18 12:25:17,497 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.45 vs. 
limit=22.5 2024-08-18 12:25:20,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3889390.0, ans=0.125 2024-08-18 12:25:33,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3889490.0, ans=0.125 2024-08-18 12:25:45,295 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 12:25:52,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3889590.0, ans=0.125 2024-08-18 12:26:01,343 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 2350, loss[loss=0.1169, beats_loss=0.009252, ecapa_loss=0.0001509, whisper_loss=0.1062, over 23550.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01053, ecapa_loss=0.0001414, whisper_loss=0.09078, over 3883014.89 frames. ], batch size: 92, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:26:09,338 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 29 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-18 12:26:09,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3889690.0, ans=0.125 2024-08-18 12:26:18,425 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-18 12:26:32,425 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 12:26:38,929 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.219e-03 2024-08-18 12:26:47,676 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 9 from Vox, 27 fro AS 2024-08-18 12:26:51,103 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
19 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 12:26:54,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3889990.0, ans=0.0 2024-08-18 12:27:03,897 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.25 vs. limit=10.0 2024-08-18 12:27:03,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3890090.0, ans=10.0 2024-08-18 12:27:04,491 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.417e+01 2.635e+01 3.019e+01 1.167e+02, threshold=5.271e+01, percent-clipped=1.0 2024-08-18 12:27:12,607 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=12.0 2024-08-18 12:27:13,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3890090.0, ans=0.125 2024-08-18 12:27:18,843 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 2400, loss[loss=0.1025, beats_loss=0.01261, ecapa_loss=9.538e-05, whisper_loss=0.08895, over 23216.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01046, ecapa_loss=0.0001408, whisper_loss=0.09129, over 3867350.79 frames. ], batch size: 88, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:27:23,760 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
22 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-18 12:27:36,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3890290.0, ans=0.0 2024-08-18 12:27:46,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3890390.0, ans=0.1 2024-08-18 12:27:48,882 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=15.0 2024-08-18 12:27:55,673 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 12:28:01,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3890490.0, ans=0.0 2024-08-18 12:28:06,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3890490.0, ans=0.1 2024-08-18 12:28:21,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3890590.0, ans=0.0 2024-08-18 12:28:30,916 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 2450, loss[loss=0.1115, beats_loss=0.01115, ecapa_loss=0.0001516, whisper_loss=0.09888, over 17042.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01044, ecapa_loss=0.0001401, whisper_loss=0.09161, over 3874214.24 frames. ], batch size: 71, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:29:21,809 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
32 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-18 12:29:21,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3890990.0, ans=0.0 2024-08-18 12:29:27,932 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.266e+01 2.434e+01 2.760e+01 4.670e+01, threshold=4.867e+01, percent-clipped=0.0 2024-08-18 12:29:28,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3891090.0, ans=0.0 2024-08-18 12:29:36,870 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.50 vs. limit=15.0 2024-08-18 12:29:42,803 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 2500, loss[loss=0.1308, beats_loss=0.009436, ecapa_loss=0.0001274, whisper_loss=0.1201, over 24757.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01038, ecapa_loss=0.0001412, whisper_loss=0.09179, over 3873165.77 frames. ], batch size: 91, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:29:49,647 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 27 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-18 12:29:53,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3891190.0, ans=0.125 2024-08-18 12:29:54,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3891190.0, ans=0.125 2024-08-18 12:30:01,535 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-18 12:30:06,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3891290.0, ans=0.125 2024-08-18 12:30:17,716 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
18 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-18 12:30:24,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3891490.0, ans=0.125 2024-08-18 12:30:48,210 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 27 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-18 12:30:50,790 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 29 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-18 12:30:51,873 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 2550, loss[loss=0.1109, beats_loss=0.01001, ecapa_loss=0.0001196, whisper_loss=0.09972, over 20113.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01039, ecapa_loss=0.000141, whisper_loss=0.092, over 3895727.44 frames. ], batch size: 80, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:31:09,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3891790.0, ans=0.125 2024-08-18 12:31:16,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3891790.0, ans=0.125 2024-08-18 12:31:28,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3891890.0, ans=0.125 2024-08-18 12:31:30,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3891990.0, ans=0.125 2024-08-18 12:31:40,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3891990.0, ans=0.125 2024-08-18 12:31:43,209 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.892e+01 2.325e+01 2.537e+01 2.934e+01 3.751e+01, threshold=5.074e+01, percent-clipped=0.0 2024-08-18 12:31:52,079 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-18 12:31:55,533 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 2600, loss[loss=0.09624, beats_loss=0.01244, ecapa_loss=0.0001219, whisper_loss=0.08258, over 19435.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01046, ecapa_loss=0.0001414, whisper_loss=0.0914, over 3870666.84 frames. ], batch size: 77, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:31:58,194 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 35 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-18 12:32:18,399 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-18 12:32:20,363 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.25 vs. limit=15.0 2024-08-18 12:32:24,093 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.85 vs. limit=22.5 2024-08-18 12:32:34,376 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 12:32:37,202 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.95 vs. limit=12.0 2024-08-18 12:32:49,094 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.01 vs. limit=15.0 2024-08-18 12:32:53,273 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 12:32:57,983 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 2650, loss[loss=0.09576, beats_loss=0.01311, ecapa_loss=0.0001437, whisper_loss=0.08121, over 22303.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01044, ecapa_loss=0.0001426, whisper_loss=0.09124, over 3860746.20 frames. 
], batch size: 93, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:33:30,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3892890.0, ans=0.125 2024-08-18 12:33:37,935 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.37 vs. limit=15.0 2024-08-18 12:33:40,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3892990.0, ans=0.1 2024-08-18 12:33:41,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3892990.0, ans=0.1 2024-08-18 12:33:48,369 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.397e+01 2.618e+01 2.808e+01 4.074e+01, threshold=5.236e+01, percent-clipped=0.0 2024-08-18 12:33:57,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3893090.0, ans=0.125 2024-08-18 12:33:59,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3893190.0, ans=0.1 2024-08-18 12:34:00,612 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 2700, loss[loss=0.1158, beats_loss=0.008605, ecapa_loss=0.0001513, whisper_loss=0.1057, over 15155.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01043, ecapa_loss=0.000143, whisper_loss=0.09101, over 3863695.69 frames. 
], batch size: 58, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:34:03,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3893190.0, ans=0.2 2024-08-18 12:34:04,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3893190.0, ans=0.125 2024-08-18 12:34:08,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3893190.0, ans=0.125 2024-08-18 12:34:10,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3893190.0, ans=0.0 2024-08-18 12:34:16,006 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-18 12:34:21,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3893290.0, ans=0.1 2024-08-18 12:34:44,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3893490.0, ans=0.125 2024-08-18 12:34:44,321 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0 2024-08-18 12:34:49,929 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 25 from Vox, 20 fro AS 2024-08-18 12:35:03,436 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 2750, loss[loss=0.1088, beats_loss=0.009203, ecapa_loss=0.0001446, whisper_loss=0.09817, over 20571.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001423, whisper_loss=0.09069, over 3864545.86 frames. ], batch size: 79, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:35:08,542 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
22 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-18 12:35:12,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3893690.0, ans=0.0 2024-08-18 12:35:12,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3893690.0, ans=0.125 2024-08-18 12:35:39,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.04 vs. limit=10.0 2024-08-18 12:35:41,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3893990.0, ans=0.0 2024-08-18 12:35:42,014 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0 2024-08-18 12:35:46,147 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.66 vs. limit=15.0 2024-08-18 12:35:53,874 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.272e+01 2.417e+01 2.652e+01 4.888e+01, threshold=4.834e+01, percent-clipped=0.0 2024-08-18 12:36:06,020 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 2800, loss[loss=0.09824, beats_loss=0.009689, ecapa_loss=0.0001753, whisper_loss=0.08679, over 21310.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01047, ecapa_loss=0.0001422, whisper_loss=0.0906, over 3853515.01 frames. ], batch size: 89, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:36:09,040 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2024-08-18 12:36:14,933 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
21 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 12:36:21,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3894290.0, ans=0.0 2024-08-18 12:36:45,069 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 12:36:53,187 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=13.28 vs. limit=12.0 2024-08-18 12:36:57,665 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 12:37:05,870 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 12:37:07,014 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 22 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-18 12:37:08,229 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 2850, loss[loss=0.1053, beats_loss=0.008752, ecapa_loss=0.0001313, whisper_loss=0.09525, over 14841.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01048, ecapa_loss=0.000142, whisper_loss=0.09001, over 3837638.15 frames. ], batch size: 59, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:37:08,379 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 12:37:19,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3894790.0, ans=0.0 2024-08-18 12:37:20,765 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 12:37:24,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3894790.0, ans=0.125 2024-08-18 12:37:32,827 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
17 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 12:37:34,116 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 12:37:53,645 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 26 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-18 12:37:57,122 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.725e+01 2.309e+01 2.623e+01 2.921e+01 3.884e+01, threshold=5.245e+01, percent-clipped=0.0 2024-08-18 12:38:02,299 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 11 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 12:38:09,921 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 2900, loss[loss=0.1035, beats_loss=0.01028, ecapa_loss=0.0001358, whisper_loss=0.09184, over 14389.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001438, whisper_loss=0.09028, over 3852191.03 frames. ], batch size: 57, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:38:13,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3895190.0, ans=0.0 2024-08-18 12:38:28,445 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-18 12:38:32,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3895290.0, ans=0.125 2024-08-18 12:38:39,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3895390.0, ans=0.125 2024-08-18 12:38:45,526 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
27 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 12:38:47,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3895490.0, ans=0.2 2024-08-18 12:38:49,849 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.14 vs. limit=15.0 2024-08-18 12:39:00,323 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-18 12:39:05,252 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-18 12:39:11,110 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 2950, loss[loss=0.09892, beats_loss=0.01241, ecapa_loss=0.0001455, whisper_loss=0.08505, over 23209.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0105, ecapa_loss=0.0001442, whisper_loss=0.09023, over 3898665.86 frames. ], batch size: 94, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:39:14,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3895690.0, ans=0.125 2024-08-18 12:39:23,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3895790.0, ans=0.125 2024-08-18 12:39:24,092 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 12:39:26,533 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-18 12:39:48,538 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0 2024-08-18 12:39:54,453 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
23 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-18 12:40:01,795 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.273e+01 2.616e+01 2.939e+01 5.806e+01, threshold=5.233e+01, percent-clipped=1.0 2024-08-18 12:40:04,181 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=15.0 2024-08-18 12:40:14,604 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 3000, loss[loss=0.1275, beats_loss=0.01027, ecapa_loss=0.0001358, whisper_loss=0.1159, over 23282.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.000145, whisper_loss=0.09062, over 3910769.47 frames. ], batch size: 90, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:40:14,605 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-18 12:40:51,458 INFO [train_multi_KD3.py:1149] (0/4) Epoch 27, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.000526, whisper_loss=0.2482, over 922467.00 frames. 2024-08-18 12:41:08,042 INFO [train_multi_KD3.py:1149] (0/4) Epoch 27, validation on SV_voxceleb1: loss=0.003954, beats_loss=0, ecapa_loss=0.0003954, whisper_loss=0, over 939242.00 frames. 2024-08-18 12:42:59,068 INFO [train_multi_KD3.py:1149] (0/4) Epoch 27, validation on AT_audioset: loss=0.02313, beats_loss=0.02313, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 12:42:59,072 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-18 12:43:00,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3896190.0, ans=0.2 2024-08-18 12:43:11,253 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.51 vs. 
limit=10.0 2024-08-18 12:43:26,931 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 12:43:27,884 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 31 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-18 12:43:31,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3896390.0, ans=0.0 2024-08-18 12:43:34,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3896390.0, ans=0.0 2024-08-18 12:43:39,436 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.46 vs. limit=12.0 2024-08-18 12:43:46,860 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 12:43:54,231 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 12:43:55,411 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-18 12:44:00,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3896690.0, ans=0.0 2024-08-18 12:44:01,462 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 3050, loss[loss=0.1042, beats_loss=0.01092, ecapa_loss=0.0001466, whisper_loss=0.09185, over 16126.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01045, ecapa_loss=0.0001447, whisper_loss=0.09094, over 3931857.85 frames. ], batch size: 63, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:44:29,975 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 12:44:34,817 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
22 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-18 12:44:41,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3896990.0, ans=0.125 2024-08-18 12:44:51,083 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.420e+01 2.665e+01 2.949e+01 2.105e+02, threshold=5.329e+01, percent-clipped=1.0 2024-08-18 12:45:03,649 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 3100, loss[loss=0.1102, beats_loss=0.01035, ecapa_loss=0.0001483, whisper_loss=0.09834, over 20150.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01046, ecapa_loss=0.0001447, whisper_loss=0.09116, over 3928952.92 frames. ], batch size: 81, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:45:07,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3897190.0, ans=0.0 2024-08-18 12:45:21,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3897290.0, ans=0.1 2024-08-18 12:45:23,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3897290.0, ans=0.0 2024-08-18 12:45:28,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3897390.0, ans=0.125 2024-08-18 12:45:30,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3897390.0, ans=0.125 2024-08-18 12:45:36,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3897390.0, ans=0.2 2024-08-18 12:45:47,183 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
21 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-18 12:46:06,131 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 3150, loss[loss=0.09333, beats_loss=0.01287, ecapa_loss=0.000127, whisper_loss=0.07919, over 16318.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001447, whisper_loss=0.09084, over 3912857.46 frames. ], batch size: 66, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:46:07,490 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 41 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-18 12:46:11,265 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 26 from LS+wenet, 19 from Vox, 14 fro AS 2024-08-18 12:46:20,027 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 26 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-18 12:46:21,508 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 13 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 12:46:23,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3897790.0, ans=0.0 2024-08-18 12:46:25,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3897790.0, ans=0.125 2024-08-18 12:46:45,720 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-18 12:46:55,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3898090.0, ans=0.0 2024-08-18 12:46:56,621 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.328e+01 2.486e+01 2.838e+01 3.960e+01, threshold=4.973e+01, percent-clipped=0.0 2024-08-18 12:46:56,784 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 24 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-18 12:47:08,935 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 3200, loss[loss=0.05883, beats_loss=0.014, ecapa_loss=0.0001329, whisper_loss=0.0435, over 16636.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.0105, ecapa_loss=0.0001457, whisper_loss=0.09088, over 3863721.87 frames. ], batch size: 69, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:47:28,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0 2024-08-18 12:47:32,664 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-18 12:47:36,675 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 12:47:40,277 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-18 12:47:49,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3898490.0, ans=0.125 2024-08-18 12:47:50,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3898490.0, ans=0.125 2024-08-18 12:47:50,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3898490.0, ans=0.125 2024-08-18 12:47:55,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=3898490.0, ans=15.0 2024-08-18 12:47:58,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3898590.0, ans=0.07 2024-08-18 12:48:11,351 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 3250, loss[loss=0.0957, beats_loss=0.01058, ecapa_loss=0.0001521, whisper_loss=0.0836, over 16899.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01047, ecapa_loss=0.0001459, whisper_loss=0.09143, over 3886707.57 frames. 
], batch size: 68, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:48:17,926 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 19 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-18 12:48:30,686 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.09 vs. limit=15.0 2024-08-18 12:48:34,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3898790.0, ans=0.0 2024-08-18 12:48:42,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3898890.0, ans=0.2 2024-08-18 12:49:00,904 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.280e+01 2.573e+01 2.891e+01 1.155e+02, threshold=5.145e+01, percent-clipped=3.0 2024-08-18 12:49:02,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3899090.0, ans=0.0 2024-08-18 12:49:05,904 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 12:49:12,242 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-18 12:49:13,357 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 3300, loss[loss=0.103, beats_loss=0.01115, ecapa_loss=0.0001247, whisper_loss=0.09063, over 19113.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01052, ecapa_loss=0.0001451, whisper_loss=0.09116, over 3862171.66 frames. ], batch size: 76, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:49:14,601 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
18 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 12:49:16,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3899190.0, ans=0.2 2024-08-18 12:49:24,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3899290.0, ans=0.125 2024-08-18 12:49:26,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3899290.0, ans=0.0 2024-08-18 12:49:39,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3899390.0, ans=0.125 2024-08-18 12:49:41,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3899390.0, ans=0.125 2024-08-18 12:49:47,897 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-18 12:50:02,435 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-18 12:50:05,051 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 36 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-18 12:50:15,142 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 3350, loss[loss=0.1148, beats_loss=0.009832, ecapa_loss=0.0001445, whisper_loss=0.1035, over 22910.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01056, ecapa_loss=0.000145, whisper_loss=0.09082, over 3846581.42 frames. 
], batch size: 89, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:50:20,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3899690.0, ans=0.125 2024-08-18 12:50:28,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3899790.0, ans=0.5 2024-08-18 12:50:47,333 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-18 12:51:04,245 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.398e+01 2.647e+01 2.975e+01 4.321e+02, threshold=5.295e+01, percent-clipped=5.0 2024-08-18 12:51:16,213 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 3400, loss[loss=0.09287, beats_loss=0.01391, ecapa_loss=0.0001204, whisper_loss=0.07775, over 21750.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.0001443, whisper_loss=0.09089, over 3842470.14 frames. ], batch size: 88, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:51:18,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3900190.0, ans=0.125 2024-08-18 12:51:20,960 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 12:51:21,736 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.89 vs. limit=8.0 2024-08-18 12:51:29,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3900290.0, ans=0.1 2024-08-18 12:51:46,742 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.49 vs. 
limit=15.0 2024-08-18 12:52:02,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3900490.0, ans=0.2 2024-08-18 12:52:03,610 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-18 12:52:16,815 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 3450, loss[loss=0.07786, beats_loss=0.01265, ecapa_loss=0.0001586, whisper_loss=0.06363, over 15282.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01053, ecapa_loss=0.0001449, whisper_loss=0.09103, over 3845881.69 frames. ], batch size: 65, lr: 2.28e-03, grad_scale: 1.152921504606847e+18 2024-08-18 12:52:27,992 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-18 12:52:49,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3900890.0, ans=0.2 2024-08-18 12:52:51,684 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-18 12:53:02,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3900990.0, ans=0.0 2024-08-18 12:53:03,762 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 12:53:07,360 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.48 vs. limit=15.0 2024-08-18 12:53:10,846 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.235e+01 2.457e+01 2.725e+01 3.914e+01, threshold=4.915e+01, percent-clipped=0.0 2024-08-18 12:53:21,409 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
15 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 12:53:21,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3901090.0, ans=0.1 2024-08-18 12:53:24,875 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 27 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-18 12:53:25,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3901090.0, ans=0.0 2024-08-18 12:53:29,000 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 3500, loss[loss=0.09456, beats_loss=0.01207, ecapa_loss=0.0001416, whisper_loss=0.08108, over 19190.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01051, ecapa_loss=0.0001461, whisper_loss=0.09037, over 3841937.80 frames. ], batch size: 78, lr: 2.28e-03, grad_scale: 1.152921504606847e+18 2024-08-18 12:53:29,140 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-18 12:53:44,362 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 12:54:21,511 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 12:54:25,796 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-18 12:54:47,433 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 3550, loss[loss=0.1104, beats_loss=0.008944, ecapa_loss=0.000137, whisper_loss=0.1001, over 22418.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01052, ecapa_loss=0.0001447, whisper_loss=0.09053, over 3849822.25 frames. 
], batch size: 87, lr: 2.28e-03, grad_scale: 1.152921504606847e+18 2024-08-18 12:54:54,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3901690.0, ans=0.125 2024-08-18 12:54:56,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3901690.0, ans=0.125 2024-08-18 12:54:57,721 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-18 12:55:09,917 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-18 12:55:10,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3901790.0, ans=0.0 2024-08-18 12:55:15,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3901790.0, ans=0.125 2024-08-18 12:55:25,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3901890.0, ans=0.0 2024-08-18 12:55:39,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3901990.0, ans=0.125 2024-08-18 12:55:47,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3902090.0, ans=0.1 2024-08-18 12:55:50,571 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.348e+01 2.623e+01 2.938e+01 4.839e+01, threshold=5.246e+01, percent-clipped=0.0 2024-08-18 12:55:56,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3902090.0, ans=0.0 2024-08-18 12:56:05,421 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 3600, loss[loss=0.1186, beats_loss=0.009231, ecapa_loss=0.0001701, whisper_loss=0.1077, over 21976.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01053, ecapa_loss=0.0001448, whisper_loss=0.08981, over 3844394.97 frames. ], batch size: 89, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:56:20,126 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 12:56:31,965 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.41 vs. limit=10.0 2024-08-18 12:56:33,568 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-18 12:56:55,319 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.78 vs. limit=15.0 2024-08-18 12:56:58,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3902490.0, ans=0.2 2024-08-18 12:57:14,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3902590.0, ans=0.125 2024-08-18 12:57:23,106 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.78 vs. limit=22.5 2024-08-18 12:57:28,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3902590.0, ans=0.2 2024-08-18 12:57:30,345 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 3650, loss[loss=0.1209, beats_loss=0.01204, ecapa_loss=0.0001131, whisper_loss=0.1078, over 19404.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01048, ecapa_loss=0.0001447, whisper_loss=0.09027, over 3830982.85 frames. 
], batch size: 75, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:57:32,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3902690.0, ans=0.2 2024-08-18 12:57:39,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3902690.0, ans=0.1 2024-08-18 12:57:44,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3902790.0, ans=0.125 2024-08-18 12:57:46,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3902790.0, ans=0.5 2024-08-18 12:57:49,780 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.49 vs. limit=22.5 2024-08-18 12:57:53,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3902790.0, ans=0.0 2024-08-18 12:57:53,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3902790.0, ans=0.0 2024-08-18 12:57:59,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3902890.0, ans=0.1 2024-08-18 12:58:06,456 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 36 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 12:58:07,920 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
25 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-18 12:58:21,436 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.704e+01 2.261e+01 2.422e+01 2.681e+01 4.543e+01, threshold=4.845e+01, percent-clipped=0.0 2024-08-18 12:58:33,540 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 3700, loss[loss=0.09159, beats_loss=0.01002, ecapa_loss=0.0001701, whisper_loss=0.07987, over 20751.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01044, ecapa_loss=0.0001448, whisper_loss=0.09047, over 3808204.38 frames. ], batch size: 86, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:58:34,837 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 18 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-18 12:58:37,602 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-18 12:58:52,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3903290.0, ans=0.1 2024-08-18 12:58:54,766 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-18 12:59:14,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3903490.0, ans=0.1 2024-08-18 12:59:16,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3903490.0, ans=0.2 2024-08-18 12:59:18,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3903490.0, ans=0.0 2024-08-18 12:59:21,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3903490.0, ans=0.2 2024-08-18 12:59:41,515 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 3750, loss[loss=0.1063, beats_loss=0.01112, ecapa_loss=0.0001687, whisper_loss=0.09346, over 14879.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01046, ecapa_loss=0.0001448, whisper_loss=0.09057, over 3803196.51 frames. ], batch size: 59, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:00:04,370 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 13:00:15,773 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.82 vs. limit=15.0 2024-08-18 13:00:45,064 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.286e+01 2.566e+01 2.907e+01 8.320e+01, threshold=5.133e+01, percent-clipped=1.0 2024-08-18 13:00:51,877 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 13:00:59,264 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 3800, loss[loss=0.1007, beats_loss=0.01148, ecapa_loss=0.0001534, whisper_loss=0.08768, over 16454.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01046, ecapa_loss=0.0001457, whisper_loss=0.09061, over 3834803.40 frames. ], batch size: 66, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:01:27,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3904290.0, ans=0.0 2024-08-18 13:01:29,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3904390.0, ans=0.125 2024-08-18 13:01:36,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3904390.0, ans=0.125 2024-08-18 13:01:43,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3904390.0, ans=0.125 2024-08-18 13:01:55,441 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
23 from LS+wenet, 20 from Vox, 16 fro AS 2024-08-18 13:01:59,564 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 13:02:11,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3904590.0, ans=0.0 2024-08-18 13:02:13,863 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 3850, loss[loss=0.09445, beats_loss=0.008688, ecapa_loss=0.0001922, whisper_loss=0.08384, over 17794.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001461, whisper_loss=0.09034, over 3841116.40 frames. ], batch size: 74, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:02:23,524 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 26 from LS+wenet, 15 from Vox, 16 fro AS 2024-08-18 13:02:28,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3904790.0, ans=0.2 2024-08-18 13:02:34,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3904790.0, ans=0.0 2024-08-18 13:02:55,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3904890.0, ans=0.125 2024-08-18 13:02:56,249 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 13:03:05,005 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 13:03:12,100 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
17 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-18 13:03:14,618 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.318e+01 2.599e+01 3.015e+01 2.305e+02, threshold=5.197e+01, percent-clipped=2.0 2024-08-18 13:03:28,047 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 3900, loss[loss=0.118, beats_loss=0.00907, ecapa_loss=0.0001638, whisper_loss=0.1073, over 21452.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01047, ecapa_loss=0.000145, whisper_loss=0.09083, over 3854366.23 frames. ], batch size: 88, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:03:37,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3905190.0, ans=0.07 2024-08-18 13:03:41,944 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-18 13:03:46,654 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-18 13:03:49,213 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-18 13:03:50,945 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 30 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-18 13:03:53,421 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-18 13:04:11,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3905390.0, ans=0.125 2024-08-18 13:04:25,038 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-18 13:04:31,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3905590.0, ans=0.1 2024-08-18 13:04:43,559 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 3950, loss[loss=0.1123, beats_loss=0.009563, ecapa_loss=0.0002196, whisper_loss=0.1006, over 14900.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01036, ecapa_loss=0.0001463, whisper_loss=0.09161, over 3867773.93 frames. ], batch size: 62, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:04:52,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3905690.0, ans=0.07 2024-08-18 13:05:03,087 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 15 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-18 13:05:14,259 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.31 vs. limit=10.0 2024-08-18 13:05:28,112 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 17 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-18 13:05:45,145 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.363e+01 2.538e+01 2.990e+01 4.778e+02, threshold=5.076e+01, percent-clipped=2.0 2024-08-18 13:05:48,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3906090.0, ans=0.0 2024-08-18 13:05:49,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3906090.0, ans=0.0 2024-08-18 13:05:49,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3906090.0, ans=0.05 2024-08-18 13:05:51,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3906090.0, ans=0.07 2024-08-18 13:05:58,266 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 4000, loss[loss=0.09459, beats_loss=0.009622, ecapa_loss=0.0001892, whisper_loss=0.08308, over 21204.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01042, ecapa_loss=0.0001466, whisper_loss=0.09187, over 3887535.61 frames. 
], batch size: 90, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:06:01,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3906190.0, ans=0.125 2024-08-18 13:06:02,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3906190.0, ans=0.09899494936611666 2024-08-18 13:06:08,726 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-18 13:06:23,603 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-18 13:06:30,256 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-18 13:07:08,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3906590.0, ans=0.125 2024-08-18 13:07:14,346 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 4050, loss[loss=0.103, beats_loss=0.01001, ecapa_loss=0.0001857, whisper_loss=0.09117, over 17566.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01035, ecapa_loss=0.0001474, whisper_loss=0.09157, over 3872665.87 frames. ], batch size: 74, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:07:18,766 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-18 13:07:19,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3906690.0, ans=0.125 2024-08-18 13:07:43,762 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.24 vs. 
limit=15.0 2024-08-18 13:08:08,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3906990.0, ans=0.125 2024-08-18 13:08:10,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3906990.0, ans=0.125 2024-08-18 13:08:15,986 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.302e+01 2.524e+01 2.887e+01 1.698e+02, threshold=5.047e+01, percent-clipped=2.0 2024-08-18 13:08:17,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3907090.0, ans=0.125 2024-08-18 13:08:20,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3907090.0, ans=0.125 2024-08-18 13:08:28,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3907190.0, ans=0.1 2024-08-18 13:08:28,890 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 4100, loss[loss=0.1049, beats_loss=0.01016, ecapa_loss=0.0001575, whisper_loss=0.09315, over 21305.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01035, ecapa_loss=0.0001464, whisper_loss=0.09182, over 3878336.74 frames. ], batch size: 90, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:08:35,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3907190.0, ans=15.0 2024-08-18 13:08:54,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3907290.0, ans=0.125 2024-08-18 13:09:13,130 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
28 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-18 13:09:18,430 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.69 vs. limit=15.0 2024-08-18 13:09:19,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3907490.0, ans=0.125 2024-08-18 13:09:19,500 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.95 vs. limit=15.0 2024-08-18 13:09:21,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3907490.0, ans=0.2 2024-08-18 13:09:32,902 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-08-18 13:09:40,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3907590.0, ans=0.125 2024-08-18 13:09:45,566 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 4150, loss[loss=0.09943, beats_loss=0.009189, ecapa_loss=0.0001188, whisper_loss=0.08905, over 14637.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01034, ecapa_loss=0.000147, whisper_loss=0.09197, over 3900668.98 frames. ], batch size: 55, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:10:09,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3907790.0, ans=0.2 2024-08-18 13:10:35,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3907990.0, ans=0.2 2024-08-18 13:10:38,061 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 
22 from LS+wenet, 10 from Vox, 21 fro AS 2024-08-18 13:10:41,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3907990.0, ans=0.0 2024-08-18 13:10:44,880 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.199e+01 2.501e+01 2.835e+01 5.919e+01, threshold=5.001e+01, percent-clipped=1.0 2024-08-18 13:10:51,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3908090.0, ans=0.07 2024-08-18 13:10:56,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3908090.0, ans=0.125 2024-08-18 13:10:58,565 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 4200, loss[loss=0.06989, beats_loss=0.01337, ecapa_loss=0.0001517, whisper_loss=0.055, over 14541.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01036, ecapa_loss=0.000147, whisper_loss=0.09192, over 3898311.55 frames. ], batch size: 61, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:11:20,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3908290.0, ans=0.2 2024-08-18 13:11:25,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3908290.0, ans=0.2 2024-08-18 13:11:34,085 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-18 13:11:35,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3908390.0, ans=0.2 2024-08-18 13:11:47,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3908490.0, ans=0.2 2024-08-18 13:12:03,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.25 vs. 
limit=10.0 2024-08-18 13:12:08,511 WARNING [optim.py:496] (0/4) Scaling gradients by 0.06405556201934814, model_norm_threshold=50.014076232910156 2024-08-18 13:12:08,675 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.0.norm.log_scale with proportion 0.31, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.868e+05, grad_sumsq=1.868e+05, orig_rms_sq=1.000e+00 2024-08-18 13:12:14,231 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 4250, loss[loss=0.0826, beats_loss=0.0113, ecapa_loss=0.0001132, whisper_loss=0.07017, over 16727.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01048, ecapa_loss=0.0001454, whisper_loss=0.0908, over 3895652.96 frames. ], batch size: 65, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:12:24,606 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.45 vs. limit=15.0 2024-08-18 13:12:28,858 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 12 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-18 13:12:36,877 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-18 13:12:41,287 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.66 vs. 
limit=22.5 2024-08-18 13:12:46,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3908890.0, ans=0.0 2024-08-18 13:12:46,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3908890.0, ans=0.1 2024-08-18 13:12:47,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3908890.0, ans=0.07 2024-08-18 13:13:16,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3909090.0, ans=0.125 2024-08-18 13:13:17,493 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.252e+01 2.550e+01 2.768e+01 7.808e+02, threshold=5.101e+01, percent-clipped=1.0 2024-08-18 13:13:19,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3909090.0, ans=0.125 2024-08-18 13:13:31,595 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 4300, loss[loss=0.1246, beats_loss=0.009405, ecapa_loss=0.0001656, whisper_loss=0.1136, over 20376.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0105, ecapa_loss=0.0001453, whisper_loss=0.09008, over 3884065.84 frames. ], batch size: 83, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:13:35,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3909190.0, ans=0.125 2024-08-18 13:13:35,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3909190.0, ans=0.04949747468305833 2024-08-18 13:13:38,311 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 20 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 13:13:41,345 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
28 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-18 13:13:41,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3909190.0, ans=0.0 2024-08-18 13:13:46,084 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 20 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-18 13:13:51,901 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-18 13:13:59,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3909290.0, ans=0.125 2024-08-18 13:14:02,924 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.16 vs. limit=15.0 2024-08-18 13:14:11,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3909390.0, ans=0.1 2024-08-18 13:14:17,132 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-18 13:14:24,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3909490.0, ans=0.1 2024-08-18 13:14:28,906 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 13:14:32,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3909590.0, ans=0.025 2024-08-18 13:14:47,992 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 4350, loss[loss=0.1005, beats_loss=0.01028, ecapa_loss=0.0001521, whisper_loss=0.08873, over 22751.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01048, ecapa_loss=0.0001451, whisper_loss=0.09011, over 3892098.44 frames. 
], batch size: 95, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:14:48,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3909690.0, ans=0.0 2024-08-18 13:14:49,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3909690.0, ans=0.2 2024-08-18 13:15:09,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3909790.0, ans=0.125 2024-08-18 13:15:44,275 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-18 13:15:51,271 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.300e+01 2.560e+01 2.936e+01 6.147e+01, threshold=5.120e+01, percent-clipped=1.0 2024-08-18 13:16:05,366 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 4400, loss[loss=0.1043, beats_loss=0.0116, ecapa_loss=0.0001488, whisper_loss=0.09126, over 16324.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001463, whisper_loss=0.09041, over 3878268.22 frames. 
], batch size: 69, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:16:07,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3910190.0, ans=0.2 2024-08-18 13:16:15,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3910190.0, ans=0.2 2024-08-18 13:16:20,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3910290.0, ans=0.1 2024-08-18 13:16:25,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3910290.0, ans=0.125 2024-08-18 13:16:32,414 WARNING [optim.py:496] (0/4) Scaling gradients by 0.03998252749443054, model_norm_threshold=51.19541549682617 2024-08-18 13:16:32,578 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.24, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.970e+05, grad_sumsq=3.970e+05, orig_rms_sq=1.000e+00 2024-08-18 13:16:47,575 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.68 vs. limit=15.0 2024-08-18 13:16:57,485 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.87 vs. limit=10.0 2024-08-18 13:17:12,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.05 vs. 
limit=15.0 2024-08-18 13:17:19,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3910590.0, ans=0.0 2024-08-18 13:17:24,182 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 4450, loss[loss=0.1036, beats_loss=0.01183, ecapa_loss=0.0001261, whisper_loss=0.09049, over 23306.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01051, ecapa_loss=0.0001457, whisper_loss=0.09063, over 3880122.46 frames. ], batch size: 94, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:17:34,354 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-18 13:17:44,750 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-18 13:17:47,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3910790.0, ans=0.125 2024-08-18 13:17:49,254 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-18 13:17:49,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3910790.0, ans=0.0 2024-08-18 13:17:52,757 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 13:18:11,317 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-18 13:18:19,177 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 13:18:30,113 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.358e+01 2.721e+01 3.082e+01 1.280e+03, threshold=5.441e+01, percent-clipped=5.0 2024-08-18 13:18:38,290 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. 
limit=6.0 2024-08-18 13:18:40,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3911090.0, ans=0.0 2024-08-18 13:18:43,458 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 4500, loss[loss=0.1268, beats_loss=0.007672, ecapa_loss=0.000171, whisper_loss=0.1174, over 20309.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01053, ecapa_loss=0.0001452, whisper_loss=0.09052, over 3918514.94 frames. ], batch size: 81, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:18:53,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3911190.0, ans=0.95 2024-08-18 13:18:55,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3911190.0, ans=0.0 2024-08-18 13:19:30,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3911490.0, ans=0.2 2024-08-18 13:19:32,566 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 13:19:42,598 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=19.01 vs. limit=15.0 2024-08-18 13:19:57,557 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-18 13:20:00,651 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 4550, loss[loss=0.08609, beats_loss=0.0121, ecapa_loss=0.0001548, whisper_loss=0.07245, over 21264.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01055, ecapa_loss=0.0001461, whisper_loss=0.08958, over 3886213.37 frames. ], batch size: 90, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:20:11,888 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
21 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-18 13:20:16,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.61 vs. limit=10.0 2024-08-18 13:20:30,223 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 22 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-18 13:20:49,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3911990.0, ans=0.125 2024-08-18 13:21:03,584 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.646e+01 2.293e+01 2.530e+01 2.882e+01 1.902e+02, threshold=5.061e+01, percent-clipped=1.0 2024-08-18 13:21:10,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3912090.0, ans=0.125 2024-08-18 13:21:11,818 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-18 13:21:15,202 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 29 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-18 13:21:16,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3912190.0, ans=0.0 2024-08-18 13:21:17,460 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 4600, loss[loss=0.1162, beats_loss=0.01113, ecapa_loss=0.0001519, whisper_loss=0.1036, over 22856.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01054, ecapa_loss=0.0001449, whisper_loss=0.09, over 3897311.99 frames. ], batch size: 93, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:21:19,719 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-18 13:21:35,182 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. 
limit=6.0 2024-08-18 13:21:39,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3912290.0, ans=0.0 2024-08-18 13:21:56,083 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-18 13:21:59,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3912390.0, ans=0.0 2024-08-18 13:22:05,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3912490.0, ans=0.125 2024-08-18 13:22:27,715 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 21 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 13:22:32,243 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 19 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-18 13:22:33,917 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 4650, loss[loss=0.08213, beats_loss=0.01198, ecapa_loss=0.0001632, whisper_loss=0.06852, over 20939.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01053, ecapa_loss=0.0001462, whisper_loss=0.08985, over 3898461.51 frames. ], batch size: 88, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:22:34,057 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 28 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 13:22:45,281 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-18 13:22:55,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3912790.0, ans=0.125 2024-08-18 13:23:18,862 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 26 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-18 13:23:24,849 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
25 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-18 13:23:35,603 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.611e+01 2.195e+01 2.467e+01 2.773e+01 3.878e+01, threshold=4.935e+01, percent-clipped=0.0 2024-08-18 13:23:36,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3913090.0, ans=0.0 2024-08-18 13:23:40,504 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-18 13:23:49,347 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 4700, loss[loss=0.09122, beats_loss=0.01037, ecapa_loss=0.0001712, whisper_loss=0.07913, over 20751.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01053, ecapa_loss=0.0001457, whisper_loss=0.08966, over 3895340.59 frames. ], batch size: 92, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:23:54,354 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-18 13:23:54,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3913190.0, ans=0.125 2024-08-18 13:24:25,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3913390.0, ans=0.1 2024-08-18 13:24:40,195 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-18 13:24:50,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3913590.0, ans=0.125 2024-08-18 13:24:55,486 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
21 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 13:24:57,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3913590.0, ans=0.125 2024-08-18 13:25:02,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3913590.0, ans=0.1 2024-08-18 13:25:05,712 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 4750, loss[loss=0.09072, beats_loss=0.01076, ecapa_loss=0.0001559, whisper_loss=0.0784, over 17276.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01057, ecapa_loss=0.0001457, whisper_loss=0.08944, over 3870193.59 frames. ], batch size: 71, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:25:07,708 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.56 vs. limit=15.0 2024-08-18 13:25:20,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3913790.0, ans=0.0 2024-08-18 13:25:33,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3913790.0, ans=0.1 2024-08-18 13:25:51,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3913990.0, ans=0.0 2024-08-18 13:26:04,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3913990.0, ans=0.125 2024-08-18 13:26:04,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3913990.0, ans=0.125 2024-08-18 13:26:04,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3913990.0, ans=0.0 2024-08-18 13:26:04,300 INFO [scaling.py:1024] (0/4) Whitening: 
name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.00 vs. limit=15.0 2024-08-18 13:26:07,652 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.735e+01 2.285e+01 2.505e+01 2.813e+01 4.108e+01, threshold=5.010e+01, percent-clipped=0.0 2024-08-18 13:26:21,769 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 4800, loss[loss=0.1185, beats_loss=0.009265, ecapa_loss=0.00017, whisper_loss=0.1076, over 14515.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01058, ecapa_loss=0.0001469, whisper_loss=0.08959, over 3916797.97 frames. ], batch size: 60, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:26:23,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3914190.0, ans=0.1 2024-08-18 13:26:55,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3914390.0, ans=0.0 2024-08-18 13:27:37,900 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 4850, loss[loss=0.08792, beats_loss=0.008642, ecapa_loss=0.0001795, whisper_loss=0.07748, over 18800.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0107, ecapa_loss=0.0001455, whisper_loss=0.08903, over 3925629.73 frames. ], batch size: 78, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:27:45,219 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 16 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-18 13:27:51,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3914790.0, ans=0.2 2024-08-18 13:27:58,990 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-18 13:28:02,014 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
20 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-18 13:28:14,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3914890.0, ans=0.125 2024-08-18 13:28:26,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3914990.0, ans=0.125 2024-08-18 13:28:26,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3914990.0, ans=0.1 2024-08-18 13:28:35,920 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.142e+01 2.395e+01 2.645e+01 2.966e+01 4.545e+01, threshold=5.290e+01, percent-clipped=0.0 2024-08-18 13:28:48,456 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 4900, loss[loss=0.09815, beats_loss=0.01185, ecapa_loss=0.0001321, whisper_loss=0.08498, over 19596.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01065, ecapa_loss=0.000145, whisper_loss=0.0895, over 3901307.52 frames. ], batch size: 79, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:28:59,065 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-18 13:29:02,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3915290.0, ans=0.125 2024-08-18 13:29:21,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3915390.0, ans=0.0 2024-08-18 13:29:33,348 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 38 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 13:29:51,365 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.07 vs. 
limit=15.0 2024-08-18 13:29:58,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3915590.0, ans=0.0 2024-08-18 13:30:05,157 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 4950, loss[loss=0.1044, beats_loss=0.008916, ecapa_loss=0.0001483, whisper_loss=0.09401, over 18610.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01065, ecapa_loss=0.0001442, whisper_loss=0.08924, over 3859947.70 frames. ], batch size: 71, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:30:11,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3915690.0, ans=0.0 2024-08-18 13:30:23,064 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=15.0 2024-08-18 13:30:28,526 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-18 13:30:34,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3915890.0, ans=0.1 2024-08-18 13:30:56,872 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.60 vs. limit=15.0 2024-08-18 13:31:08,364 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.264e+01 2.536e+01 2.797e+01 4.034e+01, threshold=5.071e+01, percent-clipped=0.0 2024-08-18 13:31:16,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3916090.0, ans=0.0 2024-08-18 13:31:22,247 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 5000, loss[loss=0.1219, beats_loss=0.008914, ecapa_loss=0.0001455, whisper_loss=0.1116, over 19760.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01052, ecapa_loss=0.0001441, whisper_loss=0.09022, over 3842841.26 frames. 
], batch size: 77, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:31:29,696 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.07 vs. limit=12.0 2024-08-18 13:31:50,047 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-18 13:32:01,947 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.288e+05 2024-08-18 13:32:07,874 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.299e+01 2024-08-18 13:32:19,139 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.41 vs. limit=15.0 2024-08-18 13:32:35,407 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 28 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-18 13:32:36,505 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 5050, loss[loss=0.1159, beats_loss=0.01012, ecapa_loss=0.000125, whisper_loss=0.1045, over 19885.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01051, ecapa_loss=0.0001452, whisper_loss=0.09104, over 3846996.13 frames. ], batch size: 74, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:32:50,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3916790.0, ans=0.2 2024-08-18 13:32:57,398 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 18 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 13:33:18,827 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
23 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-18 13:33:30,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3916990.0, ans=0.125 2024-08-18 13:33:37,475 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.311e+01 2.560e+01 2.884e+01 4.690e+01, threshold=5.121e+01, percent-clipped=0.0 2024-08-18 13:33:48,763 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-18 13:33:50,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3917190.0, ans=0.1 2024-08-18 13:33:51,522 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 5100, loss[loss=0.09668, beats_loss=0.009883, ecapa_loss=0.0001732, whisper_loss=0.08506, over 14840.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01056, ecapa_loss=0.0001438, whisper_loss=0.09098, over 3858743.21 frames. ], batch size: 60, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:33:59,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3917190.0, ans=0.0 2024-08-18 13:34:09,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3917290.0, ans=0.1 2024-08-18 13:34:11,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.87 vs. limit=22.5 2024-08-18 13:34:18,634 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
25 from LS+wenet, 28 from Vox, 38 from AS 2024-08-18 13:34:22,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3917390.0, ans=0.125 2024-08-18 13:34:26,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3917390.0, ans=0.125 2024-08-18 13:34:33,833 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 19 from Vox, 26 from AS 2024-08-18 13:34:35,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3917490.0, ans=0.125 2024-08-18 13:34:48,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3917490.0, ans=0.0 2024-08-18 13:34:48,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3917490.0, ans=0.1 2024-08-18 13:35:07,222 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 5150, loss[loss=0.1079, beats_loss=0.01096, ecapa_loss=0.0001351, whisper_loss=0.09554, over 21909.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0105, ecapa_loss=0.000143, whisper_loss=0.09098, over 3893169.38 frames. ], batch size: 88, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:35:15,244 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.77 vs. limit=15.0 2024-08-18 13:35:16,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3917690.0, ans=0.0 2024-08-18 13:35:20,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3917790.0, ans=0.125 2024-08-18 13:35:23,822 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts.
28 from LS+wenet, 28 from Vox, 34 from AS 2024-08-18 13:35:31,704 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.83 vs. limit=10.0 2024-08-18 13:35:33,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3917790.0, ans=0.1 2024-08-18 13:35:41,447 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.43 vs. limit=6.0 2024-08-18 13:35:52,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3917990.0, ans=0.0 2024-08-18 13:36:01,338 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 26 from LS+wenet, 16 from Vox, 20 from AS 2024-08-18 13:36:05,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3918090.0, ans=0.1 2024-08-18 13:36:08,170 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.267e+01 2.541e+01 2.830e+01 4.847e+01, threshold=5.083e+01, percent-clipped=0.0 2024-08-18 13:36:11,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3918090.0, ans=0.1 2024-08-18 13:36:20,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3918190.0, ans=0.0 2024-08-18 13:36:21,728 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 5200, loss[loss=0.1126, beats_loss=0.009552, ecapa_loss=0.0001168, whisper_loss=0.1019, over 19112.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001436, whisper_loss=0.09037, over 3848405.75 frames.
], batch size: 71, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:36:42,701 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 23 from LS+wenet, 10 from Vox, 24 from AS 2024-08-18 13:37:03,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3918390.0, ans=0.07 2024-08-18 13:37:11,823 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 31 from LS+wenet, 16 from Vox, 33 from AS 2024-08-18 13:37:12,462 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.50 vs. limit=22.5 2024-08-18 13:37:30,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3918590.0, ans=0.125 2024-08-18 13:37:40,833 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 5250, loss[loss=0.09595, beats_loss=0.0118, ecapa_loss=0.0001482, whisper_loss=0.08266, over 15669.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01048, ecapa_loss=0.0001445, whisper_loss=0.09002, over 3830545.07 frames. ], batch size: 63, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:37:42,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3918690.0, ans=0.125 2024-08-18 13:38:16,552 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts.
27 from LS+wenet, 11 from Vox, 36 from AS 2024-08-18 13:38:39,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3918990.0, ans=15.0 2024-08-18 13:38:40,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3919090.0, ans=0.125 2024-08-18 13:38:42,506 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.331e+01 2.617e+01 2.849e+01 4.827e+01, threshold=5.234e+01, percent-clipped=0.0 2024-08-18 13:38:55,906 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 5300, loss[loss=0.105, beats_loss=0.008924, ecapa_loss=0.0001236, whisper_loss=0.09484, over 17381.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01052, ecapa_loss=0.0001446, whisper_loss=0.0899, over 3825846.85 frames. ], batch size: 66, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:38:56,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3919190.0, ans=0.125 2024-08-18 13:38:56,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3919190.0, ans=0.1 2024-08-18 13:39:12,829 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs.
limit=15.0 2024-08-18 13:39:34,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3919390.0, ans=0.0 2024-08-18 13:39:37,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3919390.0, ans=0.125 2024-08-18 13:39:47,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3919490.0, ans=0.125 2024-08-18 13:39:56,339 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 22 from Vox, 34 from AS 2024-08-18 13:40:12,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3919690.0, ans=0.125 2024-08-18 13:40:13,481 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 5350, loss[loss=0.09204, beats_loss=0.01327, ecapa_loss=9.775e-05, whisper_loss=0.07779, over 18116.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01051, ecapa_loss=0.0001442, whisper_loss=0.08948, over 3848978.84 frames. ], batch size: 70, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:40:22,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3919690.0, ans=0.125 2024-08-18 13:40:55,954 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts.
20 from LS+wenet, 20 from Vox, 29 from AS 2024-08-18 13:40:57,389 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-392000.pt 2024-08-18 13:41:14,884 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.240e+01 2.441e+01 2.747e+01 4.165e+01, threshold=4.882e+01, percent-clipped=0.0 2024-08-18 13:41:25,454 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.16 vs. limit=15.0 2024-08-18 13:41:27,484 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 5400, loss[loss=0.09528, beats_loss=0.01073, ecapa_loss=0.0001313, whisper_loss=0.08323, over 22767.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01052, ecapa_loss=0.0001435, whisper_loss=0.08955, over 3869891.93 frames. ], batch size: 92, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:41:32,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.55 vs. limit=10.0 2024-08-18 13:41:53,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3920290.0, ans=0.2 2024-08-18 13:42:01,790 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 from AS 2024-08-18 13:42:05,627 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 21 from LS+wenet, 29 from Vox, 42 from AS 2024-08-18 13:42:12,873 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 21 from Vox, 36 from AS 2024-08-18 13:42:15,296 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts.
11 from LS+wenet, 20 from Vox, 28 from AS 2024-08-18 13:42:15,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3920490.0, ans=0.1 2024-08-18 13:42:20,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3920590.0, ans=0.125 2024-08-18 13:42:30,903 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 from AS 2024-08-18 13:42:34,003 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 from AS 2024-08-18 13:42:35,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3920690.0, ans=0.125 2024-08-18 13:42:36,640 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 5450, loss[loss=0.1069, beats_loss=0.009606, ecapa_loss=0.0001225, whisper_loss=0.09602, over 20288.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01051, ecapa_loss=0.0001428, whisper_loss=0.08947, over 3895859.72 frames. ], batch size: 77, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:42:42,285 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 21 from Vox, 37 from AS 2024-08-18 13:42:49,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3920790.0, ans=0.125 2024-08-18 13:42:52,175 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 18 from Vox, 19 from AS 2024-08-18 13:42:53,074 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=17.71 vs. limit=15.0 2024-08-18 13:42:56,951 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.71 vs.
limit=15.0 2024-08-18 13:42:57,848 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2024-08-18 13:42:58,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3920790.0, ans=0.0 2024-08-18 13:43:00,723 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.14 vs. limit=22.5 2024-08-18 13:43:03,197 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 14 from Vox, 24 from AS 2024-08-18 13:43:07,722 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0 2024-08-18 13:43:10,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.62 vs. limit=15.0 2024-08-18 13:43:17,151 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.31 vs. limit=15.0 2024-08-18 13:43:21,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3920990.0, ans=0.0 2024-08-18 13:43:34,112 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.277e+01 2.488e+01 2.860e+01 4.810e+01, threshold=4.975e+01, percent-clipped=0.0 2024-08-18 13:43:48,614 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 5500, loss[loss=0.08632, beats_loss=0.01155, ecapa_loss=0.0001456, whisper_loss=0.07331, over 19821.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0105, ecapa_loss=0.0001434, whisper_loss=0.08943, over 3898630.44 frames.
], batch size: 77, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:44:12,221 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 20 from Vox, 44 from AS 2024-08-18 13:44:27,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3921390.0, ans=0.125 2024-08-18 13:44:31,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3921390.0, ans=0.0 2024-08-18 13:45:02,788 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 9 from Vox, 27 from AS 2024-08-18 13:45:04,020 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 5550, loss[loss=0.1104, beats_loss=0.01113, ecapa_loss=0.0001088, whisper_loss=0.09819, over 14808.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01039, ecapa_loss=0.0001446, whisper_loss=0.09051, over 3917749.11 frames. ], batch size: 54, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:45:15,058 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.70 vs. limit=22.5 2024-08-18 13:45:18,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3921690.0, ans=0.125 2024-08-18 13:45:26,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3921790.0, ans=0.0 2024-08-18 13:45:27,055 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-08-18 13:45:45,165 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.16 vs.
limit=6.0 2024-08-18 13:46:08,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3922090.0, ans=0.125 2024-08-18 13:46:11,123 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.360e+01 2.583e+01 2.976e+01 1.161e+02, threshold=5.166e+01, percent-clipped=2.0 2024-08-18 13:46:22,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3922090.0, ans=0.125 2024-08-18 13:46:25,031 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 5600, loss[loss=0.125, beats_loss=0.00966, ecapa_loss=0.0001343, whisper_loss=0.114, over 16497.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01036, ecapa_loss=0.0001452, whisper_loss=0.09147, over 3941488.63 frames. ], batch size: 63, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 13:46:27,977 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0688062459230423, model_norm_threshold=51.66341781616211 2024-08-18 13:46:28,180 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.409e+04, grad_sumsq=9.409e+04, orig_rms_sq=1.000e+00 2024-08-18 13:46:41,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3922290.0, ans=0.0 2024-08-18 13:46:43,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3922290.0, ans=0.125 2024-08-18 13:46:52,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3922290.0, ans=0.125 2024-08-18 13:46:54,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3922390.0, ans=0.1 2024-08-18 13:46:58,419 INFO 
[train_multi_KD3.py:844] (0/4) A total of 54 cuts. 12 from LS+wenet, 15 from Vox, 27 from AS 2024-08-18 13:47:14,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3922490.0, ans=0.05 2024-08-18 13:47:20,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3922490.0, ans=0.125 2024-08-18 13:47:40,538 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 5650, loss[loss=0.08639, beats_loss=0.01127, ecapa_loss=0.0001509, whisper_loss=0.07361, over 19546.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01043, ecapa_loss=0.0001455, whisper_loss=0.09096, over 3955782.03 frames. ], batch size: 81, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 13:47:50,514 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 21 from LS+wenet, 26 from Vox, 45 from AS 2024-08-18 13:47:50,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3922690.0, ans=0.07 2024-08-18 13:48:17,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3922890.0, ans=0.1 2024-08-18 13:48:25,290 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 25 from Vox, 37 from AS 2024-08-18 13:48:36,941 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts.
25 from LS+wenet, 18 from Vox, 22 from AS 2024-08-18 13:48:45,595 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.361e+01 2.628e+01 2.997e+01 7.509e+02, threshold=5.255e+01, percent-clipped=3.0 2024-08-18 13:48:57,148 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.122e+01 2024-08-18 13:48:58,143 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08530262112617493, model_norm_threshold=52.552433013916016 2024-08-18 13:48:58,306 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.807e+04, grad_sumsq=5.807e+04, orig_rms_sq=1.000e+00 2024-08-18 13:48:58,330 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 5700, loss[loss=0.08809, beats_loss=0.01186, ecapa_loss=0.0001494, whisper_loss=0.07473, over 22363.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01046, ecapa_loss=0.0001457, whisper_loss=0.09107, over 3972820.24 frames. ], batch size: 92, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 13:49:23,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3923290.0, ans=0.125 2024-08-18 13:49:31,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3923390.0, ans=0.015 2024-08-18 13:49:52,674 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 28 from Vox, 35 from AS 2024-08-18 13:50:15,609 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 5750, loss[loss=0.09019, beats_loss=0.012, ecapa_loss=0.0001365, whisper_loss=0.07682, over 20285.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01048, ecapa_loss=0.0001457, whisper_loss=0.09064, over 3960726.75 frames.
], batch size: 81, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 13:50:21,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3923690.0, ans=0.125 2024-08-18 13:50:29,404 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 11 from LS+wenet, 15 from Vox, 28 from AS 2024-08-18 13:50:29,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3923790.0, ans=0.125 2024-08-18 13:50:32,446 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.279e+01 2024-08-18 13:50:33,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3923790.0, ans=0.125 2024-08-18 13:50:39,501 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 13 from Vox, 30 from AS 2024-08-18 13:50:41,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3923790.0, ans=0.2 2024-08-18 13:50:57,185 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 18 from Vox, 42 from AS 2024-08-18 13:50:58,294 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts.
29 from LS+wenet, 22 from Vox, 39 from AS 2024-08-18 13:51:00,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3923990.0, ans=0.125 2024-08-18 13:51:06,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3923990.0, ans=0.0 2024-08-18 13:51:16,766 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.280e+01 2.579e+01 2.799e+01 6.161e+02, threshold=5.158e+01, percent-clipped=2.0 2024-08-18 13:51:22,499 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2024-08-18 13:51:28,384 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 5800, loss[loss=0.1067, beats_loss=0.00901, ecapa_loss=0.000147, whisper_loss=0.09617, over 19863.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01053, ecapa_loss=0.0001451, whisper_loss=0.09006, over 3893999.74 frames. ], batch size: 79, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 13:51:35,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3924190.0, ans=15.0 2024-08-18 13:51:46,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3924290.0, ans=0.0 2024-08-18 13:52:14,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.86 vs. limit=15.0 2024-08-18 13:52:21,217 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 19 from LS+wenet, 31 from Vox, 39 from AS 2024-08-18 13:52:32,194 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.61 vs.
limit=15.0 2024-08-18 13:52:41,892 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 5850, loss[loss=0.08697, beats_loss=0.01145, ecapa_loss=0.000166, whisper_loss=0.07385, over 17342.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01062, ecapa_loss=0.0001456, whisper_loss=0.08981, over 3912736.72 frames. ], batch size: 75, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 13:52:43,596 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 19 from Vox, 38 from AS 2024-08-18 13:52:57,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.90 vs. limit=22.5 2024-08-18 13:53:02,057 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 20 from Vox, 22 from AS 2024-08-18 13:53:03,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3924790.0, ans=0.04949747468305833 2024-08-18 13:53:28,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3924990.0, ans=0.125 2024-08-18 13:53:45,903 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.710e+01 2.307e+01 2.558e+01 2.890e+01 4.953e+01, threshold=5.117e+01, percent-clipped=0.0 2024-08-18 13:53:56,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3925090.0, ans=0.0 2024-08-18 13:53:56,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3925090.0, ans=0.0 2024-08-18 13:54:00,989 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 5900, loss[loss=0.09913, beats_loss=0.01153, ecapa_loss=9.722e-05, whisper_loss=0.08663, over 20917.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01056, ecapa_loss=0.0001457, whisper_loss=0.08962, over 3892805.58 frames.
], batch size: 78, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 13:54:01,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3925190.0, ans=0.0 2024-08-18 13:54:28,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3925290.0, ans=0.125 2024-08-18 13:54:37,989 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 27 from Vox, 36 from AS 2024-08-18 13:54:49,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3925490.0, ans=0.2 2024-08-18 13:55:05,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3925590.0, ans=0.1 2024-08-18 13:55:14,816 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 from AS 2024-08-18 13:55:20,940 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 5950, loss[loss=0.09996, beats_loss=0.01159, ecapa_loss=0.0001497, whisper_loss=0.08687, over 17749.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01056, ecapa_loss=0.0001456, whisper_loss=0.0898, over 3901359.49 frames. ], batch size: 71, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 13:55:49,674 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts.
32 from LS+wenet, 20 from Vox, 39 from AS 2024-08-18 13:55:58,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3925890.0, ans=0.0 2024-08-18 13:56:06,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3925890.0, ans=0.125 2024-08-18 13:56:24,955 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.637e+01 2.192e+01 2.474e+01 2.892e+01 4.028e+01, threshold=4.947e+01, percent-clipped=0.0 2024-08-18 13:56:27,890 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 20 from LS+wenet, 28 from Vox, 41 from AS 2024-08-18 13:56:38,127 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 6000, loss[loss=0.09079, beats_loss=0.01097, ecapa_loss=0.0001381, whisper_loss=0.07844, over 19582.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01062, ecapa_loss=0.0001448, whisper_loss=0.0899, over 3901840.92 frames. ], batch size: 80, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 13:56:38,128 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-18 13:57:15,573 INFO [train_multi_KD3.py:1149] (0/4) Epoch 27, validation on ASR_libri: loss=0.2529, beats_loss=0, ecapa_loss=0.0005218, whisper_loss=0.2477, over 922467.00 frames. 2024-08-18 13:57:30,189 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.8605, 3.6289, 2.7067, 3.2212], device='cuda:0') 2024-08-18 13:57:32,487 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.2062, 1.3387, 1.7359, 0.8933, 1.3375, 1.8343, 1.3180, 1.3173], device='cuda:0') 2024-08-18 13:57:34,401 INFO [train_multi_KD3.py:1149] (0/4) Epoch 27, validation on SV_voxceleb1: loss=0.004081, beats_loss=0, ecapa_loss=0.0004081, whisper_loss=0, over 939242.00 frames.
2024-08-18 13:57:49,595 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.8930, 3.3746, 2.4681, 3.6604], device='cuda:0') 2024-08-18 13:59:16,579 INFO [train_multi_KD3.py:1149] (0/4) Epoch 27, validation on AT_audioset: loss=0.02317, beats_loss=0.02317, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 13:59:16,584 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-18 13:59:17,999 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 29 from LS+wenet, 16 from Vox, 30 from AS 2024-08-18 13:59:46,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3926390.0, ans=0.2 2024-08-18 13:59:55,696 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.08 vs. limit=22.5 2024-08-18 14:00:06,843 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 from AS 2024-08-18 14:00:20,864 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 from AS 2024-08-18 14:00:33,921 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 6050, loss[loss=0.1207, beats_loss=0.01017, ecapa_loss=0.0001354, whisper_loss=0.1092, over 14847.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01065, ecapa_loss=0.0001443, whisper_loss=0.09, over 3866832.31 frames. ], batch size: 58, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:00:34,102 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 from AS 2024-08-18 14:01:14,936 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.04 vs.
limit=15.0 2024-08-18 14:01:30,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3926990.0, ans=0.2 2024-08-18 14:01:31,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3926990.0, ans=0.125 2024-08-18 14:01:38,792 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.321e+01 2.592e+01 2.870e+01 3.846e+01, threshold=5.183e+01, percent-clipped=0.0 2024-08-18 14:01:41,128 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 19 from Vox, 26 from AS 2024-08-18 14:01:45,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3927090.0, ans=0.1 2024-08-18 14:01:53,337 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 6100, loss[loss=0.0857, beats_loss=0.009816, ecapa_loss=0.0001903, whisper_loss=0.07398, over 17920.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0106, ecapa_loss=0.0001447, whisper_loss=0.08974, over 3875280.18 frames. ], batch size: 77, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:02:05,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3927190.0, ans=0.1 2024-08-18 14:02:05,638 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.05 vs. limit=6.0 2024-08-18 14:02:35,090 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 26 from LS+wenet, 16 from Vox, 20 from AS 2024-08-18 14:02:51,902 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 18 from Vox, 23 from AS 2024-08-18 14:02:59,231 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts.
18 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-18 14:03:02,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=3927590.0, ans=0.02 2024-08-18 14:03:10,616 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 6150, loss[loss=0.1083, beats_loss=0.009585, ecapa_loss=0.0001476, whisper_loss=0.09722, over 19323.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01057, ecapa_loss=0.0001455, whisper_loss=0.09011, over 3896998.95 frames. ], batch size: 79, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:03:20,685 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-18 14:03:20,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3927690.0, ans=0.125 2024-08-18 14:03:22,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3927690.0, ans=0.0 2024-08-18 14:03:22,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3927690.0, ans=0.0 2024-08-18 14:03:38,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3927790.0, ans=0.0 2024-08-18 14:03:50,842 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-18 14:04:00,060 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-18 14:04:01,286 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
24 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-18 14:04:10,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3928090.0, ans=0.1 2024-08-18 14:04:11,604 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.379e+01 2.632e+01 2.794e+01 5.915e+01, threshold=5.263e+01, percent-clipped=2.0 2024-08-18 14:04:18,879 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 36 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-18 14:04:24,776 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 6200, loss[loss=0.0905, beats_loss=0.01, ecapa_loss=0.0001479, whisper_loss=0.07902, over 17617.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01052, ecapa_loss=0.0001457, whisper_loss=0.08963, over 3838787.73 frames. ], batch size: 71, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:04:28,883 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 28 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-18 14:04:29,409 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0 2024-08-18 14:04:33,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3928190.0, ans=0.05 2024-08-18 14:04:34,813 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 14:04:37,814 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-18 14:04:38,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3928290.0, ans=0.125 2024-08-18 14:04:44,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.23 vs. 
limit=15.0 2024-08-18 14:04:45,301 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-18 14:04:45,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3928290.0, ans=0.1 2024-08-18 14:04:51,300 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 14:04:54,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3928390.0, ans=0.2 2024-08-18 14:04:54,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3928390.0, ans=0.125 2024-08-18 14:05:15,146 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 24 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-18 14:05:33,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3928590.0, ans=0.0 2024-08-18 14:05:43,442 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 6250, loss[loss=0.08323, beats_loss=0.01343, ecapa_loss=0.0001173, whisper_loss=0.06863, over 21957.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.000146, whisper_loss=0.09005, over 3857352.41 frames. ], batch size: 88, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:05:52,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3928690.0, ans=0.2 2024-08-18 14:06:02,019 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-18 14:06:14,867 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 14:06:22,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3928890.0, ans=0.0 2024-08-18 14:06:23,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3928890.0, ans=0.125 2024-08-18 14:06:25,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3928890.0, ans=0.0 2024-08-18 14:06:46,533 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.713e+01 2.311e+01 2.533e+01 2.797e+01 1.821e+02, threshold=5.065e+01, percent-clipped=2.0 2024-08-18 14:06:48,025 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-18 14:06:53,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3929090.0, ans=0.125 2024-08-18 14:06:59,977 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 6300, loss[loss=0.1184, beats_loss=0.008442, ecapa_loss=0.0001592, whisper_loss=0.1084, over 22718.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001461, whisper_loss=0.09035, over 3900737.81 frames. ], batch size: 90, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:07:06,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3929190.0, ans=0.0 2024-08-18 14:07:15,476 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
21 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-18 14:07:45,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3929490.0, ans=0.125 2024-08-18 14:07:45,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3929490.0, ans=0.0 2024-08-18 14:07:51,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3929490.0, ans=0.125 2024-08-18 14:07:53,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3929490.0, ans=0.125 2024-08-18 14:07:58,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3929490.0, ans=0.0 2024-08-18 14:08:09,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=3929590.0, ans=0.02 2024-08-18 14:08:12,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3929590.0, ans=0.125 2024-08-18 14:08:15,677 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 6350, loss[loss=0.08453, beats_loss=0.01125, ecapa_loss=0.0001474, whisper_loss=0.07181, over 19831.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01053, ecapa_loss=0.0001454, whisper_loss=0.08995, over 3892783.38 frames. ], batch size: 80, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:08:21,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3929690.0, ans=0.125 2024-08-18 14:08:25,810 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
12 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 14:08:26,606 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-08-18 14:08:34,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3929790.0, ans=0.0 2024-08-18 14:08:56,857 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-18 14:09:00,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.78 vs. limit=10.0 2024-08-18 14:09:19,615 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.236e+01 2.432e+01 2.687e+01 3.502e+01, threshold=4.864e+01, percent-clipped=0.0 2024-08-18 14:09:23,393 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 14:09:32,172 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 6400, loss[loss=0.09189, beats_loss=0.0117, ecapa_loss=0.0001364, whisper_loss=0.07882, over 22857.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01059, ecapa_loss=0.000145, whisper_loss=0.09, over 3927910.23 frames. ], batch size: 95, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:09:37,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3930190.0, ans=0.125 2024-08-18 14:09:43,850 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=15.0 2024-08-18 14:09:59,428 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-18 14:10:05,486 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
17 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-18 14:10:08,296 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 14:10:35,620 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 32 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-18 14:10:36,774 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 21 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-18 14:10:43,464 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.93 vs. limit=15.0 2024-08-18 14:10:45,181 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 6450, loss[loss=0.112, beats_loss=0.01043, ecapa_loss=0.0001204, whisper_loss=0.1004, over 17992.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01057, ecapa_loss=0.0001455, whisper_loss=0.08999, over 3945099.33 frames. ], batch size: 68, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:10:58,780 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-18 14:11:09,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3930790.0, ans=0.125 2024-08-18 14:11:19,258 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2024-08-18 14:11:21,028 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.49 vs. limit=10.0 2024-08-18 14:11:25,477 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-18 14:11:26,555 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
18 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-18 14:11:37,730 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.379e+01 2.625e+01 2.941e+01 1.011e+02, threshold=5.251e+01, percent-clipped=1.0 2024-08-18 14:11:43,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3931090.0, ans=0.125 2024-08-18 14:11:44,905 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.86 vs. limit=15.0 2024-08-18 14:11:45,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3931090.0, ans=10.0 2024-08-18 14:11:49,620 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 6500, loss[loss=0.1015, beats_loss=0.009572, ecapa_loss=0.0001666, whisper_loss=0.09031, over 21487.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001461, whisper_loss=0.09012, over 3929035.00 frames. ], batch size: 88, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:11:56,148 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-18 14:11:59,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3931190.0, ans=0.125 2024-08-18 14:12:05,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3931290.0, ans=0.0 2024-08-18 14:12:12,938 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
34 from LS+wenet, 25 from Vox, 18 fro AS 2024-08-18 14:12:19,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3931390.0, ans=0.125 2024-08-18 14:12:20,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3931390.0, ans=0.125 2024-08-18 14:12:24,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3931390.0, ans=0.2 2024-08-18 14:12:50,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3931590.0, ans=0.125 2024-08-18 14:12:52,617 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 6550, loss[loss=0.1001, beats_loss=0.01042, ecapa_loss=0.0001704, whisper_loss=0.08793, over 17757.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01057, ecapa_loss=0.0001457, whisper_loss=0.09014, over 3926260.82 frames. ], batch size: 74, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:13:18,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3931890.0, ans=0.2 2024-08-18 14:13:18,949 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08532015979290009, model_norm_threshold=52.50708770751953 2024-08-18 14:13:19,111 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.293e+04, grad_sumsq=4.293e+04, orig_rms_sq=1.000e+00 2024-08-18 14:13:22,602 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-18 14:13:25,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3931890.0, ans=0.0 2024-08-18 14:13:30,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3931990.0, ans=0.2 2024-08-18 14:13:45,290 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.337e+01 2.574e+01 2.944e+01 6.154e+02, threshold=5.148e+01, percent-clipped=1.0 2024-08-18 14:13:46,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3932090.0, ans=0.125 2024-08-18 14:13:52,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3932090.0, ans=0.0 2024-08-18 14:13:55,093 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 6600, loss[loss=0.1107, beats_loss=0.007176, ecapa_loss=0.0001623, whisper_loss=0.1019, over 17292.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.0001467, whisper_loss=0.09031, over 3917764.04 frames. ], batch size: 66, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:14:02,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3932190.0, ans=0.0 2024-08-18 14:14:09,433 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 14:14:12,351 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-18 14:14:16,037 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-18 14:14:17,309 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-18 14:14:54,309 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-18 14:14:56,436 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 6650, loss[loss=0.1009, beats_loss=0.01091, ecapa_loss=0.0001475, whisper_loss=0.08848, over 22788.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01049, ecapa_loss=0.0001467, whisper_loss=0.09097, over 3926635.13 frames. ], batch size: 92, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:14:59,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3932690.0, ans=0.125 2024-08-18 14:15:20,351 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 14 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 14:15:25,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3932890.0, ans=0.04949747468305833 2024-08-18 14:15:40,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3932990.0, ans=0.2 2024-08-18 14:15:40,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3932990.0, ans=0.125 2024-08-18 14:15:49,079 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.331e+01 2.674e+01 2.933e+01 1.002e+02, threshold=5.348e+01, percent-clipped=1.0 2024-08-18 14:15:51,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3933090.0, ans=0.125 2024-08-18 14:15:55,202 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 14:15:58,788 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 6700, loss[loss=0.08166, beats_loss=0.01116, ecapa_loss=0.0001565, whisper_loss=0.06893, over 16827.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01052, ecapa_loss=0.0001457, whisper_loss=0.0904, over 3940206.80 frames. 
], batch size: 70, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:16:01,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3933190.0, ans=0.125 2024-08-18 14:16:09,600 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-18 14:16:13,518 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 22 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-18 14:16:19,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3933290.0, ans=0.125 2024-08-18 14:16:47,089 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.924e+00 2024-08-18 14:16:50,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3933590.0, ans=0.125 2024-08-18 14:16:52,854 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-18 14:17:02,434 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 6750, loss[loss=0.1112, beats_loss=0.01013, ecapa_loss=0.0001023, whisper_loss=0.1001, over 14870.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01058, ecapa_loss=0.0001449, whisper_loss=0.09001, over 3891068.94 frames. ], batch size: 55, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:17:03,761 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
27 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 14:17:03,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3933690.0, ans=0.1 2024-08-18 14:17:08,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3933690.0, ans=0.0 2024-08-18 14:17:09,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3933690.0, ans=0.125 2024-08-18 14:17:25,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3933790.0, ans=0.0 2024-08-18 14:17:26,510 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 14:17:29,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3933890.0, ans=0.125 2024-08-18 14:17:39,028 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 14:17:40,610 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=15.0 2024-08-18 14:17:44,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3933990.0, ans=0.09899494936611666 2024-08-18 14:17:49,087 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 14:17:50,423 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
23 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 14:17:55,337 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.453e+01 2.659e+01 2.920e+01 3.778e+02, threshold=5.318e+01, percent-clipped=4.0 2024-08-18 14:18:03,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3934090.0, ans=0.125 2024-08-18 14:18:05,715 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 6800, loss[loss=0.08898, beats_loss=0.01406, ecapa_loss=0.0001002, whisper_loss=0.07392, over 15747.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.0001465, whisper_loss=0.0903, over 3897024.97 frames. ], batch size: 62, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:18:13,082 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-18 14:18:19,443 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 14:18:20,673 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-18 14:18:34,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3934390.0, ans=0.125 2024-08-18 14:18:38,504 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 14:18:42,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3934490.0, ans=0.0 2024-08-18 14:18:56,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3934590.0, ans=0.0 2024-08-18 14:19:08,900 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.61 vs. 
limit=15.0 2024-08-18 14:19:09,310 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 6850, loss[loss=0.1081, beats_loss=0.0107, ecapa_loss=0.0001106, whisper_loss=0.09628, over 23468.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001461, whisper_loss=0.09064, over 3884052.19 frames. ], batch size: 90, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:19:21,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3934790.0, ans=0.125 2024-08-18 14:19:39,237 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-18 14:20:00,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3935090.0, ans=0.0 2024-08-18 14:20:03,727 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.389e+01 2.625e+01 3.075e+01 4.351e+02, threshold=5.250e+01, percent-clipped=2.0 2024-08-18 14:20:13,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3935190.0, ans=0.0 2024-08-18 14:20:14,235 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 6900, loss[loss=0.1025, beats_loss=0.01136, ecapa_loss=0.0001594, whisper_loss=0.08955, over 19737.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001451, whisper_loss=0.09056, over 3861323.84 frames. 
], batch size: 82, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:20:19,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3935190.0, ans=0.125 2024-08-18 14:20:20,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3935190.0, ans=0.125 2024-08-18 14:20:33,693 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.16 vs. limit=15.0 2024-08-18 14:20:36,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3935290.0, ans=0.05 2024-08-18 14:20:57,435 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-18 14:21:04,082 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-18 14:21:19,060 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 6950, loss[loss=0.09808, beats_loss=0.0103, ecapa_loss=0.0001376, whisper_loss=0.08641, over 13620.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01065, ecapa_loss=0.0001444, whisper_loss=0.08973, over 3877750.50 frames. 
], batch size: 53, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:21:23,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=3935690.0, ans=15.0 2024-08-18 14:21:24,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3935690.0, ans=0.0 2024-08-18 14:21:29,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3935690.0, ans=0.125 2024-08-18 14:21:44,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3935890.0, ans=0.2 2024-08-18 14:21:48,287 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 14:21:49,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3935890.0, ans=0.07 2024-08-18 14:22:06,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3935990.0, ans=0.125 2024-08-18 14:22:07,042 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-18 14:22:11,311 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 14:22:12,456 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.310e+01 2.521e+01 2.778e+01 4.175e+02, threshold=5.041e+01, percent-clipped=1.0 2024-08-18 14:22:23,209 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 7000, loss[loss=0.08638, beats_loss=0.01152, ecapa_loss=0.0001524, whisper_loss=0.07333, over 20573.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0106, ecapa_loss=0.0001451, whisper_loss=0.08977, over 3843905.27 frames. 
], batch size: 84, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:22:30,501 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 27 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 14:22:30,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3936190.0, ans=0.0 2024-08-18 14:22:53,018 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-18 14:22:56,779 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 28 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-18 14:23:03,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3936490.0, ans=0.2 2024-08-18 14:23:25,405 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 7050, loss[loss=0.09265, beats_loss=0.009746, ecapa_loss=0.0001617, whisper_loss=0.08129, over 15886.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01055, ecapa_loss=0.0001451, whisper_loss=0.09016, over 3863449.48 frames. ], batch size: 66, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:23:40,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3936790.0, ans=0.125 2024-08-18 14:23:46,953 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 14:24:00,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3936890.0, ans=0.5 2024-08-18 14:24:10,344 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.83 vs. 
limit=15.0 2024-08-18 14:24:18,933 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.686e+01 2.218e+01 2.427e+01 2.693e+01 4.080e+01, threshold=4.854e+01, percent-clipped=0.0 2024-08-18 14:24:19,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=3937090.0, ans=22.5 2024-08-18 14:24:25,024 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 13 from Vox, 53 fro AS 2024-08-18 14:24:28,592 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 7100, loss[loss=0.1277, beats_loss=0.009657, ecapa_loss=0.0001504, whisper_loss=0.1166, over 23259.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01057, ecapa_loss=0.0001437, whisper_loss=0.0904, over 3861593.04 frames. ], batch size: 92, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:24:40,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3937290.0, ans=0.035 2024-08-18 14:24:43,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3937290.0, ans=0.125 2024-08-18 14:24:59,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3937390.0, ans=0.0 2024-08-18 14:25:10,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3937490.0, ans=0.1 2024-08-18 14:25:11,802 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-18 14:25:13,285 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 26 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 14:25:13,698 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.16 vs. 
limit=12.0 2024-08-18 14:25:30,495 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 7150, loss[loss=0.1163, beats_loss=0.008356, ecapa_loss=0.0001483, whisper_loss=0.1065, over 22622.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01055, ecapa_loss=0.0001431, whisper_loss=0.09036, over 3842816.82 frames. ], batch size: 87, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:25:36,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3937690.0, ans=0.1 2024-08-18 14:25:42,796 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 14:25:43,268 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.83 vs. limit=15.0 2024-08-18 14:25:51,659 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 21 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-18 14:26:08,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3937990.0, ans=0.1 2024-08-18 14:26:18,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3937990.0, ans=0.0 2024-08-18 14:26:22,361 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.289e+01 2.542e+01 2.748e+01 4.524e+01, threshold=5.084e+01, percent-clipped=0.0 2024-08-18 14:26:32,326 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 7200, loss[loss=0.1021, beats_loss=0.0116, ecapa_loss=0.0001327, whisper_loss=0.08915, over 19631.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01055, ecapa_loss=0.0001437, whisper_loss=0.09025, over 3856021.05 frames. 
], batch size: 80, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:26:36,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.70 vs. limit=6.0 2024-08-18 14:26:43,679 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 14:26:54,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3938290.0, ans=0.2 2024-08-18 14:27:03,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3938390.0, ans=10.0 2024-08-18 14:27:05,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3938390.0, ans=0.125 2024-08-18 14:27:14,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3938490.0, ans=0.125 2024-08-18 14:27:19,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3938490.0, ans=0.09899494936611666 2024-08-18 14:27:21,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3938590.0, ans=0.0 2024-08-18 14:27:31,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3938590.0, ans=0.1 2024-08-18 14:27:33,596 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 7250, loss[loss=0.07695, beats_loss=0.01244, ecapa_loss=0.0001502, whisper_loss=0.06301, over 18347.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0105, ecapa_loss=0.0001453, whisper_loss=0.09138, over 3916919.86 frames. 
], batch size: 75, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:27:36,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3938690.0, ans=0.05 2024-08-18 14:27:42,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3938690.0, ans=0.125 2024-08-18 14:27:59,914 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 22 from LS+wenet, 10 from Vox, 22 fro AS 2024-08-18 14:28:03,823 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-18 14:28:04,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3938890.0, ans=0.2 2024-08-18 14:28:10,086 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-18 14:28:12,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.14 vs. limit=15.0 2024-08-18 14:28:16,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3938990.0, ans=0.09899494936611666 2024-08-18 14:28:26,056 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.318e+01 2.608e+01 2.955e+01 6.690e+01, threshold=5.215e+01, percent-clipped=2.0 2024-08-18 14:28:26,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3939090.0, ans=0.0 2024-08-18 14:28:32,174 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-18 14:28:35,866 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 7300, loss[loss=0.1005, beats_loss=0.0117, ecapa_loss=0.0001243, whisper_loss=0.08754, over 18129.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01054, ecapa_loss=0.0001463, whisper_loss=0.09032, over 3893107.48 frames. ], batch size: 73, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:28:36,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3939190.0, ans=0.1 2024-08-18 14:28:47,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3939290.0, ans=0.125 2024-08-18 14:28:49,715 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-18 14:28:51,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3939290.0, ans=0.125 2024-08-18 14:28:58,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3939290.0, ans=0.125 2024-08-18 14:28:59,423 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-18 14:28:59,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3939390.0, ans=0.05 2024-08-18 14:29:07,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3939390.0, ans=0.125 2024-08-18 14:29:12,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3939490.0, ans=0.2 2024-08-18 14:29:18,923 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 32 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-18 14:29:37,670 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 7350, loss[loss=0.1027, beats_loss=0.009772, ecapa_loss=0.0001211, whisper_loss=0.09172, over 17643.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.0105, ecapa_loss=0.000147, whisper_loss=0.09, over 3862144.68 frames. ], batch size: 68, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:29:44,138 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.813e+01 2024-08-18 14:29:45,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3939690.0, ans=0.125 2024-08-18 14:29:53,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3939790.0, ans=0.0 2024-08-18 14:30:07,364 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-18 14:30:24,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.16 vs. limit=22.5 2024-08-18 14:30:26,435 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-18 14:30:29,943 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.344e+01 2.539e+01 2.800e+01 8.685e+01, threshold=5.077e+01, percent-clipped=1.0 2024-08-18 14:30:39,510 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.77 vs. limit=15.0 2024-08-18 14:30:40,014 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 7400, loss[loss=0.1145, beats_loss=0.009298, ecapa_loss=0.0001565, whisper_loss=0.1036, over 20091.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001463, whisper_loss=0.09035, over 3885717.63 frames. 
], batch size: 81, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:30:41,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3940190.0, ans=0.2 2024-08-18 14:30:41,815 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2024-08-18 14:31:07,045 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-18 14:31:12,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3940390.0, ans=0.09899494936611666 2024-08-18 14:31:13,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3940390.0, ans=10.0 2024-08-18 14:31:26,839 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.48 vs. limit=15.0 2024-08-18 14:31:33,626 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=15.0 2024-08-18 14:31:37,051 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.50 vs. limit=22.5 2024-08-18 14:31:41,549 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 7450, loss[loss=0.09331, beats_loss=0.01136, ecapa_loss=0.0001361, whisper_loss=0.08059, over 15011.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01048, ecapa_loss=0.0001464, whisper_loss=0.09062, over 3884412.24 frames. ], batch size: 58, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:31:44,162 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
23 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-18 14:31:44,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3940690.0, ans=0.04949747468305833 2024-08-18 14:31:59,218 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0 2024-08-18 14:32:03,612 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 14:32:05,665 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.73 vs. limit=22.5 2024-08-18 14:32:24,654 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 33 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-18 14:32:27,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3940990.0, ans=0.0 2024-08-18 14:32:33,406 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.339e+01 2.554e+01 2.965e+01 5.290e+01, threshold=5.108e+01, percent-clipped=2.0 2024-08-18 14:32:33,576 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-18 14:32:40,804 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-18 14:32:43,003 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 7500, loss[loss=0.09456, beats_loss=0.01073, ecapa_loss=0.0001419, whisper_loss=0.08242, over 17001.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01044, ecapa_loss=0.0001443, whisper_loss=0.09097, over 3898423.12 frames. 
], batch size: 68, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:32:58,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3941290.0, ans=0.125 2024-08-18 14:33:02,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3941290.0, ans=0.2 2024-08-18 14:33:18,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3941390.0, ans=0.0 2024-08-18 14:33:32,639 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-18 14:33:43,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3941590.0, ans=0.125 2024-08-18 14:33:47,427 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 7550, loss[loss=0.1069, beats_loss=0.01079, ecapa_loss=0.0001468, whisper_loss=0.09465, over 22306.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01039, ecapa_loss=0.0001458, whisper_loss=0.09117, over 3872592.71 frames. ], batch size: 92, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:33:52,610 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=15.0 2024-08-18 14:34:00,864 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.16 vs. 
limit=15.0 2024-08-18 14:34:03,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3941790.0, ans=0.0 2024-08-18 14:34:09,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3941790.0, ans=0.1 2024-08-18 14:34:15,491 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 20 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-18 14:34:30,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3941890.0, ans=0.125 2024-08-18 14:34:30,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=3941890.0, ans=0.02 2024-08-18 14:34:31,993 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.292e+01 2024-08-18 14:34:36,127 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.33 vs. limit=22.5 2024-08-18 14:34:44,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3941990.0, ans=0.0 2024-08-18 14:34:47,293 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-18 14:34:53,217 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.776e+01 2.267e+01 2.538e+01 2.826e+01 4.465e+01, threshold=5.075e+01, percent-clipped=0.0 2024-08-18 14:35:05,562 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 7600, loss[loss=0.08458, beats_loss=0.01023, ecapa_loss=0.0001563, whisper_loss=0.07279, over 13924.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01044, ecapa_loss=0.0001444, whisper_loss=0.09073, over 3841872.53 frames. 
], batch size: 56, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:35:05,731 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 14:35:12,482 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 14:35:14,631 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0 2024-08-18 14:35:38,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3942390.0, ans=0.1 2024-08-18 14:35:57,284 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.58 vs. limit=15.0 2024-08-18 14:36:19,056 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 14:36:32,724 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 7650, loss[loss=0.1306, beats_loss=0.007355, ecapa_loss=0.0002034, whisper_loss=0.1212, over 18537.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01042, ecapa_loss=0.0001452, whisper_loss=0.09095, over 3867102.65 frames. ], batch size: 76, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:36:33,009 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 10 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 14:36:56,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3942790.0, ans=0.5 2024-08-18 14:37:06,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3942890.0, ans=0.1 2024-08-18 14:37:39,189 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
21 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-18 14:37:39,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3943090.0, ans=0.1 2024-08-18 14:37:43,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3943090.0, ans=0.125 2024-08-18 14:37:44,651 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.361e+01 2.566e+01 2.916e+01 1.165e+02, threshold=5.131e+01, percent-clipped=2.0 2024-08-18 14:37:58,386 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 7700, loss[loss=0.1092, beats_loss=0.01035, ecapa_loss=0.000158, whisper_loss=0.09725, over 16783.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001451, whisper_loss=0.09064, over 3866044.90 frames. ], batch size: 70, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:38:02,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3943190.0, ans=0.125 2024-08-18 14:38:07,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3943190.0, ans=0.125 2024-08-18 14:38:08,262 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
22 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-18 14:38:24,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3943390.0, ans=0.125 2024-08-18 14:38:28,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3943390.0, ans=0.125 2024-08-18 14:38:33,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3943390.0, ans=0.125 2024-08-18 14:38:47,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3943490.0, ans=0.125 2024-08-18 14:39:03,065 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 7750, loss[loss=0.1041, beats_loss=0.006716, ecapa_loss=0.0001768, whisper_loss=0.09565, over 20284.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01046, ecapa_loss=0.0001444, whisper_loss=0.09022, over 3852439.99 frames. ], batch size: 80, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:39:10,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3943690.0, ans=0.0 2024-08-18 14:39:19,532 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.57 vs. limit=15.0 2024-08-18 14:39:35,564 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.81 vs. 
limit=12.0 2024-08-18 14:39:36,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3943890.0, ans=0.125 2024-08-18 14:39:37,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3943890.0, ans=0.125 2024-08-18 14:39:52,650 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-18 14:39:52,925 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.30 vs. limit=10.0 2024-08-18 14:39:56,137 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.246e+01 2.508e+01 2.794e+01 3.157e+02, threshold=5.017e+01, percent-clipped=3.0 2024-08-18 14:40:04,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3944090.0, ans=0.2 2024-08-18 14:40:06,227 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 7800, loss[loss=0.1173, beats_loss=0.008914, ecapa_loss=0.0001353, whisper_loss=0.1071, over 17040.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01049, ecapa_loss=0.0001441, whisper_loss=0.09024, over 3864324.10 frames. ], batch size: 64, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:40:31,585 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.04 vs. limit=10.0 2024-08-18 14:40:43,055 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.28 vs. 
limit=15.0 2024-08-18 14:41:03,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3944590.0, ans=0.2 2024-08-18 14:41:05,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3944590.0, ans=0.1 2024-08-18 14:41:11,782 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 7850, loss[loss=0.1045, beats_loss=0.01023, ecapa_loss=0.0001568, whisper_loss=0.09273, over 14962.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01053, ecapa_loss=0.0001442, whisper_loss=0.09041, over 3884419.39 frames. ], batch size: 61, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:41:18,382 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 14:41:19,955 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-18 14:41:34,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3944790.0, ans=0.125 2024-08-18 14:41:59,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3944990.0, ans=0.025 2024-08-18 14:42:09,803 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.337e+01 2.593e+01 2.821e+01 2.204e+02, threshold=5.186e+01, percent-clipped=1.0 2024-08-18 14:42:10,310 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0 2024-08-18 14:42:19,887 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 7900, loss[loss=0.09678, beats_loss=0.01217, ecapa_loss=0.000126, whisper_loss=0.08335, over 20946.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01059, ecapa_loss=0.000143, whisper_loss=0.09122, over 3894303.11 frames. 
], batch size: 84, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:42:28,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3945190.0, ans=0.1 2024-08-18 14:42:50,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3945390.0, ans=0.2 2024-08-18 14:43:04,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3945490.0, ans=0.125 2024-08-18 14:43:05,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3945490.0, ans=0.1 2024-08-18 14:43:08,018 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-18 14:43:15,774 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-18 14:43:20,727 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-18 14:43:24,348 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 7950, loss[loss=0.1078, beats_loss=0.01117, ecapa_loss=9.152e-05, whisper_loss=0.09572, over 25510.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01069, ecapa_loss=0.0001415, whisper_loss=0.09056, over 3893360.42 frames. ], batch size: 94, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:43:32,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3945690.0, ans=0.0 2024-08-18 14:43:36,875 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-18 14:44:02,596 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
29 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-18 14:44:02,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3945990.0, ans=0.0 2024-08-18 14:44:06,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3945990.0, ans=0.125 2024-08-18 14:44:12,981 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 31 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 14:44:17,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3946090.0, ans=0.125 2024-08-18 14:44:18,406 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.253e+01 2.458e+01 2.855e+01 4.177e+01, threshold=4.916e+01, percent-clipped=0.0 2024-08-18 14:44:19,081 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.16 vs. limit=10.0 2024-08-18 14:44:28,678 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 8000, loss[loss=0.1085, beats_loss=0.009033, ecapa_loss=0.0001811, whisper_loss=0.09766, over 21399.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01062, ecapa_loss=0.0001417, whisper_loss=0.09079, over 3899246.12 frames. ], batch size: 89, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:44:31,167 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-18 14:44:51,305 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
29 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-18 14:44:51,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3946290.0, ans=0.125 2024-08-18 14:44:57,926 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.517e-01 2024-08-18 14:45:00,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3946390.0, ans=0.125 2024-08-18 14:45:13,038 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 14:45:14,187 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-18 14:45:25,726 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 31 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-18 14:45:31,470 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 8050, loss[loss=0.07922, beats_loss=0.01017, ecapa_loss=0.0001389, whisper_loss=0.06766, over 15490.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01061, ecapa_loss=0.0001427, whisper_loss=0.09075, over 3921388.49 frames. ], batch size: 59, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:45:42,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3946690.0, ans=0.0 2024-08-18 14:45:45,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3946790.0, ans=0.0 2024-08-18 14:45:45,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3946790.0, ans=0.125 2024-08-18 14:45:51,934 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.66 vs. 
limit=10.0 2024-08-18 14:45:52,709 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 11 from Vox, 39 fro AS 2024-08-18 14:46:17,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3946990.0, ans=0.0 2024-08-18 14:46:17,557 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 14:46:18,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3946990.0, ans=0.0 2024-08-18 14:46:18,759 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.19 vs. limit=15.0 2024-08-18 14:46:24,655 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.326e+01 2.639e+01 3.174e+01 1.521e+02, threshold=5.277e+01, percent-clipped=3.0 2024-08-18 14:46:27,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3947090.0, ans=0.025 2024-08-18 14:46:35,140 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 8100, loss[loss=0.101, beats_loss=0.01254, ecapa_loss=0.0001293, whisper_loss=0.08722, over 18886.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001431, whisper_loss=0.09077, over 3880776.11 frames. 
], batch size: 75, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:46:43,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3947190.0, ans=0.125 2024-08-18 14:46:48,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3947290.0, ans=0.125 2024-08-18 14:47:00,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3947390.0, ans=0.0 2024-08-18 14:47:41,926 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 8150, loss[loss=0.1007, beats_loss=0.01099, ecapa_loss=0.0001211, whisper_loss=0.08852, over 18523.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01053, ecapa_loss=0.0001435, whisper_loss=0.09025, over 3885598.08 frames. ], batch size: 71, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:47:46,945 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-18 14:47:58,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3947790.0, ans=0.0 2024-08-18 14:48:08,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3947890.0, ans=0.125 2024-08-18 14:48:08,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3947890.0, ans=22.5 2024-08-18 14:48:14,638 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-18 14:48:17,334 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
25 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-18 14:48:35,008 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.257e+01 2.558e+01 2.766e+01 4.647e+01, threshold=5.117e+01, percent-clipped=0.0 2024-08-18 14:48:45,279 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 8200, loss[loss=0.1052, beats_loss=0.01121, ecapa_loss=0.0001373, whisper_loss=0.0926, over 18428.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01055, ecapa_loss=0.0001432, whisper_loss=0.09029, over 3905036.99 frames. ], batch size: 73, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:48:45,429 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 14:48:46,239 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-08-18 14:48:55,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3948190.0, ans=0.05 2024-08-18 14:49:20,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3948390.0, ans=0.125 2024-08-18 14:49:40,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3948590.0, ans=0.2 2024-08-18 14:49:43,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3948590.0, ans=0.125 2024-08-18 14:49:49,323 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 8250, loss[loss=0.09166, beats_loss=0.009682, ecapa_loss=0.0001396, whisper_loss=0.08059, over 17452.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01056, ecapa_loss=0.0001425, whisper_loss=0.09006, over 3922301.29 frames. 
], batch size: 68, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:49:52,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3948690.0, ans=0.0 2024-08-18 14:50:00,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3948690.0, ans=0.0 2024-08-18 14:50:00,375 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.634e-02 2024-08-18 14:50:12,346 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 14:50:25,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3948890.0, ans=0.1 2024-08-18 14:50:29,121 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-18 14:50:30,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3948990.0, ans=0.1 2024-08-18 14:50:39,526 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-18 14:50:44,684 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.276e+01 2.490e+01 2.765e+01 6.193e+01, threshold=4.979e+01, percent-clipped=1.0 2024-08-18 14:50:54,755 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 8300, loss[loss=0.1109, beats_loss=0.007366, ecapa_loss=0.0001521, whisper_loss=0.1021, over 14609.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01057, ecapa_loss=0.0001419, whisper_loss=0.08944, over 3907561.33 frames. 
], batch size: 54, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:50:55,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3949190.0, ans=0.0 2024-08-18 14:50:55,782 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.76 vs. limit=6.0 2024-08-18 14:51:12,169 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-18 14:51:15,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.18 vs. limit=22.5 2024-08-18 14:51:21,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3949390.0, ans=0.0 2024-08-18 14:51:32,219 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-18 14:51:36,939 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 14:51:39,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3949490.0, ans=0.125 2024-08-18 14:51:51,027 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.59 vs. limit=12.0 2024-08-18 14:51:54,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3949590.0, ans=0.0 2024-08-18 14:51:57,029 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 8350, loss[loss=0.0873, beats_loss=0.01023, ecapa_loss=0.0001674, whisper_loss=0.07539, over 18464.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01047, ecapa_loss=0.0001435, whisper_loss=0.09032, over 3909224.39 frames. 
], batch size: 74, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:52:08,058 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.17 vs. limit=15.0 2024-08-18 14:52:10,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3949790.0, ans=0.0 2024-08-18 14:52:13,586 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 14:52:13,949 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.59 vs. limit=10.0 2024-08-18 14:52:18,712 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-18 14:52:22,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3949890.0, ans=0.1 2024-08-18 14:52:25,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3949890.0, ans=0.125 2024-08-18 14:52:25,257 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.31 vs. limit=22.5 2024-08-18 14:52:28,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3949890.0, ans=0.1 2024-08-18 14:52:33,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3949890.0, ans=0.125 2024-08-18 14:52:37,418 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
19 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-18 14:52:43,304 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.28 vs. limit=6.0 2024-08-18 14:52:45,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3949990.0, ans=0.0 2024-08-18 14:52:52,729 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.711e+01 2.289e+01 2.579e+01 2.822e+01 1.067e+02, threshold=5.159e+01, percent-clipped=1.0 2024-08-18 14:52:56,107 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-18 14:53:01,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3950090.0, ans=0.125 2024-08-18 14:53:04,040 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 8400, loss[loss=0.0878, beats_loss=0.01082, ecapa_loss=0.0001535, whisper_loss=0.07544, over 21823.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01053, ecapa_loss=0.0001443, whisper_loss=0.08984, over 3895649.93 frames. 
], batch size: 88, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:53:12,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3950190.0, ans=0.125 2024-08-18 14:53:21,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3950290.0, ans=0.0 2024-08-18 14:53:37,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3950390.0, ans=0.1 2024-08-18 14:53:50,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3950490.0, ans=0.125 2024-08-18 14:53:55,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3950490.0, ans=0.125 2024-08-18 14:54:10,523 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 8450, loss[loss=0.08331, beats_loss=0.01188, ecapa_loss=0.0001298, whisper_loss=0.07014, over 20802.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01052, ecapa_loss=0.0001446, whisper_loss=0.08975, over 3888169.36 frames. ], batch size: 84, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:54:10,683 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 21 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-18 14:54:11,682 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
26 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-18 14:54:11,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3950690.0, ans=0.5 2024-08-18 14:54:23,466 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09940025955438614, model_norm_threshold=51.58603286743164 2024-08-18 14:54:23,629 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.828e+04, grad_sumsq=5.828e+04, orig_rms_sq=1.000e+00 2024-08-18 14:54:52,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3950990.0, ans=0.0 2024-08-18 14:55:04,300 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.244e+01 2.462e+01 2.718e+01 5.190e+02, threshold=4.924e+01, percent-clipped=1.0 2024-08-18 14:55:09,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3951090.0, ans=0.1 2024-08-18 14:55:14,719 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 8500, loss[loss=0.1124, beats_loss=0.01034, ecapa_loss=0.0001437, whisper_loss=0.1007, over 23147.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01057, ecapa_loss=0.0001444, whisper_loss=0.09017, over 3906543.65 frames. ], batch size: 93, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:55:19,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3951190.0, ans=0.125 2024-08-18 14:55:23,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3951190.0, ans=0.2 2024-08-18 14:55:27,152 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
17 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-18 14:55:28,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3951290.0, ans=0.0 2024-08-18 14:55:39,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3951390.0, ans=0.0 2024-08-18 14:55:46,209 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.10 vs. limit=22.5 2024-08-18 14:55:57,380 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 32 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-18 14:55:57,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3951490.0, ans=0.1 2024-08-18 14:55:59,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3951490.0, ans=0.125 2024-08-18 14:56:01,006 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-18 14:56:03,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3951590.0, ans=0.05 2024-08-18 14:56:08,239 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 14:56:11,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3951590.0, ans=0.125 2024-08-18 14:56:12,048 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-18 14:56:16,864 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 8550, loss[loss=0.1032, beats_loss=0.009539, ecapa_loss=0.0001461, whisper_loss=0.09219, over 19462.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01055, ecapa_loss=0.0001446, whisper_loss=0.09016, over 3921910.89 frames. ], batch size: 77, lr: 2.26e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:56:23,788 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.20 vs. limit=15.0 2024-08-18 14:56:34,602 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 14:56:37,015 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 14:56:52,767 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.81 vs. limit=22.5 2024-08-18 14:57:09,768 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-18 14:57:10,807 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.534e+01 2.714e+01 3.031e+01 4.468e+01, threshold=5.428e+01, percent-clipped=0.0 2024-08-18 14:57:13,677 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 14:57:13,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3952090.0, ans=0.0 2024-08-18 14:57:19,453 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 8600, loss[loss=0.1021, beats_loss=0.01091, ecapa_loss=0.000149, whisper_loss=0.08966, over 18801.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01057, ecapa_loss=0.0001438, whisper_loss=0.0902, over 3889101.52 frames. 
], batch size: 76, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:57:21,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3952190.0, ans=0.125 2024-08-18 14:57:21,470 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2024-08-18 14:57:22,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3952190.0, ans=0.125 2024-08-18 14:57:30,005 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.77 vs. limit=12.0 2024-08-18 14:57:33,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3952290.0, ans=0.125 2024-08-18 14:57:42,012 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-18 14:57:44,422 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-18 14:58:03,147 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-18 14:58:08,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3952590.0, ans=0.1 2024-08-18 14:58:09,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.47 vs. limit=15.0 2024-08-18 14:58:21,613 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 8650, loss[loss=0.1056, beats_loss=0.01152, ecapa_loss=0.0001539, whisper_loss=0.0925, over 22049.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01063, ecapa_loss=0.0001437, whisper_loss=0.08967, over 3877671.92 frames. 
], batch size: 90, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:58:24,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3952690.0, ans=0.0 2024-08-18 14:58:24,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3952690.0, ans=0.125 2024-08-18 14:58:30,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=3952690.0, ans=12.0 2024-08-18 14:58:36,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3952790.0, ans=0.125 2024-08-18 14:58:38,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3952790.0, ans=15.0 2024-08-18 14:58:57,476 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-18 14:59:15,070 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.252e+01 2.415e+01 2.736e+01 4.412e+01, threshold=4.831e+01, percent-clipped=0.0 2024-08-18 14:59:23,840 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 8700, loss[loss=0.1069, beats_loss=0.006848, ecapa_loss=0.0001979, whisper_loss=0.09804, over 14872.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01056, ecapa_loss=0.0001446, whisper_loss=0.08943, over 3846500.11 frames. ], batch size: 61, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:59:30,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3953190.0, ans=0.0 2024-08-18 14:59:31,063 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
19 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-18 14:59:44,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3953290.0, ans=0.1 2024-08-18 14:59:46,653 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 14:59:47,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3953390.0, ans=0.0 2024-08-18 14:59:51,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3953390.0, ans=0.1 2024-08-18 14:59:52,318 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 26 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-18 14:59:57,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3953390.0, ans=0.0 2024-08-18 15:00:21,232 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-18 15:00:25,984 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 8750, loss[loss=0.08674, beats_loss=0.01193, ecapa_loss=0.0001335, whisper_loss=0.07347, over 18885.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01048, ecapa_loss=0.0001452, whisper_loss=0.09013, over 3847763.12 frames. ], batch size: 76, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:00:33,984 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.18 vs. limit=6.0 2024-08-18 15:00:37,081 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 21 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-18 15:00:41,042 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-18 15:00:52,122 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
26 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-18 15:00:53,629 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 15:01:02,136 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-18 15:01:09,382 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 24 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 15:01:11,940 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-18 15:01:14,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3954090.0, ans=0.125 2024-08-18 15:01:19,349 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.312e+01 2.474e+01 2.763e+01 1.199e+02, threshold=4.947e+01, percent-clipped=1.0 2024-08-18 15:01:23,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3954090.0, ans=0.0 2024-08-18 15:01:27,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3954190.0, ans=0.125 2024-08-18 15:01:28,061 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 8800, loss[loss=0.08713, beats_loss=0.01472, ecapa_loss=0.0001426, whisper_loss=0.07099, over 16590.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01056, ecapa_loss=0.0001458, whisper_loss=0.08983, over 3865997.86 frames. ], batch size: 67, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:01:54,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3954390.0, ans=0.0 2024-08-18 15:02:04,627 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.62 vs. 
limit=22.5 2024-08-18 15:02:08,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.18 vs. limit=12.0 2024-08-18 15:02:11,605 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-18 15:02:27,637 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=15.0 2024-08-18 15:02:30,603 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 8850, loss[loss=0.1064, beats_loss=0.007577, ecapa_loss=0.0001725, whisper_loss=0.09714, over 19429.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01063, ecapa_loss=0.0001442, whisper_loss=0.08948, over 3877586.40 frames. ], batch size: 77, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:02:33,143 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-18 15:02:35,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.82 vs. limit=15.0 2024-08-18 15:02:38,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3954690.0, ans=0.125 2024-08-18 15:02:51,599 WARNING [optim.py:496] (0/4) Scaling gradients by 0.04036853462457657, model_norm_threshold=49.47042465209961 2024-08-18 15:02:51,761 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.484e+05, grad_sumsq=1.484e+05, orig_rms_sq=1.000e+00 2024-08-18 15:02:53,649 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.81 vs. 
limit=12.0 2024-08-18 15:02:59,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3954890.0, ans=0.125 2024-08-18 15:03:09,617 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 31 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-18 15:03:24,434 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.312e+01 2.617e+01 3.003e+01 1.225e+03, threshold=5.234e+01, percent-clipped=1.0 2024-08-18 15:03:33,257 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 8900, loss[loss=0.1089, beats_loss=0.00934, ecapa_loss=0.0001559, whisper_loss=0.09804, over 20976.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01061, ecapa_loss=0.0001437, whisper_loss=0.09008, over 3863508.24 frames. ], batch size: 85, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:03:35,082 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=15.0 2024-08-18 15:03:38,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3955190.0, ans=0.1 2024-08-18 15:03:42,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3955190.0, ans=15.0 2024-08-18 15:03:42,910 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-18 15:03:43,935 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.29 vs. limit=22.5 2024-08-18 15:03:44,383 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
21 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 15:04:04,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3955390.0, ans=0.125 2024-08-18 15:04:08,900 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.18 vs. limit=10.0 2024-08-18 15:04:17,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3955490.0, ans=0.0 2024-08-18 15:04:20,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3955490.0, ans=0.0 2024-08-18 15:04:32,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3955590.0, ans=0.1 2024-08-18 15:04:35,949 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 8950, loss[loss=0.07414, beats_loss=0.01271, ecapa_loss=0.0001386, whisper_loss=0.06005, over 13600.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01057, ecapa_loss=0.000143, whisper_loss=0.09045, over 3853406.36 frames. ], batch size: 57, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:04:37,170 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-18 15:04:41,244 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.13 vs. limit=6.0 2024-08-18 15:04:51,475 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.29 vs. limit=15.0 2024-08-18 15:05:12,028 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
22 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-18 15:05:16,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3955990.0, ans=0.0 2024-08-18 15:05:17,237 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.355e+00 2024-08-18 15:05:25,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3956090.0, ans=0.125 2024-08-18 15:05:29,625 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.620e+01 2.255e+01 2.650e+01 2.885e+01 7.175e+01, threshold=5.300e+01, percent-clipped=2.0 2024-08-18 15:05:31,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3956090.0, ans=0.0 2024-08-18 15:05:36,789 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 18 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-18 15:05:38,063 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 9000, loss[loss=0.08528, beats_loss=0.01277, ecapa_loss=0.0001285, whisper_loss=0.07122, over 18590.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001433, whisper_loss=0.09058, over 3858720.78 frames. ], batch size: 72, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:05:38,064 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-18 15:06:15,537 INFO [train_multi_KD3.py:1149] (0/4) Epoch 27, validation on ASR_libri: loss=0.2517, beats_loss=0, ecapa_loss=0.0005205, whisper_loss=0.2465, over 922467.00 frames. 2024-08-18 15:06:34,025 INFO [train_multi_KD3.py:1149] (0/4) Epoch 27, validation on SV_voxceleb1: loss=0.004102, beats_loss=0, ecapa_loss=0.0004102, whisper_loss=0, over 939242.00 frames. 
2024-08-18 15:07:57,336 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.6662, 2.0591, 2.2461, 1.5798, 1.8361, 2.3989, 2.9378, 1.7407], device='cuda:0') 2024-08-18 15:08:24,082 INFO [train_multi_KD3.py:1149] (0/4) Epoch 27, validation on AT_audioset: loss=0.02313, beats_loss=0.02313, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 15:08:24,086 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-18 15:08:29,808 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.13 vs. limit=15.0 2024-08-18 15:08:37,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=3956290.0, ans=22.5 2024-08-18 15:08:41,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3956290.0, ans=0.1 2024-08-18 15:08:43,035 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-18 15:08:43,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3956290.0, ans=0.125 2024-08-18 15:08:45,746 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-18 15:08:45,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3956290.0, ans=0.05 2024-08-18 15:09:26,664 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 9050, loss[loss=0.1193, beats_loss=0.01033, ecapa_loss=0.0001144, whisper_loss=0.1078, over 24342.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01051, ecapa_loss=0.0001435, whisper_loss=0.09038, over 3862766.13 frames. 
], batch size: 92, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:09:34,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3956690.0, ans=0.1 2024-08-18 15:09:43,580 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=15.0 2024-08-18 15:09:46,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3956790.0, ans=0.125 2024-08-18 15:10:00,289 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 23 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-18 15:10:05,247 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-18 15:10:06,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3956990.0, ans=0.1 2024-08-18 15:10:19,809 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.267e+01 2.520e+01 2.853e+01 4.367e+01, threshold=5.041e+01, percent-clipped=0.0 2024-08-18 15:10:25,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3957090.0, ans=0.1 2024-08-18 15:10:28,976 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 9100, loss[loss=0.1168, beats_loss=0.009103, ecapa_loss=0.0001615, whisper_loss=0.1061, over 16323.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01044, ecapa_loss=0.0001444, whisper_loss=0.09025, over 3798254.62 frames. ], batch size: 65, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:10:37,970 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 21 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-18 15:10:41,734 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 12 from Vox, 48 fro AS 2024-08-18 15:10:49,128 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-18 15:10:49,455 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=15.0 2024-08-18 15:10:57,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3957390.0, ans=0.125 2024-08-18 15:11:01,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3957390.0, ans=0.015 2024-08-18 15:11:17,014 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.86 vs. limit=15.0 2024-08-18 15:11:23,734 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 18 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-18 15:11:26,130 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 38 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-18 15:11:28,649 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-18 15:11:30,901 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 9150, loss[loss=0.1094, beats_loss=0.01008, ecapa_loss=0.0001505, whisper_loss=0.09781, over 22506.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01055, ecapa_loss=0.0001422, whisper_loss=0.08973, over 3838247.52 frames. ], batch size: 90, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:11:38,695 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
23 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-18 15:11:40,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3957690.0, ans=0.125 2024-08-18 15:12:03,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3957890.0, ans=0.2 2024-08-18 15:12:04,882 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 20 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-18 15:12:18,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3957990.0, ans=0.2 2024-08-18 15:12:24,948 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.708e+01 2.284e+01 2.490e+01 2.890e+01 6.008e+01, threshold=4.980e+01, percent-clipped=1.0 2024-08-18 15:12:33,857 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 9200, loss[loss=0.09903, beats_loss=0.009971, ecapa_loss=0.0001654, whisper_loss=0.0874, over 22324.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01057, ecapa_loss=0.0001439, whisper_loss=0.08924, over 3840277.46 frames. ], batch size: 94, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:12:33,959 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 32 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 15:12:40,146 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 15:12:57,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3958390.0, ans=0.125 2024-08-18 15:12:57,595 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.22 vs. 
limit=15.0 2024-08-18 15:13:15,882 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.616e+01 2024-08-18 15:13:21,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3958590.0, ans=0.07 2024-08-18 15:13:26,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3958590.0, ans=0.125 2024-08-18 15:13:35,076 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 9250, loss[loss=0.1059, beats_loss=0.01014, ecapa_loss=0.0001523, whisper_loss=0.09427, over 22790.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01053, ecapa_loss=0.0001442, whisper_loss=0.08975, over 3872118.75 frames. ], batch size: 92, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:14:23,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3959090.0, ans=0.2 2024-08-18 15:14:27,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3959090.0, ans=0.125 2024-08-18 15:14:28,620 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.320e+01 2.629e+01 2.972e+01 5.900e+01, threshold=5.258e+01, percent-clipped=2.0 2024-08-18 15:14:37,197 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 9300, loss[loss=0.08333, beats_loss=0.01279, ecapa_loss=0.0001339, whisper_loss=0.0692, over 20132.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.0001451, whisper_loss=0.09039, over 3891731.23 frames. ], batch size: 83, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:14:53,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3959290.0, ans=0.1 2024-08-18 15:14:57,443 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
16 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-18 15:15:16,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3959490.0, ans=0.2 2024-08-18 15:15:34,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3959590.0, ans=0.1 2024-08-18 15:15:38,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3959590.0, ans=0.2 2024-08-18 15:15:40,220 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-18 15:15:41,437 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 9350, loss[loss=0.1014, beats_loss=0.01088, ecapa_loss=0.0001087, whisper_loss=0.08938, over 19208.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01048, ecapa_loss=0.0001454, whisper_loss=0.09072, over 3874381.99 frames. ], batch size: 73, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:15:48,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3959690.0, ans=0.125 2024-08-18 15:16:05,249 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-18 15:16:10,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3959890.0, ans=0.0 2024-08-18 15:16:15,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3959890.0, ans=0.1 2024-08-18 15:16:17,148 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. 
limit=6.0 2024-08-18 15:16:19,087 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-396000.pt 2024-08-18 15:16:23,193 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.80 vs. limit=15.0 2024-08-18 15:16:31,393 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 15:16:34,065 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-18 15:16:37,862 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.319e+01 2.506e+01 2.733e+01 5.206e+01, threshold=5.011e+01, percent-clipped=0.0 2024-08-18 15:16:38,071 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-18 15:16:47,462 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 9400, loss[loss=0.09542, beats_loss=0.01125, ecapa_loss=0.000138, whisper_loss=0.08279, over 21598.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01045, ecapa_loss=0.0001462, whisper_loss=0.09136, over 3890984.57 frames. ], batch size: 87, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:16:51,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3960190.0, ans=0.125 2024-08-18 15:17:01,170 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.21 vs. 
limit=10.0 2024-08-18 15:17:08,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3960290.0, ans=0.0 2024-08-18 15:17:11,163 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 17 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 15:17:12,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3960390.0, ans=0.125 2024-08-18 15:17:24,763 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.99 vs. limit=15.0 2024-08-18 15:17:34,447 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-18 15:17:49,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3960590.0, ans=0.125 2024-08-18 15:17:50,421 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 15:17:53,124 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 9450, loss[loss=0.1233, beats_loss=0.008294, ecapa_loss=0.0001398, whisper_loss=0.1136, over 19261.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.000146, whisper_loss=0.09059, over 3898050.91 frames. ], batch size: 75, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:18:12,729 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-18 15:18:14,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3960790.0, ans=0.2 2024-08-18 15:18:22,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.17 vs. 
limit=15.0 2024-08-18 15:18:41,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3960990.0, ans=0.125 2024-08-18 15:18:41,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3960990.0, ans=0.1 2024-08-18 15:18:50,283 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.723e+01 2.322e+01 2.578e+01 2.897e+01 2.615e+02, threshold=5.157e+01, percent-clipped=1.0 2024-08-18 15:18:59,851 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 9500, loss[loss=0.1124, beats_loss=0.01128, ecapa_loss=0.0001102, whisper_loss=0.1, over 16560.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01049, ecapa_loss=0.0001452, whisper_loss=0.09105, over 3914491.67 frames. ], batch size: 65, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:19:05,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3961190.0, ans=0.0 2024-08-18 15:19:09,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3961190.0, ans=0.0 2024-08-18 15:19:13,962 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-18 15:19:17,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3961290.0, ans=0.125 2024-08-18 15:19:18,936 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 11 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 15:19:43,384 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.11 vs. 
limit=15.0 2024-08-18 15:19:44,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3961490.0, ans=0.0 2024-08-18 15:19:55,735 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.54 vs. limit=22.5 2024-08-18 15:19:59,153 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 15:20:11,222 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 9550, loss[loss=0.08668, beats_loss=0.01199, ecapa_loss=0.000155, whisper_loss=0.07314, over 21741.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01057, ecapa_loss=0.0001448, whisper_loss=0.09005, over 3903462.56 frames. ], batch size: 92, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:20:18,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3961690.0, ans=0.125 2024-08-18 15:20:20,616 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-18 15:20:22,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3961690.0, ans=0.0 2024-08-18 15:20:27,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3961790.0, ans=0.1 2024-08-18 15:20:34,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3961790.0, ans=0.0 2024-08-18 15:20:41,498 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-18 15:20:52,753 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 8 from Vox, 28 fro AS 2024-08-18 15:20:58,239 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
30 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 15:20:58,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3961990.0, ans=0.125 2024-08-18 15:21:10,238 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.297e+01 2.564e+01 2.916e+01 8.592e+01, threshold=5.127e+01, percent-clipped=1.0 2024-08-18 15:21:20,399 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 9600, loss[loss=0.1207, beats_loss=0.009347, ecapa_loss=0.0001527, whisper_loss=0.1099, over 22195.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01051, ecapa_loss=0.0001442, whisper_loss=0.09039, over 3912906.28 frames. ], batch size: 87, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:21:40,387 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 9 from Vox, 30 fro AS 2024-08-18 15:21:40,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3962290.0, ans=0.125 2024-08-18 15:21:41,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3962290.0, ans=0.125 2024-08-18 15:21:43,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.57 vs. limit=6.0 2024-08-18 15:21:47,090 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 26 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-18 15:21:52,888 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-18 15:22:04,027 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-18 15:22:11,585 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
23 from LS+wenet, 32 from Vox, 39 fro AS 2024-08-18 15:22:28,981 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 9650, loss[loss=0.09371, beats_loss=0.00939, ecapa_loss=0.000176, whisper_loss=0.08256, over 19598.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0104, ecapa_loss=0.0001452, whisper_loss=0.09063, over 3887926.22 frames. ], batch size: 80, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:22:31,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3962690.0, ans=0.125 2024-08-18 15:22:33,312 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.94 vs. limit=12.0 2024-08-18 15:22:36,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3962690.0, ans=0.0 2024-08-18 15:22:59,063 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-18 15:23:09,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3962990.0, ans=0.07 2024-08-18 15:23:17,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3962990.0, ans=0.125 2024-08-18 15:23:22,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3963090.0, ans=0.125 2024-08-18 15:23:26,913 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.385e+01 2.565e+01 2.930e+01 2.026e+02, threshold=5.129e+01, percent-clipped=1.0 2024-08-18 15:23:30,082 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
26 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-18 15:23:30,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3963090.0, ans=0.125 2024-08-18 15:23:37,087 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 9700, loss[loss=0.1006, beats_loss=0.01216, ecapa_loss=0.00014, whisper_loss=0.08709, over 22153.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001454, whisper_loss=0.09039, over 3883450.58 frames. ], batch size: 91, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:23:37,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3963190.0, ans=0.0 2024-08-18 15:23:37,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3963190.0, ans=0.0 2024-08-18 15:24:01,577 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-18 15:24:02,225 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0 2024-08-18 15:24:42,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3963590.0, ans=0.2 2024-08-18 15:24:50,575 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 9750, loss[loss=0.08838, beats_loss=0.01061, ecapa_loss=0.0001543, whisper_loss=0.07623, over 20968.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01044, ecapa_loss=0.0001458, whisper_loss=0.09042, over 3860612.10 frames. ], batch size: 90, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:24:56,882 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
27 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-18 15:25:05,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3963790.0, ans=0.125 2024-08-18 15:25:35,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3963990.0, ans=0.0 2024-08-18 15:25:39,415 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 15:25:51,334 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.207e+01 2.474e+01 2.731e+01 4.379e+01, threshold=4.949e+01, percent-clipped=0.0 2024-08-18 15:25:52,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3964090.0, ans=0.0 2024-08-18 15:26:00,833 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 9800, loss[loss=0.09473, beats_loss=0.01111, ecapa_loss=0.0001606, whisper_loss=0.08201, over 16643.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0105, ecapa_loss=0.0001458, whisper_loss=0.09052, over 3871476.91 frames. ], batch size: 69, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:26:20,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3964290.0, ans=0.125 2024-08-18 15:26:27,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3964390.0, ans=15.0 2024-08-18 15:26:33,597 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=22.5 2024-08-18 15:26:33,609 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.56 vs. 
limit=15.0 2024-08-18 15:26:38,610 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 15:26:55,703 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-18 15:26:55,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3964590.0, ans=0.125 2024-08-18 15:27:00,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3964590.0, ans=0.125 2024-08-18 15:27:00,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.63 vs. limit=15.0 2024-08-18 15:27:04,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3964590.0, ans=0.0 2024-08-18 15:27:07,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3964590.0, ans=0.1 2024-08-18 15:27:10,806 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 9850, loss[loss=0.108, beats_loss=0.00894, ecapa_loss=0.0001386, whisper_loss=0.09771, over 17639.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01063, ecapa_loss=0.0001449, whisper_loss=0.08923, over 3841005.76 frames. 
], batch size: 69, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:27:11,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3964690.0, ans=0.04949747468305833 2024-08-18 15:27:20,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3964690.0, ans=0.0 2024-08-18 15:27:23,362 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=15.0 2024-08-18 15:27:26,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.99 vs. limit=22.5 2024-08-18 15:27:42,919 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 15:28:09,192 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.298e+01 2.545e+01 2.787e+01 5.182e+01, threshold=5.091e+01, percent-clipped=2.0 2024-08-18 15:28:13,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3965090.0, ans=0.0 2024-08-18 15:28:16,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3965090.0, ans=0.1 2024-08-18 15:28:18,339 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 9900, loss[loss=0.1092, beats_loss=0.01088, ecapa_loss=0.0001144, whisper_loss=0.09717, over 18973.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01066, ecapa_loss=0.0001452, whisper_loss=0.08902, over 3830495.29 frames. ], batch size: 71, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:28:41,128 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
20 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 15:28:57,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3965490.0, ans=0.125 2024-08-18 15:29:01,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3965490.0, ans=0.2 2024-08-18 15:29:06,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3965490.0, ans=0.5 2024-08-18 15:29:12,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3965590.0, ans=0.05 2024-08-18 15:29:17,677 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 15:29:22,803 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 15 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-18 15:29:24,998 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 9950, loss[loss=0.1276, beats_loss=0.00776, ecapa_loss=0.0001457, whisper_loss=0.1183, over 22517.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01061, ecapa_loss=0.0001449, whisper_loss=0.08933, over 3828582.15 frames. ], batch size: 91, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:29:31,923 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 15:29:48,165 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-18 15:30:12,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3965990.0, ans=0.125 2024-08-18 15:30:13,801 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
20 from LS+wenet, 29 from Vox, 46 fro AS 2024-08-18 15:30:15,813 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.24 vs. limit=15.0 2024-08-18 15:30:20,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3966090.0, ans=0.125 2024-08-18 15:30:21,479 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.257e+01 2.444e+01 2.772e+01 4.436e+01, threshold=4.888e+01, percent-clipped=0.0 2024-08-18 15:30:30,485 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 10000, loss[loss=0.1054, beats_loss=0.01041, ecapa_loss=0.0001552, whisper_loss=0.09345, over 21738.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01058, ecapa_loss=0.0001453, whisper_loss=0.08909, over 3815748.13 frames. ], batch size: 93, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:30:30,586 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-18 15:30:51,259 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.39 vs. limit=22.5 2024-08-18 15:31:01,094 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 15:31:04,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3966390.0, ans=0.125 2024-08-18 15:31:13,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3966490.0, ans=0.1 2024-08-18 15:31:28,213 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.14 vs. 
limit=15.0 2024-08-18 15:31:29,064 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 15:31:36,874 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 10050, loss[loss=0.08593, beats_loss=0.01248, ecapa_loss=0.000128, whisper_loss=0.07216, over 18167.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01053, ecapa_loss=0.0001447, whisper_loss=0.0891, over 3820358.16 frames. ], batch size: 73, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:31:42,036 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-18 15:31:47,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3966690.0, ans=0.0 2024-08-18 15:32:00,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3966790.0, ans=0.0 2024-08-18 15:32:13,690 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-18 15:32:28,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3966990.0, ans=0.125 2024-08-18 15:32:35,663 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.258e+01 2.497e+01 2.785e+01 5.136e+01, threshold=4.994e+01, percent-clipped=1.0 2024-08-18 15:32:36,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3967090.0, ans=0.2 2024-08-18 15:32:45,148 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 10100, loss[loss=0.07839, beats_loss=0.01257, ecapa_loss=0.0001696, whisper_loss=0.06412, over 20752.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01055, ecapa_loss=0.0001436, whisper_loss=0.08957, over 3857408.83 frames. 
], batch size: 91, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:32:50,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3967190.0, ans=0.0 2024-08-18 15:32:50,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3967190.0, ans=0.07 2024-08-18 15:32:51,881 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-18 15:32:55,548 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-18 15:33:00,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3967290.0, ans=0.0 2024-08-18 15:33:07,260 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=7.770e-02 2024-08-18 15:33:19,202 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-18 15:33:46,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3967590.0, ans=0.1 2024-08-18 15:33:50,949 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 10150, loss[loss=0.1125, beats_loss=0.01038, ecapa_loss=0.0001394, whisper_loss=0.1007, over 23408.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01056, ecapa_loss=0.0001446, whisper_loss=0.0899, over 3901985.95 frames. ], batch size: 92, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:34:01,006 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-18 15:34:08,780 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 15:34:12,092 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
31 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-18 15:34:25,441 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.45 vs. limit=12.0 2024-08-18 15:34:39,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3967990.0, ans=0.125 2024-08-18 15:34:40,869 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 21 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 15:34:44,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3968090.0, ans=0.0 2024-08-18 15:34:47,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3968090.0, ans=0.125 2024-08-18 15:34:49,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3968090.0, ans=0.0 2024-08-18 15:34:50,166 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.305e+01 2.555e+01 2.872e+01 4.370e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-18 15:34:55,914 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-18 15:34:57,578 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=15.0 2024-08-18 15:34:58,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3968190.0, ans=0.2 2024-08-18 15:34:59,485 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 10200, loss[loss=0.09068, beats_loss=0.01206, ecapa_loss=0.000172, whisper_loss=0.07691, over 20239.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0105, ecapa_loss=0.0001448, whisper_loss=0.09082, over 3903503.00 frames. 
], batch size: 90, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:35:02,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3968190.0, ans=0.125 2024-08-18 15:35:14,770 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 11 from Vox, 38 fro AS 2024-08-18 15:35:14,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3968290.0, ans=0.125 2024-08-18 15:35:27,004 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-18 15:35:34,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3968390.0, ans=0.1 2024-08-18 15:36:03,838 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 30 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-18 15:36:04,955 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 10250, loss[loss=0.1234, beats_loss=0.008695, ecapa_loss=0.0001671, whisper_loss=0.113, over 18459.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01048, ecapa_loss=0.0001464, whisper_loss=0.09042, over 3909997.40 frames. ], batch size: 74, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:36:05,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3968690.0, ans=0.0 2024-08-18 15:36:12,717 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-18 15:36:22,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3968790.0, ans=0.0 2024-08-18 15:36:22,405 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. 
limit=6.0 2024-08-18 15:36:24,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3968790.0, ans=0.95 2024-08-18 15:36:28,993 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 27 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 15:36:31,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3968890.0, ans=0.125 2024-08-18 15:37:01,338 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.331e+01 2.519e+01 2.797e+01 4.005e+01, threshold=5.039e+01, percent-clipped=0.0 2024-08-18 15:37:01,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3969090.0, ans=0.0 2024-08-18 15:37:04,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3969090.0, ans=0.0 2024-08-18 15:37:10,995 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 10300, loss[loss=0.1086, beats_loss=0.01113, ecapa_loss=0.0001328, whisper_loss=0.09611, over 23153.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01048, ecapa_loss=0.0001456, whisper_loss=0.09044, over 3898745.27 frames. ], batch size: 92, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:37:14,006 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.308e+05 2024-08-18 15:37:20,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3969190.0, ans=0.5 2024-08-18 15:37:31,395 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.15 vs. 
limit=22.5 2024-08-18 15:37:32,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3969290.0, ans=0.125 2024-08-18 15:37:33,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3969290.0, ans=0.125 2024-08-18 15:37:49,662 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.99 vs. limit=10.0 2024-08-18 15:37:51,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3969490.0, ans=0.07 2024-08-18 15:37:53,402 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 15:38:01,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3969490.0, ans=0.125 2024-08-18 15:38:03,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3969490.0, ans=0.025 2024-08-18 15:38:16,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3969590.0, ans=0.125 2024-08-18 15:38:20,124 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 10350, loss[loss=0.09686, beats_loss=0.01203, ecapa_loss=0.0001057, whisper_loss=0.08378, over 15870.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01045, ecapa_loss=0.0001449, whisper_loss=0.0913, over 3911484.58 frames. 
], batch size: 63, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:38:48,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3969890.0, ans=0.125 2024-08-18 15:38:52,822 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.28 vs. limit=12.0 2024-08-18 15:38:54,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3969890.0, ans=0.0 2024-08-18 15:39:13,073 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-18 15:39:20,511 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.307e+01 2.571e+01 2.921e+01 7.269e+01, threshold=5.142e+01, percent-clipped=1.0 2024-08-18 15:39:28,622 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.24 vs. limit=6.0 2024-08-18 15:39:30,271 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 10400, loss[loss=0.0999, beats_loss=0.0114, ecapa_loss=0.0001326, whisper_loss=0.08717, over 15033.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01049, ecapa_loss=0.0001448, whisper_loss=0.09089, over 3901047.92 frames. ], batch size: 58, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:39:32,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3970190.0, ans=0.2 2024-08-18 15:39:35,493 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-18 15:39:38,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3970190.0, ans=0.1 2024-08-18 15:39:41,218 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=15.0 2024-08-18 15:39:46,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3970290.0, ans=0.0 2024-08-18 15:40:05,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=12.0 2024-08-18 15:40:21,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3970490.0, ans=0.95 2024-08-18 15:40:38,015 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 10450, loss[loss=0.1047, beats_loss=0.01159, ecapa_loss=0.0001304, whisper_loss=0.09182, over 21144.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01055, ecapa_loss=0.0001445, whisper_loss=0.09042, over 3914750.67 frames. ], batch size: 84, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:40:39,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3970690.0, ans=0.125 2024-08-18 15:40:41,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3970690.0, ans=0.1 2024-08-18 15:40:56,344 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
17 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-18 15:40:56,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3970790.0, ans=0.125 2024-08-18 15:41:03,299 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.00 vs. limit=22.5 2024-08-18 15:41:21,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3970990.0, ans=0.125 2024-08-18 15:41:26,530 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-18 15:41:34,984 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.245e+01 2.442e+01 2.727e+01 4.233e+01, threshold=4.884e+01, percent-clipped=0.0 2024-08-18 15:41:36,454 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-18 15:41:39,079 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-18 15:41:40,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3971090.0, ans=0.125 2024-08-18 15:41:41,699 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-18 15:41:43,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3971190.0, ans=0.04949747468305833 2024-08-18 15:41:44,291 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 10500, loss[loss=0.1109, beats_loss=0.01019, ecapa_loss=0.0001294, whisper_loss=0.09943, over 16709.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.0001436, whisper_loss=0.09093, over 3923655.75 frames. 
], batch size: 64, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:41:45,606 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 15:41:51,407 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.22 vs. limit=15.0 2024-08-18 15:41:56,978 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-18 15:41:58,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3971290.0, ans=0.125 2024-08-18 15:42:02,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3971290.0, ans=0.125 2024-08-18 15:42:06,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3971290.0, ans=0.1 2024-08-18 15:42:50,222 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 10550, loss[loss=0.08254, beats_loss=0.01279, ecapa_loss=0.0001102, whisper_loss=0.06865, over 21628.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01053, ecapa_loss=0.0001435, whisper_loss=0.09063, over 3895468.67 frames. ], batch size: 85, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:43:00,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3971690.0, ans=0.0 2024-08-18 15:43:00,982 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.34 vs. 
limit=15.0 2024-08-18 15:43:21,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3971890.0, ans=0.125 2024-08-18 15:43:23,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3971890.0, ans=0.0 2024-08-18 15:43:45,870 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.358e+01 2.575e+01 2.904e+01 4.365e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-18 15:43:49,987 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 32 from Vox, 32 fro AS 2024-08-18 15:43:50,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3972090.0, ans=0.0 2024-08-18 15:43:51,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3972090.0, ans=0.1 2024-08-18 15:43:53,128 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 15:43:53,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3972090.0, ans=0.125 2024-08-18 15:43:55,678 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 10600, loss[loss=0.09672, beats_loss=0.009701, ecapa_loss=0.0001209, whisper_loss=0.08581, over 20089.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01047, ecapa_loss=0.0001441, whisper_loss=0.09088, over 3902862.50 frames. ], batch size: 76, lr: 2.26e-03, grad_scale: 1.152921504606847e+18 2024-08-18 15:43:56,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3972190.0, ans=0.04949747468305833 2024-08-18 15:44:10,219 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-18 15:44:14,380 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
18 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-18 15:44:28,756 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 15:44:34,506 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-18 15:44:36,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=3972490.0, ans=0.2 2024-08-18 15:44:37,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3972490.0, ans=0.125 2024-08-18 15:45:02,394 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 10650, loss[loss=0.09231, beats_loss=0.008909, ecapa_loss=0.000161, whisper_loss=0.08179, over 21208.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0104, ecapa_loss=0.0001443, whisper_loss=0.09078, over 3877406.16 frames. ], batch size: 88, lr: 2.26e-03, grad_scale: 1.152921504606847e+18 2024-08-18 15:45:02,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3972690.0, ans=0.125 2024-08-18 15:45:08,501 WARNING [optim.py:496] (0/4) Scaling gradients by 0.027766374871134758, model_norm_threshold=51.50757598876953 2024-08-18 15:45:08,666 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.457e+05, grad_sumsq=1.339e+05, orig_rms_sq=3.328e+00 2024-08-18 15:45:11,604 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-18 15:45:32,751 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.10 vs. limit=5.0 2024-08-18 15:45:53,160 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
9 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 15:46:00,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3973090.0, ans=15.0 2024-08-18 15:46:01,416 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.310e+01 2.515e+01 2.823e+01 1.855e+03, threshold=5.029e+01, percent-clipped=1.0 2024-08-18 15:46:10,784 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 10700, loss[loss=0.09499, beats_loss=0.01081, ecapa_loss=0.0001696, whisper_loss=0.08248, over 22721.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01054, ecapa_loss=0.0001439, whisper_loss=0.08964, over 3901462.22 frames. ], batch size: 98, lr: 2.26e-03, grad_scale: 1.152921504606847e+18 2024-08-18 15:46:22,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3973190.0, ans=0.1 2024-08-18 15:46:24,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3973290.0, ans=0.125 2024-08-18 15:46:43,669 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 15:47:04,863 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-18 15:47:13,335 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 15:47:22,602 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 10750, loss[loss=0.08959, beats_loss=0.01384, ecapa_loss=0.0001252, whisper_loss=0.0745, over 19012.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01068, ecapa_loss=0.0001447, whisper_loss=0.08888, over 3879631.14 frames. 
], batch size: 77, lr: 2.26e-03, grad_scale: 1.152921504606847e+18 2024-08-18 15:47:23,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3973690.0, ans=0.125 2024-08-18 15:47:24,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3973690.0, ans=0.2 2024-08-18 15:47:34,816 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-18 15:47:39,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3973790.0, ans=0.015 2024-08-18 15:48:10,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3973990.0, ans=0.125 2024-08-18 15:48:10,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3973990.0, ans=0.09899494936611666 2024-08-18 15:48:27,962 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.366e+01 2.579e+01 2.828e+01 3.318e+02, threshold=5.158e+01, percent-clipped=2.0 2024-08-18 15:48:35,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3974090.0, ans=0.125 2024-08-18 15:48:38,062 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 10800, loss[loss=0.1089, beats_loss=0.01152, ecapa_loss=0.0001335, whisper_loss=0.096, over 22956.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0106, ecapa_loss=0.000145, whisper_loss=0.08982, over 3852185.73 frames. 
], batch size: 90, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:48:38,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3974190.0, ans=0.125 2024-08-18 15:48:45,691 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. limit=6.0 2024-08-18 15:49:16,541 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 16 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-18 15:49:27,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3974490.0, ans=0.1 2024-08-18 15:49:32,954 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 15:49:45,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3974590.0, ans=6.0 2024-08-18 15:49:48,369 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 15:49:50,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3974590.0, ans=0.0 2024-08-18 15:49:54,534 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 10850, loss[loss=0.09744, beats_loss=0.01286, ecapa_loss=0.0001336, whisper_loss=0.08325, over 16575.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01052, ecapa_loss=0.0001461, whisper_loss=0.09072, over 3884521.65 frames. ], batch size: 68, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:50:22,322 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-18 15:50:23,589 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
20 from LS+wenet, 9 from Vox, 30 fro AS 2024-08-18 15:50:28,286 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.03 vs. limit=12.0 2024-08-18 15:50:33,531 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0 2024-08-18 15:50:43,896 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 16 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 15:50:47,270 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 21 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-18 15:50:53,674 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 18 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-18 15:50:55,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3975090.0, ans=0.125 2024-08-18 15:51:01,345 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.249e+01 2.453e+01 2.667e+01 3.964e+01, threshold=4.906e+01, percent-clipped=0.0 2024-08-18 15:51:10,409 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 10900, loss[loss=0.0911, beats_loss=0.009063, ecapa_loss=0.0001725, whisper_loss=0.08031, over 17053.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001464, whisper_loss=0.0903, over 3872771.29 frames. ], batch size: 71, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:51:11,913 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
40 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 15:51:17,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3975190.0, ans=0.1 2024-08-18 15:51:19,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3975190.0, ans=0.125 2024-08-18 15:51:21,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3975190.0, ans=0.125 2024-08-18 15:51:28,574 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0 2024-08-18 15:51:50,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3975390.0, ans=0.125 2024-08-18 15:51:54,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3975390.0, ans=0.125 2024-08-18 15:51:58,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3975490.0, ans=0.125 2024-08-18 15:52:01,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3975490.0, ans=0.1 2024-08-18 15:52:11,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3975590.0, ans=0.09899494936611666 2024-08-18 15:52:26,995 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 10950, loss[loss=0.1057, beats_loss=0.01083, ecapa_loss=0.0001261, whisper_loss=0.09357, over 19099.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01054, ecapa_loss=0.000145, whisper_loss=0.09029, over 3905671.50 frames. 
], batch size: 74, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:52:31,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3975690.0, ans=0.0 2024-08-18 15:52:38,181 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 15:52:45,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3975790.0, ans=0.1 2024-08-18 15:52:53,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3975790.0, ans=0.125 2024-08-18 15:52:54,469 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-18 15:53:00,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3975890.0, ans=0.125 2024-08-18 15:53:02,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=3975890.0, ans=0.2 2024-08-18 15:53:18,823 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 15:53:23,218 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 17 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 15:53:25,092 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2024-08-18 15:53:33,579 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.325e+01 2.540e+01 2.829e+01 5.122e+01, threshold=5.080e+01, percent-clipped=1.0 2024-08-18 15:53:43,346 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 11000, loss[loss=0.08466, beats_loss=0.01274, ecapa_loss=0.0001367, whisper_loss=0.07056, over 14924.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001445, whisper_loss=0.0906, over 3926446.87 frames. ], batch size: 60, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:54:12,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3976290.0, ans=0.125 2024-08-18 15:54:14,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3976290.0, ans=0.035 2024-08-18 15:54:22,341 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 33 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-18 15:54:42,541 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 14 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 15:54:47,295 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 24 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-18 15:54:47,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3976490.0, ans=0.0 2024-08-18 15:54:56,883 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 15:55:05,540 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 11050, loss[loss=0.08954, beats_loss=0.01057, ecapa_loss=0.0001368, whisper_loss=0.0776, over 22067.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.0001442, whisper_loss=0.09065, over 3953021.16 frames. ], batch size: 85, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:55:16,401 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-18 15:55:17,764 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
30 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-18 15:55:19,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3976790.0, ans=0.1 2024-08-18 15:55:27,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3976790.0, ans=0.05 2024-08-18 15:55:53,355 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-18 15:55:59,095 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 36 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-18 15:56:02,532 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-18 15:56:02,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3976990.0, ans=0.125 2024-08-18 15:56:03,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3976990.0, ans=0.1 2024-08-18 15:56:11,013 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 20 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-18 15:56:12,632 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.281e+01 2.516e+01 2.908e+01 1.267e+02, threshold=5.032e+01, percent-clipped=1.0 2024-08-18 15:56:15,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3977090.0, ans=0.1 2024-08-18 15:56:21,873 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 11100, loss[loss=0.1023, beats_loss=0.009433, ecapa_loss=0.000117, whisper_loss=0.09165, over 14147.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0105, ecapa_loss=0.0001446, whisper_loss=0.09062, over 3938759.90 frames. 
], batch size: 54, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:56:55,512 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2024-08-18 15:56:57,745 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-18 15:57:05,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3977490.0, ans=0.1 2024-08-18 15:57:25,332 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-18 15:57:32,139 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 9 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-18 15:57:35,320 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 11150, loss[loss=0.09193, beats_loss=0.01211, ecapa_loss=0.0001517, whisper_loss=0.0783, over 21538.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01059, ecapa_loss=0.0001436, whisper_loss=0.09012, over 3909170.88 frames. ], batch size: 91, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:57:51,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3977790.0, ans=0.125 2024-08-18 15:57:57,378 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.50 vs. 
limit=10.0 2024-08-18 15:58:02,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3977890.0, ans=0.125 2024-08-18 15:58:10,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3977890.0, ans=0.1 2024-08-18 15:58:11,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3977890.0, ans=0.2 2024-08-18 15:58:22,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3977990.0, ans=0.0 2024-08-18 15:58:23,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3977990.0, ans=0.05 2024-08-18 15:58:24,426 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 29 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-18 15:58:28,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3977990.0, ans=0.0 2024-08-18 15:58:38,479 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.725e+01 2.330e+01 2.608e+01 2.859e+01 1.941e+02, threshold=5.216e+01, percent-clipped=1.0 2024-08-18 15:58:47,731 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 11200, loss[loss=0.1025, beats_loss=0.01112, ecapa_loss=0.0001258, whisper_loss=0.09007, over 14862.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01057, ecapa_loss=0.000144, whisper_loss=0.0904, over 3906664.25 frames. 
], batch size: 57, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:58:56,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3978190.0, ans=0.0 2024-08-18 15:59:01,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3978190.0, ans=0.1 2024-08-18 15:59:05,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3978290.0, ans=0.125 2024-08-18 15:59:38,348 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 28 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-18 15:59:52,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3978590.0, ans=0.0 2024-08-18 15:59:54,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3978590.0, ans=0.0 2024-08-18 16:00:01,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3978590.0, ans=0.0 2024-08-18 16:00:06,959 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 11250, loss[loss=0.08452, beats_loss=0.01177, ecapa_loss=0.0001447, whisper_loss=0.0713, over 21749.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001446, whisper_loss=0.09012, over 3897869.08 frames. ], batch size: 89, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:00:09,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3978690.0, ans=0.0 2024-08-18 16:00:20,059 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
27 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-18 16:00:24,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3978790.0, ans=0.0 2024-08-18 16:00:24,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=3978790.0, ans=22.5 2024-08-18 16:00:29,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3978790.0, ans=0.2 2024-08-18 16:00:45,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3978890.0, ans=0.0 2024-08-18 16:01:03,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3978990.0, ans=0.1 2024-08-18 16:01:06,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3979090.0, ans=0.125 2024-08-18 16:01:12,817 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.408e+01 2.624e+01 3.093e+01 2.615e+02, threshold=5.248e+01, percent-clipped=2.0 2024-08-18 16:01:12,946 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 31 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-18 16:01:22,308 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 11300, loss[loss=0.1208, beats_loss=0.008276, ecapa_loss=0.0001918, whisper_loss=0.1106, over 20871.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.000144, whisper_loss=0.09034, over 3890866.54 frames. 
], batch size: 85, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:01:43,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3979290.0, ans=0.1 2024-08-18 16:01:48,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3979290.0, ans=0.0 2024-08-18 16:02:14,534 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=22.5 2024-08-18 16:02:18,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3979490.0, ans=0.1 2024-08-18 16:02:27,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=3979590.0, ans=0.1 2024-08-18 16:02:38,428 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 11350, loss[loss=0.09666, beats_loss=0.01018, ecapa_loss=0.0001747, whisper_loss=0.08473, over 18343.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01046, ecapa_loss=0.000144, whisper_loss=0.09059, over 3887385.67 frames. ], batch size: 78, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:02:50,818 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. 
limit=6.0 2024-08-18 16:03:01,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3979790.0, ans=0.1 2024-08-18 16:03:19,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3979890.0, ans=0.125 2024-08-18 16:03:36,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3979990.0, ans=0.2 2024-08-18 16:03:37,688 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 16:03:46,646 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.287e+01 2.489e+01 2.829e+01 3.988e+01, threshold=4.979e+01, percent-clipped=0.0 2024-08-18 16:03:51,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3980090.0, ans=0.2 2024-08-18 16:03:54,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3980190.0, ans=0.0 2024-08-18 16:03:55,496 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 11400, loss[loss=0.1081, beats_loss=0.009675, ecapa_loss=0.0001567, whisper_loss=0.09689, over 15055.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.0001435, whisper_loss=0.09094, over 3911479.22 frames. 
], batch size: 60, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:03:56,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3980190.0, ans=0.125 2024-08-18 16:04:12,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3980290.0, ans=0.1 2024-08-18 16:04:23,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.38 vs. limit=15.0 2024-08-18 16:04:31,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3980390.0, ans=0.125 2024-08-18 16:04:36,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3980390.0, ans=0.125 2024-08-18 16:05:03,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3980590.0, ans=0.125 2024-08-18 16:05:10,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3980590.0, ans=0.125 2024-08-18 16:05:13,271 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 11450, loss[loss=0.1227, beats_loss=0.008111, ecapa_loss=0.000172, whisper_loss=0.1129, over 17122.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01056, ecapa_loss=0.0001429, whisper_loss=0.09038, over 3897235.32 frames. 
], batch size: 66, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:05:23,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3980690.0, ans=0.125 2024-08-18 16:05:23,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3980690.0, ans=0.125 2024-08-18 16:06:02,688 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.566e-02 2024-08-18 16:06:27,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3981090.0, ans=0.125 2024-08-18 16:06:27,847 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.331e+01 2.551e+01 2.848e+01 4.379e+01, threshold=5.102e+01, percent-clipped=0.0 2024-08-18 16:06:37,178 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 11500, loss[loss=0.1254, beats_loss=0.008901, ecapa_loss=0.0001357, whisper_loss=0.1151, over 23500.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01053, ecapa_loss=0.0001431, whisper_loss=0.09095, over 3912990.01 frames. ], batch size: 89, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:07:08,926 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 16:07:12,671 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 16:07:19,369 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 33 from Vox, 32 fro AS 2024-08-18 16:07:26,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3981390.0, ans=0.125 2024-08-18 16:07:27,933 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
21 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-18 16:08:06,145 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.53 vs. limit=15.0 2024-08-18 16:08:18,255 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 11550, loss[loss=0.07908, beats_loss=0.01492, ecapa_loss=9.169e-05, whisper_loss=0.06325, over 15883.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01053, ecapa_loss=0.0001432, whisper_loss=0.09018, over 3888460.00 frames. ], batch size: 65, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:08:38,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3981790.0, ans=0.0 2024-08-18 16:08:45,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3981790.0, ans=0.125 2024-08-18 16:09:53,868 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.363e+01 2.578e+01 2.890e+01 4.329e+01, threshold=5.155e+01, percent-clipped=0.0 2024-08-18 16:10:08,491 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 11600, loss[loss=0.09245, beats_loss=0.01304, ecapa_loss=8.126e-05, whisper_loss=0.0786, over 14584.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001435, whisper_loss=0.09055, over 3912697.55 frames. ], batch size: 54, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:10:09,036 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.11 vs. limit=22.5 2024-08-18 16:10:15,755 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.32 vs. 
limit=15.0 2024-08-18 16:10:17,347 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.67 vs. limit=15.0 2024-08-18 16:10:56,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3982390.0, ans=0.125 2024-08-18 16:11:11,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3982390.0, ans=0.1 2024-08-18 16:11:19,540 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.85 vs. limit=12.0 2024-08-18 16:11:25,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3982490.0, ans=0.125 2024-08-18 16:11:25,580 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.05 vs. limit=12.0 2024-08-18 16:11:40,879 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0 2024-08-18 16:11:47,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3982590.0, ans=0.1 2024-08-18 16:11:47,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3982590.0, ans=0.07 2024-08-18 16:12:08,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3982690.0, ans=0.125 2024-08-18 16:12:10,426 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 11650, loss[loss=0.1034, beats_loss=0.009721, ecapa_loss=0.0001962, whisper_loss=0.09172, over 20113.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01053, ecapa_loss=0.0001432, whisper_loss=0.09053, over 3916666.83 frames. ], batch size: 88, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:12:13,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3982690.0, ans=0.125 2024-08-18 16:12:20,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3982690.0, ans=0.125 2024-08-18 16:12:46,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3982790.0, ans=0.0 2024-08-18 16:12:48,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3982790.0, ans=0.125 2024-08-18 16:13:39,689 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.63 vs. limit=15.0 2024-08-18 16:13:51,002 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.304e+01 2.600e+01 2.905e+01 3.001e+02, threshold=5.199e+01, percent-clipped=1.0 2024-08-18 16:14:04,551 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 11700, loss[loss=0.1156, beats_loss=0.009315, ecapa_loss=0.0001748, whisper_loss=0.1045, over 19699.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01058, ecapa_loss=0.0001443, whisper_loss=0.09003, over 3917181.26 frames. ], batch size: 82, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:14:14,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3983190.0, ans=0.125 2024-08-18 16:14:16,551 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
24 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-18 16:14:20,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3983190.0, ans=0.125 2024-08-18 16:14:35,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3983290.0, ans=0.125 2024-08-18 16:14:39,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3983290.0, ans=0.0 2024-08-18 16:14:54,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3983390.0, ans=0.1 2024-08-18 16:14:57,230 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 24 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-18 16:14:59,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3983390.0, ans=0.125 2024-08-18 16:15:06,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.71 vs. limit=12.0 2024-08-18 16:15:13,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.87 vs. limit=10.0 2024-08-18 16:15:21,726 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.49 vs. limit=12.0 2024-08-18 16:15:29,995 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 11750, loss[loss=0.1031, beats_loss=0.01079, ecapa_loss=0.0001354, whisper_loss=0.09096, over 23242.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01068, ecapa_loss=0.0001433, whisper_loss=0.08998, over 3915726.57 frames. ], batch size: 94, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:15:33,770 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
32 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 16:15:39,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3983690.0, ans=0.125 2024-08-18 16:16:06,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3983890.0, ans=0.2 2024-08-18 16:16:39,128 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.396e+01 2.657e+01 3.044e+01 4.817e+01, threshold=5.315e+01, percent-clipped=0.0 2024-08-18 16:16:42,909 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-18 16:16:48,227 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 11800, loss[loss=0.08675, beats_loss=0.0104, ecapa_loss=0.0001359, whisper_loss=0.075, over 14570.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001428, whisper_loss=0.09034, over 3892735.21 frames. ], batch size: 55, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:16:48,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3984190.0, ans=0.0 2024-08-18 16:16:52,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3984190.0, ans=0.125 2024-08-18 16:17:11,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3984290.0, ans=0.0 2024-08-18 16:17:16,620 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 16:17:26,479 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
19 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-18 16:17:37,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3984490.0, ans=0.1 2024-08-18 16:17:38,960 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-18 16:17:41,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3984490.0, ans=0.0 2024-08-18 16:17:42,080 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 18 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-18 16:17:51,505 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-18 16:17:54,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3984590.0, ans=0.125 2024-08-18 16:18:03,898 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-18 16:18:09,381 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 11850, loss[loss=0.09525, beats_loss=0.01018, ecapa_loss=0.0001439, whisper_loss=0.08363, over 20029.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01065, ecapa_loss=0.0001431, whisper_loss=0.09002, over 3906191.48 frames. ], batch size: 79, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:18:20,633 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 21 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-18 16:18:46,455 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.34 vs. limit=22.5 2024-08-18 16:18:57,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3984990.0, ans=0.125 2024-08-18 16:19:13,388 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
29 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-18 16:19:17,966 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.255e+01 2.535e+01 2.793e+01 4.854e+01, threshold=5.071e+01, percent-clipped=0.0 2024-08-18 16:19:19,463 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-18 16:19:26,140 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 11900, loss[loss=0.1035, beats_loss=0.01139, ecapa_loss=0.0001214, whisper_loss=0.0909, over 22166.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01073, ecapa_loss=0.0001434, whisper_loss=0.08933, over 3921055.20 frames. ], batch size: 89, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:19:34,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3985190.0, ans=0.125 2024-08-18 16:19:35,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3985190.0, ans=0.125 2024-08-18 16:19:53,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3985290.0, ans=0.125 2024-08-18 16:20:08,930 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-18 16:20:12,378 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-18 16:20:12,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3985490.0, ans=0.125 2024-08-18 16:20:18,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3985490.0, ans=0.07 2024-08-18 16:20:23,834 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.38 vs. 
limit=15.0 2024-08-18 16:20:42,501 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 11950, loss[loss=0.1053, beats_loss=0.008557, ecapa_loss=0.0001687, whisper_loss=0.09506, over 17297.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01058, ecapa_loss=0.0001448, whisper_loss=0.08975, over 3892053.29 frames. ], batch size: 69, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:20:53,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3985690.0, ans=0.125 2024-08-18 16:20:57,615 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.19 vs. limit=5.0 2024-08-18 16:21:04,901 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.73 vs. limit=22.5 2024-08-18 16:21:07,338 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 16:21:23,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3985890.0, ans=0.0 2024-08-18 16:21:42,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3985990.0, ans=0.0 2024-08-18 16:21:51,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3986090.0, ans=0.07 2024-08-18 16:21:53,290 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
19 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-18 16:21:54,290 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.273e+01 2.566e+01 2.844e+01 1.117e+02, threshold=5.132e+01, percent-clipped=1.0 2024-08-18 16:21:58,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3986090.0, ans=0.1 2024-08-18 16:22:03,589 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 12000, loss[loss=0.1117, beats_loss=0.008315, ecapa_loss=0.0001662, whisper_loss=0.1018, over 18329.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01049, ecapa_loss=0.000144, whisper_loss=0.09062, over 3882678.26 frames. ], batch size: 74, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:22:03,591 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-18 16:22:37,138 INFO [train_multi_KD3.py:1149] (0/4) Epoch 27, validation on ASR_libri: loss=0.2532, beats_loss=0, ecapa_loss=0.0005101, whisper_loss=0.2481, over 922467.00 frames. 2024-08-18 16:22:55,521 INFO [train_multi_KD3.py:1149] (0/4) Epoch 27, validation on SV_voxceleb1: loss=0.004067, beats_loss=0, ecapa_loss=0.0004067, whisper_loss=0, over 939242.00 frames. 2024-08-18 16:24:34,616 INFO [train_multi_KD3.py:1149] (0/4) Epoch 27, validation on AT_audioset: loss=0.02313, beats_loss=0.02313, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 16:24:34,620 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-18 16:24:58,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3986290.0, ans=0.1 2024-08-18 16:25:00,434 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 16:25:01,884 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
17 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 16:25:04,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3986390.0, ans=0.04949747468305833 2024-08-18 16:25:06,277 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2024-08-18 16:25:16,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3986390.0, ans=0.125 2024-08-18 16:25:21,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3986490.0, ans=0.2 2024-08-18 16:25:29,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3986490.0, ans=0.125 2024-08-18 16:25:33,553 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-18 16:25:33,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3986490.0, ans=0.1 2024-08-18 16:25:47,137 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.14 vs. limit=8.0 2024-08-18 16:25:52,569 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 12050, loss[loss=0.07692, beats_loss=0.01141, ecapa_loss=0.0002061, whisper_loss=0.06345, over 14199.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01047, ecapa_loss=0.0001442, whisper_loss=0.09028, over 3878885.06 frames. ], batch size: 64, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:26:01,096 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.53 vs. 
limit=12.0 2024-08-18 16:26:41,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3986990.0, ans=0.125 2024-08-18 16:26:43,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3986990.0, ans=0.2 2024-08-18 16:26:54,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3987090.0, ans=0.2 2024-08-18 16:26:57,827 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 16:26:58,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3987090.0, ans=0.1 2024-08-18 16:27:02,258 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.347e+01 2.564e+01 2.841e+01 2.951e+02, threshold=5.127e+01, percent-clipped=2.0 2024-08-18 16:27:05,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3987090.0, ans=0.025 2024-08-18 16:27:11,242 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 12100, loss[loss=0.116, beats_loss=0.008543, ecapa_loss=0.0001302, whisper_loss=0.1061, over 17201.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0105, ecapa_loss=0.0001439, whisper_loss=0.09026, over 3900869.46 frames. ], batch size: 62, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:27:18,778 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-18 16:27:22,517 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
21 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 16:27:30,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3987290.0, ans=0.1 2024-08-18 16:27:39,114 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.53 vs. limit=22.5 2024-08-18 16:27:41,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3987390.0, ans=0.2 2024-08-18 16:28:06,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3987490.0, ans=0.1 2024-08-18 16:28:22,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3987590.0, ans=0.1 2024-08-18 16:28:27,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3987590.0, ans=0.125 2024-08-18 16:28:29,497 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 12150, loss[loss=0.1123, beats_loss=0.009807, ecapa_loss=0.0001859, whisper_loss=0.1006, over 20965.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01051, ecapa_loss=0.0001437, whisper_loss=0.09028, over 3908776.85 frames. ], batch size: 88, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:28:37,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3987690.0, ans=0.125 2024-08-18 16:28:59,603 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=15.0 2024-08-18 16:29:13,955 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-18 16:29:15,777 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 16:29:39,140 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.283e+01 2.537e+01 2.740e+01 4.505e+01, threshold=5.074e+01, percent-clipped=0.0 2024-08-18 16:29:47,830 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 12200, loss[loss=0.1258, beats_loss=0.008261, ecapa_loss=0.0001376, whisper_loss=0.1161, over 22395.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01051, ecapa_loss=0.0001429, whisper_loss=0.09011, over 3911543.99 frames. ], batch size: 88, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:30:26,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3988390.0, ans=0.0 2024-08-18 16:30:56,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3988590.0, ans=0.0 2024-08-18 16:30:57,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3988590.0, ans=0.2 2024-08-18 16:31:00,541 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 12250, loss[loss=0.093, beats_loss=0.01009, ecapa_loss=0.0001116, whisper_loss=0.0818, over 17089.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01052, ecapa_loss=0.0001429, whisper_loss=0.09023, over 3904839.55 frames. ], batch size: 62, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:31:01,408 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 31 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-18 16:31:45,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3988990.0, ans=0.125 2024-08-18 16:31:53,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3988990.0, ans=0.0 2024-08-18 16:32:00,967 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-18 16:32:01,612 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.64 vs. limit=15.0 2024-08-18 16:32:07,789 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.312e+01 2.540e+01 2.795e+01 3.669e+01, threshold=5.080e+01, percent-clipped=0.0 2024-08-18 16:32:08,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3989090.0, ans=0.2 2024-08-18 16:32:13,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3989090.0, ans=0.125 2024-08-18 16:32:14,619 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-18 16:32:17,483 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 12300, loss[loss=0.09282, beats_loss=0.009311, ecapa_loss=0.0001406, whisper_loss=0.0821, over 15128.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001448, whisper_loss=0.08995, over 3881085.72 frames. ], batch size: 61, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:32:19,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3989190.0, ans=0.09899494936611666 2024-08-18 16:32:36,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3989290.0, ans=0.0 2024-08-18 16:32:43,043 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 33 from LS+wenet, 9 from Vox, 38 fro AS 2024-08-18 16:33:11,526 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 16:33:11,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3989490.0, ans=0.125 2024-08-18 16:33:12,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3989490.0, ans=0.125 2024-08-18 16:33:15,594 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-18 16:33:16,692 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 15 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-18 16:33:17,979 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 35 from LS+wenet, 12 from Vox, 40 fro AS 2024-08-18 16:33:26,191 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 13 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-18 16:33:29,884 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 12350, loss[loss=0.1201, beats_loss=0.01129, ecapa_loss=0.0001652, whisper_loss=0.1072, over 21853.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001447, whisper_loss=0.09066, over 3875759.21 frames. 
], batch size: 91, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:33:39,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3989690.0, ans=0.125 2024-08-18 16:33:41,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3989690.0, ans=0.125 2024-08-18 16:33:46,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3989790.0, ans=0.125 2024-08-18 16:34:06,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3989890.0, ans=0.025 2024-08-18 16:34:07,511 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 26 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-18 16:34:15,495 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.25 vs. limit=15.0 2024-08-18 16:34:29,823 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-18 16:34:30,334 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.70 vs. 
limit=15.0 2024-08-18 16:34:33,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=3990090.0, ans=0.02 2024-08-18 16:34:37,309 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.315e+01 2.645e+01 2.969e+01 4.976e+01, threshold=5.289e+01, percent-clipped=0.0 2024-08-18 16:34:42,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3990090.0, ans=0.1 2024-08-18 16:34:45,896 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 12400, loss[loss=0.08122, beats_loss=0.01259, ecapa_loss=0.0001221, whisper_loss=0.06741, over 17200.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.0001444, whisper_loss=0.0901, over 3868145.24 frames. ], batch size: 71, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:34:47,801 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 16:34:49,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3990190.0, ans=0.0 2024-08-18 16:34:52,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3990190.0, ans=0.125 2024-08-18 16:35:00,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=3990290.0, ans=0.02 2024-08-18 16:35:04,970 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 16:35:07,122 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.35 vs. limit=22.5 2024-08-18 16:35:12,965 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
21 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-18 16:35:22,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3990390.0, ans=0.5 2024-08-18 16:35:39,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3990490.0, ans=0.125 2024-08-18 16:35:45,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3990590.0, ans=0.2 2024-08-18 16:35:46,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.50 vs. limit=15.0 2024-08-18 16:35:56,175 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 12450, loss[loss=0.1166, beats_loss=0.01012, ecapa_loss=0.0001376, whisper_loss=0.1051, over 21996.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01048, ecapa_loss=0.0001441, whisper_loss=0.08949, over 3841296.27 frames. ], batch size: 89, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:36:12,804 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.51 vs. limit=15.0 2024-08-18 16:36:29,231 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-18 16:36:38,237 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.76 vs. limit=15.0 2024-08-18 16:36:39,045 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
32 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 16:36:39,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3990890.0, ans=0.2 2024-08-18 16:36:43,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3990990.0, ans=0.035 2024-08-18 16:36:44,379 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-18 16:36:51,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-08-18 16:37:00,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3991090.0, ans=0.0 2024-08-18 16:37:01,210 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.300e+01 2.585e+01 2.837e+01 4.531e+01, threshold=5.170e+01, percent-clipped=0.0 2024-08-18 16:37:10,408 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 12500, loss[loss=0.1281, beats_loss=0.009031, ecapa_loss=0.0001294, whisper_loss=0.1178, over 24592.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01056, ecapa_loss=0.000143, whisper_loss=0.09, over 3881957.04 frames. ], batch size: 92, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:37:18,178 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-18 16:37:18,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3991190.0, ans=0.125 2024-08-18 16:37:37,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3991290.0, ans=0.1 2024-08-18 16:37:41,695 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
27 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-18 16:37:43,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3991390.0, ans=0.125 2024-08-18 16:38:00,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3991490.0, ans=0.125 2024-08-18 16:38:23,437 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 12550, loss[loss=0.09036, beats_loss=0.0128, ecapa_loss=0.000142, whisper_loss=0.07614, over 19099.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01052, ecapa_loss=0.0001436, whisper_loss=0.09049, over 3917544.63 frames. ], batch size: 79, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:38:39,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3991790.0, ans=0.0 2024-08-18 16:38:44,499 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.62 vs. limit=15.0 2024-08-18 16:38:46,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3991790.0, ans=0.125 2024-08-18 16:38:57,825 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-18 16:39:25,051 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.342e+01 2.582e+01 2.915e+01 4.894e+01, threshold=5.164e+01, percent-clipped=0.0 2024-08-18 16:39:33,185 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 12600, loss[loss=0.09141, beats_loss=0.01091, ecapa_loss=0.0001553, whisper_loss=0.07895, over 22009.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01052, ecapa_loss=0.0001433, whisper_loss=0.0899, over 3890665.55 frames. 
], batch size: 91, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:39:35,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3992190.0, ans=0.0 2024-08-18 16:39:38,996 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-18 16:39:42,224 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-18 16:39:44,836 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-18 16:40:06,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3992390.0, ans=0.125 2024-08-18 16:40:12,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3992390.0, ans=0.2 2024-08-18 16:40:16,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3992490.0, ans=0.0 2024-08-18 16:40:21,783 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.75 vs. limit=12.0 2024-08-18 16:40:36,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3992590.0, ans=0.0 2024-08-18 16:40:43,881 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 12650, loss[loss=0.07237, beats_loss=0.01059, ecapa_loss=0.0001582, whisper_loss=0.06019, over 15732.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01062, ecapa_loss=0.0001439, whisper_loss=0.08993, over 3910668.61 frames. ], batch size: 62, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:40:44,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.78 vs. 
limit=15.0 2024-08-18 16:40:58,260 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-18 16:40:59,548 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 14 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-18 16:41:10,510 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 14 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-18 16:41:11,389 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.30 vs. limit=10.0 2024-08-18 16:41:11,793 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-18 16:41:28,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3992990.0, ans=0.1 2024-08-18 16:41:45,650 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.296e+01 2.534e+01 2.850e+01 7.549e+01, threshold=5.068e+01, percent-clipped=1.0 2024-08-18 16:41:54,024 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 12700, loss[loss=0.07152, beats_loss=0.01247, ecapa_loss=0.0001047, whisper_loss=0.058, over 15926.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01062, ecapa_loss=0.0001437, whisper_loss=0.0899, over 3878262.43 frames. ], batch size: 61, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:42:02,064 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.56 vs. limit=15.0 2024-08-18 16:42:03,162 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
25 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-18 16:42:26,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3993390.0, ans=0.125 2024-08-18 16:42:29,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3993390.0, ans=0.0 2024-08-18 16:42:39,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3993490.0, ans=0.0 2024-08-18 16:43:01,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3993590.0, ans=0.125 2024-08-18 16:43:06,269 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 12750, loss[loss=0.1292, beats_loss=0.01075, ecapa_loss=0.0001232, whisper_loss=0.1173, over 22200.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01061, ecapa_loss=0.0001433, whisper_loss=0.09032, over 3893615.49 frames. ], batch size: 85, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:43:09,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3993690.0, ans=0.07 2024-08-18 16:43:13,875 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 16:43:14,337 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2024-08-18 16:43:15,098 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
29 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-18 16:43:18,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3993690.0, ans=0.125 2024-08-18 16:43:29,814 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.190e-01 2024-08-18 16:43:32,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3993790.0, ans=0.1 2024-08-18 16:43:36,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3993890.0, ans=0.1 2024-08-18 16:43:46,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3993890.0, ans=0.125 2024-08-18 16:43:50,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3993990.0, ans=0.0 2024-08-18 16:44:09,990 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.299e+01 2.591e+01 2.848e+01 5.343e+01, threshold=5.182e+01, percent-clipped=2.0 2024-08-18 16:44:11,391 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 26 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-18 16:44:18,385 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 12800, loss[loss=0.1202, beats_loss=0.009398, ecapa_loss=0.000123, whisper_loss=0.1096, over 21909.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01056, ecapa_loss=0.0001441, whisper_loss=0.09098, over 3904205.57 frames. 
], batch size: 82, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:44:28,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3994190.0, ans=0.0 2024-08-18 16:44:29,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3994190.0, ans=0.0 2024-08-18 16:44:46,511 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 22 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 16:44:50,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3994390.0, ans=0.125 2024-08-18 16:44:58,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3994490.0, ans=0.0 2024-08-18 16:45:27,110 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 12850, loss[loss=0.1181, beats_loss=0.01111, ecapa_loss=0.0001585, whisper_loss=0.1054, over 21615.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01061, ecapa_loss=0.0001448, whisper_loss=0.08968, over 3887801.09 frames. ], batch size: 89, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:45:33,657 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-18 16:45:42,552 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2024-08-18 16:45:43,203 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 16:45:44,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3994790.0, ans=0.0 2024-08-18 16:45:47,250 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
16 from LS+wenet, 9 from Vox, 31 fro AS 2024-08-18 16:46:14,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3994990.0, ans=0.0 2024-08-18 16:46:16,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3994990.0, ans=0.125 2024-08-18 16:46:24,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3995090.0, ans=0.0 2024-08-18 16:46:25,028 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.49 vs. limit=12.0 2024-08-18 16:46:27,621 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.211e+01 2.399e+01 2.704e+01 4.332e+02, threshold=4.798e+01, percent-clipped=1.0 2024-08-18 16:46:29,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3995090.0, ans=0.0 2024-08-18 16:46:34,541 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 28 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 16:46:36,137 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 12900, loss[loss=0.1108, beats_loss=0.009437, ecapa_loss=0.000165, whisper_loss=0.09968, over 20142.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01064, ecapa_loss=0.0001447, whisper_loss=0.0891, over 3853577.69 frames. ], batch size: 82, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:46:49,087 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.19 vs. limit=22.5 2024-08-18 16:46:50,055 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
19 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-18 16:46:59,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3995290.0, ans=0.125 2024-08-18 16:47:01,202 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 16:47:01,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3995290.0, ans=0.2 2024-08-18 16:47:11,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3995390.0, ans=0.0 2024-08-18 16:47:18,280 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-18 16:47:22,498 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=8.348e-01 2024-08-18 16:47:23,012 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.98 vs. limit=6.0 2024-08-18 16:47:24,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3995490.0, ans=0.125 2024-08-18 16:47:27,458 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 32 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-18 16:47:45,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3995590.0, ans=0.125 2024-08-18 16:47:47,753 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 12950, loss[loss=0.09966, beats_loss=0.009681, ecapa_loss=0.0001657, whisper_loss=0.08832, over 20546.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001455, whisper_loss=0.0903, over 3866555.19 frames. 
], batch size: 84, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:47:51,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3995690.0, ans=0.125 2024-08-18 16:47:57,986 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-18 16:48:24,844 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.19 vs. limit=12.0 2024-08-18 16:48:26,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3995890.0, ans=0.0 2024-08-18 16:48:45,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3996090.0, ans=0.0 2024-08-18 16:48:48,462 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.384e+01 2.658e+01 2.947e+01 5.232e+01, threshold=5.316e+01, percent-clipped=1.0 2024-08-18 16:48:57,690 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 13000, loss[loss=0.1207, beats_loss=0.009632, ecapa_loss=0.0001362, whisper_loss=0.1097, over 15576.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01055, ecapa_loss=0.0001443, whisper_loss=0.08963, over 3859275.24 frames. 
], batch size: 62, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:48:59,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3996190.0, ans=0.125 2024-08-18 16:49:01,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3996190.0, ans=0.0 2024-08-18 16:49:02,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3996190.0, ans=0.2 2024-08-18 16:49:02,527 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0 2024-08-18 16:49:08,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3996190.0, ans=0.1 2024-08-18 16:49:10,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3996290.0, ans=0.2 2024-08-18 16:49:10,709 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.96 vs. limit=15.0 2024-08-18 16:49:21,742 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-18 16:49:30,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3996390.0, ans=0.125 2024-08-18 16:49:31,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3996390.0, ans=0.1 2024-08-18 16:49:54,039 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
19 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-18 16:50:06,160 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 13050, loss[loss=0.09938, beats_loss=0.01163, ecapa_loss=0.0001408, whisper_loss=0.08634, over 21919.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01054, ecapa_loss=0.0001442, whisper_loss=0.08932, over 3832596.54 frames. ], batch size: 89, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:50:06,249 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 16:50:28,218 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-18 16:50:47,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3996990.0, ans=0.0 2024-08-18 16:50:52,979 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-18 16:51:00,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3997090.0, ans=0.0 2024-08-18 16:51:06,634 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.279e+01 2.532e+01 2.847e+01 6.978e+01, threshold=5.064e+01, percent-clipped=1.0 2024-08-18 16:51:14,956 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 13100, loss[loss=0.1093, beats_loss=0.01016, ecapa_loss=0.0001444, whisper_loss=0.09775, over 16297.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01056, ecapa_loss=0.000143, whisper_loss=0.08905, over 3819843.92 frames. ], batch size: 65, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:51:19,022 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 16:51:40,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3997390.0, ans=0.0 2024-08-18 16:51:59,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3997490.0, ans=0.125 2024-08-18 16:52:19,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3997590.0, ans=0.125 2024-08-18 16:52:25,742 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 13150, loss[loss=0.1152, beats_loss=0.007161, ecapa_loss=0.0001525, whisper_loss=0.1065, over 16152.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01057, ecapa_loss=0.0001421, whisper_loss=0.08922, over 3846450.02 frames. ], batch size: 62, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:52:27,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3997690.0, ans=0.125 2024-08-18 16:52:39,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3997790.0, ans=0.125 2024-08-18 16:52:43,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3997790.0, ans=0.1 2024-08-18 16:52:50,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3997790.0, ans=0.0 2024-08-18 16:52:56,642 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-18 16:53:02,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3997890.0, ans=0.2 2024-08-18 16:53:06,248 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
21 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-18 16:53:08,962 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 16:53:24,305 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=8.613e+00 2024-08-18 16:53:25,117 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.700e+01 2.321e+01 2.625e+01 2.876e+01 3.803e+01, threshold=5.250e+01, percent-clipped=0.0 2024-08-18 16:53:33,262 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 13200, loss[loss=0.1182, beats_loss=0.009721, ecapa_loss=0.0001158, whisper_loss=0.1073, over 20882.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01054, ecapa_loss=0.0001427, whisper_loss=0.08969, over 3864501.10 frames. ], batch size: 79, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:53:46,065 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2024-08-18 16:53:57,506 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 16:54:16,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3998490.0, ans=0.125 2024-08-18 16:54:18,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3998490.0, ans=0.1 2024-08-18 16:54:39,486 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 13250, loss[loss=0.1324, beats_loss=0.00511, ecapa_loss=0.0001735, whisper_loss=0.1255, over 13912.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.0001446, whisper_loss=0.0897, over 3857808.99 frames. 
], batch size: 53, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:54:46,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3998690.0, ans=0.2 2024-08-18 16:55:09,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3998890.0, ans=0.125 2024-08-18 16:55:19,149 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-18 16:55:27,769 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-18 16:55:33,849 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 20 from LS+wenet, 26 from Vox, 46 fro AS 2024-08-18 16:55:35,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3999090.0, ans=0.125 2024-08-18 16:55:40,317 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.403e+01 2.676e+01 3.041e+01 3.478e+02, threshold=5.351e+01, percent-clipped=3.0 2024-08-18 16:55:48,177 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 13300, loss[loss=0.1218, beats_loss=0.009048, ecapa_loss=0.0002137, whisper_loss=0.1106, over 20137.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01055, ecapa_loss=0.000144, whisper_loss=0.08963, over 3876754.68 frames. ], batch size: 87, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:55:52,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3999190.0, ans=10.0 2024-08-18 16:56:08,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3999290.0, ans=0.1 2024-08-18 16:56:11,336 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
20 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-18 16:56:21,584 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.575e+00 2024-08-18 16:56:22,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.67 vs. limit=15.0 2024-08-18 16:56:24,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3999390.0, ans=0.2 2024-08-18 16:56:52,948 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 13350, loss[loss=0.1092, beats_loss=0.008064, ecapa_loss=0.0001784, whisper_loss=0.09936, over 20979.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01056, ecapa_loss=0.0001431, whisper_loss=0.08979, over 3885055.54 frames. ], batch size: 87, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:56:56,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3999690.0, ans=0.125 2024-08-18 16:57:05,209 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.73 vs. limit=22.5 2024-08-18 16:57:22,438 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.55 vs. limit=15.0 2024-08-18 16:57:24,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3999890.0, ans=0.07 2024-08-18 16:57:34,380 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-400000.pt 2024-08-18 16:57:37,282 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
22 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-18 16:57:42,036 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 16:57:45,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3999990.0, ans=0.2 2024-08-18 16:57:49,495 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.41 vs. limit=15.0 2024-08-18 16:57:56,428 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.330e+01 2.620e+01 2.943e+01 5.095e+01, threshold=5.239e+01, percent-clipped=0.0 2024-08-18 16:57:59,351 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 16:58:02,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4000190.0, ans=0.125 2024-08-18 16:58:02,316 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.59 vs. limit=15.0 2024-08-18 16:58:03,313 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 13400, loss[loss=0.1044, beats_loss=0.01008, ecapa_loss=0.0001708, whisper_loss=0.09263, over 19000.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01055, ecapa_loss=0.0001444, whisper_loss=0.08933, over 3875070.13 frames. ], batch size: 78, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:58:11,639 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 12 from Vox, 45 fro AS 2024-08-18 16:58:15,787 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
32 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-18 16:58:36,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4000390.0, ans=0.0 2024-08-18 16:58:49,905 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0 2024-08-18 16:58:54,771 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 16:59:13,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4000590.0, ans=0.0 2024-08-18 16:59:15,436 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 13450, loss[loss=0.1309, beats_loss=0.01041, ecapa_loss=0.0001297, whisper_loss=0.1192, over 20601.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01052, ecapa_loss=0.0001432, whisper_loss=0.09007, over 3874814.32 frames. ], batch size: 79, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:59:16,043 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.24 vs. limit=15.0 2024-08-18 16:59:33,932 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-18 16:59:34,709 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.13 vs. limit=15.0 2024-08-18 16:59:39,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4000790.0, ans=0.125 2024-08-18 16:59:46,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4000790.0, ans=0.0 2024-08-18 17:00:01,190 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
37 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-18 17:00:01,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4000890.0, ans=0.0 2024-08-18 17:00:03,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4000890.0, ans=0.125 2024-08-18 17:00:05,901 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.30 vs. limit=15.0 2024-08-18 17:00:13,481 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2024-08-18 17:00:15,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4000990.0, ans=0.2 2024-08-18 17:00:22,195 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0 2024-08-18 17:00:29,109 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.614e+01 2.273e+01 2.579e+01 2.870e+01 2.046e+02, threshold=5.158e+01, percent-clipped=2.0 2024-08-18 17:00:36,120 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 13500, loss[loss=0.112, beats_loss=0.01023, ecapa_loss=0.0001289, whisper_loss=0.1005, over 23686.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0105, ecapa_loss=0.0001437, whisper_loss=0.09005, over 3907710.47 frames. ], batch size: 92, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:00:36,771 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.46 vs. 
limit=15.0 2024-08-18 17:00:54,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4001290.0, ans=0.1 2024-08-18 17:01:15,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4001390.0, ans=0.1 2024-08-18 17:01:54,583 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 13550, loss[loss=0.09079, beats_loss=0.01178, ecapa_loss=0.0001369, whisper_loss=0.07765, over 23175.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01053, ecapa_loss=0.000143, whisper_loss=0.08953, over 3870892.98 frames. ], batch size: 93, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:01:59,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4001690.0, ans=0.2 2024-08-18 17:02:08,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4001790.0, ans=0.125 2024-08-18 17:02:09,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4001790.0, ans=0.125 2024-08-18 17:02:17,401 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 17:02:17,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4001790.0, ans=0.125 2024-08-18 17:02:34,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=4001890.0, ans=22.5 2024-08-18 17:02:51,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4001990.0, ans=0.0 2024-08-18 17:02:56,206 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
26 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-18 17:03:04,833 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.302e+01 2.500e+01 2.862e+01 4.462e+01, threshold=4.999e+01, percent-clipped=0.0 2024-08-18 17:03:05,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4002090.0, ans=0.125 2024-08-18 17:03:12,556 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 13600, loss[loss=0.08978, beats_loss=0.01171, ecapa_loss=0.0001424, whisper_loss=0.07665, over 21654.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0106, ecapa_loss=0.000143, whisper_loss=0.08917, over 3891711.13 frames. ], batch size: 90, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:03:28,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4002290.0, ans=0.125 2024-08-18 17:03:28,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4002290.0, ans=0.125 2024-08-18 17:03:40,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4002390.0, ans=0.2 2024-08-18 17:03:45,140 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-18 17:04:00,618 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.77 vs. limit=22.5 2024-08-18 17:04:15,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4002590.0, ans=0.125 2024-08-18 17:04:18,336 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 13650, loss[loss=0.08499, beats_loss=0.009633, ecapa_loss=0.000143, whisper_loss=0.07392, over 14205.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.01069, ecapa_loss=0.0001439, whisper_loss=0.08907, over 3880346.34 frames. ], batch size: 55, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:04:32,440 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-08-18 17:04:37,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4002790.0, ans=0.125 2024-08-18 17:04:37,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4002790.0, ans=0.125 2024-08-18 17:04:55,795 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.90 vs. limit=10.0 2024-08-18 17:05:07,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4003090.0, ans=0.125 2024-08-18 17:05:07,732 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.96 vs. limit=6.0 2024-08-18 17:05:13,227 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.369e+01 2.636e+01 3.031e+01 4.631e+02, threshold=5.273e+01, percent-clipped=3.0 2024-08-18 17:05:15,497 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.73 vs. limit=6.0 2024-08-18 17:05:19,302 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 13700, loss[loss=0.1145, beats_loss=0.009934, ecapa_loss=0.0001388, whisper_loss=0.1032, over 18961.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01071, ecapa_loss=0.0001431, whisper_loss=0.08885, over 3861675.78 frames. 
], batch size: 73, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:05:21,350 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.93 vs. limit=15.0 2024-08-18 17:05:23,046 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 15 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 17:05:26,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4003190.0, ans=0.0 2024-08-18 17:05:50,252 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08100207895040512, model_norm_threshold=52.72901916503906 2024-08-18 17:05:50,418 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.24, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.003e+05, grad_sumsq=1.003e+05, orig_rms_sq=1.000e+00 2024-08-18 17:05:53,028 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-18 17:05:57,903 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 23 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-18 17:06:09,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4003590.0, ans=0.125 2024-08-18 17:06:13,956 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-18 17:06:18,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4003590.0, ans=0.125 2024-08-18 17:06:20,850 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 13750, loss[loss=0.1013, beats_loss=0.01169, ecapa_loss=0.0001452, whisper_loss=0.08812, over 20516.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01072, ecapa_loss=0.0001434, whisper_loss=0.08899, over 3862591.12 frames. 
], batch size: 85, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:06:36,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4003790.0, ans=0.125 2024-08-18 17:06:49,734 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-18 17:06:54,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4003890.0, ans=0.0 2024-08-18 17:07:02,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4003990.0, ans=0.125 2024-08-18 17:07:07,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4003990.0, ans=0.125 2024-08-18 17:07:16,830 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.418e+01 2.607e+01 2.994e+01 6.510e+02, threshold=5.215e+01, percent-clipped=3.0 2024-08-18 17:07:23,407 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 13800, loss[loss=0.1174, beats_loss=0.009413, ecapa_loss=0.0001731, whisper_loss=0.1063, over 21522.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01069, ecapa_loss=0.0001444, whisper_loss=0.08911, over 3866707.04 frames. ], batch size: 89, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:07:30,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4004190.0, ans=0.125 2024-08-18 17:07:32,473 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-18 17:07:47,904 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
37 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-18 17:08:00,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4004490.0, ans=0.0 2024-08-18 17:08:02,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4004490.0, ans=0.0 2024-08-18 17:08:07,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4004490.0, ans=0.125 2024-08-18 17:08:13,619 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0 2024-08-18 17:08:14,448 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-18 17:08:26,994 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 13850, loss[loss=0.1346, beats_loss=0.009434, ecapa_loss=0.0001398, whisper_loss=0.1238, over 23173.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.0001438, whisper_loss=0.09023, over 3889173.24 frames. ], batch size: 91, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:08:37,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4004690.0, ans=0.125 2024-08-18 17:08:40,689 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-18 17:08:42,569 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0 2024-08-18 17:08:52,347 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
31 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-18 17:08:53,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4004890.0, ans=0.125 2024-08-18 17:09:06,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4004990.0, ans=0.125 2024-08-18 17:09:18,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4005090.0, ans=0.0 2024-08-18 17:09:18,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4005090.0, ans=0.2 2024-08-18 17:09:18,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4005090.0, ans=0.2 2024-08-18 17:09:24,458 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.330e+01 2.560e+01 2.931e+01 5.468e+01, threshold=5.120e+01, percent-clipped=1.0 2024-08-18 17:09:25,044 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=15.0 2024-08-18 17:09:30,737 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 13900, loss[loss=0.1079, beats_loss=0.009292, ecapa_loss=0.000155, whisper_loss=0.09708, over 14598.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01051, ecapa_loss=0.0001444, whisper_loss=0.09116, over 3898754.73 frames. ], batch size: 56, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:09:31,618 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. 
limit=6.0 2024-08-18 17:09:33,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4005190.0, ans=0.0 2024-08-18 17:09:52,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4005290.0, ans=0.1 2024-08-18 17:09:57,595 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-18 17:10:25,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4005590.0, ans=0.04949747468305833 2024-08-18 17:10:25,848 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.51 vs. limit=15.0 2024-08-18 17:10:29,960 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 14 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-18 17:10:36,316 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 13950, loss[loss=0.1113, beats_loss=0.009776, ecapa_loss=0.0001208, whisper_loss=0.1003, over 20961.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01043, ecapa_loss=0.0001433, whisper_loss=0.09134, over 3888007.95 frames. ], batch size: 81, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:10:49,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4005790.0, ans=0.125 2024-08-18 17:10:55,177 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-18 17:11:04,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4005890.0, ans=0.0 2024-08-18 17:11:10,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4005890.0, ans=0.125 2024-08-18 17:11:13,217 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
35 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 17:11:15,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4005990.0, ans=0.125 2024-08-18 17:11:21,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4005990.0, ans=0.0 2024-08-18 17:11:21,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4005990.0, ans=0.125 2024-08-18 17:11:26,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4006090.0, ans=0.1 2024-08-18 17:11:30,190 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 15 from LS+wenet, 31 from Vox, 46 fro AS 2024-08-18 17:11:33,979 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.308e+01 2.540e+01 2.874e+01 4.374e+01, threshold=5.080e+01, percent-clipped=0.0 2024-08-18 17:11:36,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4006090.0, ans=0.125 2024-08-18 17:11:40,681 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 14000, loss[loss=0.09736, beats_loss=0.01232, ecapa_loss=0.0001106, whisper_loss=0.08393, over 20708.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.0001426, whisper_loss=0.09052, over 3899783.73 frames. 
], batch size: 82, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:11:55,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4006290.0, ans=0.125 2024-08-18 17:12:21,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4006490.0, ans=0.125 2024-08-18 17:12:31,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4006590.0, ans=0.035 2024-08-18 17:12:44,368 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 14050, loss[loss=0.1018, beats_loss=0.009679, ecapa_loss=0.00015, whisper_loss=0.09066, over 22985.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.000142, whisper_loss=0.09036, over 3917734.41 frames. ], batch size: 89, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:12:45,678 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 12 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-18 17:13:04,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4006790.0, ans=0.2 2024-08-18 17:13:08,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4006890.0, ans=0.0 2024-08-18 17:13:31,214 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 17:13:41,649 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.335e+01 2.606e+01 2.930e+01 4.821e+01, threshold=5.212e+01, percent-clipped=0.0 2024-08-18 17:13:47,598 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 14100, loss[loss=0.1114, beats_loss=0.009492, ecapa_loss=0.0001572, whisper_loss=0.1003, over 22449.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01067, ecapa_loss=0.0001422, whisper_loss=0.08996, over 3939666.89 frames. 
], batch size: 90, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:13:52,503 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-18 17:14:08,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4007290.0, ans=0.09899494936611666 2024-08-18 17:14:20,622 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.91 vs. limit=15.0 2024-08-18 17:14:25,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4007490.0, ans=0.125 2024-08-18 17:14:27,658 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 24 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-18 17:14:38,186 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 15 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-18 17:14:38,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4007590.0, ans=0.1 2024-08-18 17:14:39,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4007590.0, ans=0.125 2024-08-18 17:14:39,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4007590.0, ans=0.0 2024-08-18 17:14:43,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4007590.0, ans=0.125 2024-08-18 17:14:50,369 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 14150, loss[loss=0.0912, beats_loss=0.009696, ecapa_loss=0.0001432, whisper_loss=0.08008, over 14443.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.0001423, whisper_loss=0.09042, over 3914837.35 frames. 
], batch size: 60, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:14:58,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4007690.0, ans=0.125 2024-08-18 17:14:59,809 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 17:15:33,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4007990.0, ans=0.1 2024-08-18 17:15:35,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4007990.0, ans=0.0 2024-08-18 17:15:42,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4008090.0, ans=0.0 2024-08-18 17:15:45,089 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0 2024-08-18 17:15:47,909 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.262e+01 2.524e+01 2.804e+01 4.310e+01, threshold=5.048e+01, percent-clipped=0.0 2024-08-18 17:15:54,109 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.95 vs. limit=10.0 2024-08-18 17:15:54,603 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 14200, loss[loss=0.08368, beats_loss=0.01014, ecapa_loss=0.0001549, whisper_loss=0.07199, over 19510.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01059, ecapa_loss=0.0001426, whisper_loss=0.08989, over 3931784.00 frames. ], batch size: 80, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:16:10,130 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.34 vs. 
limit=15.0 2024-08-18 17:16:12,792 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.73 vs. limit=22.5 2024-08-18 17:16:16,951 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-18 17:16:31,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4008490.0, ans=0.125 2024-08-18 17:16:43,877 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.518e-03 2024-08-18 17:16:44,965 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-18 17:16:52,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4008590.0, ans=0.2 2024-08-18 17:16:57,440 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 14250, loss[loss=0.1138, beats_loss=0.009188, ecapa_loss=0.0001264, whisper_loss=0.1033, over 19936.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01054, ecapa_loss=0.0001419, whisper_loss=0.09038, over 3923213.10 frames. ], batch size: 74, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:17:01,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4008690.0, ans=0.125 2024-08-18 17:17:12,091 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.65 vs. 
limit=22.5 2024-08-18 17:17:13,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4008790.0, ans=0.2 2024-08-18 17:17:27,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4008890.0, ans=0.0 2024-08-18 17:17:28,489 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 17:17:46,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4008990.0, ans=0.0 2024-08-18 17:17:55,598 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.243e+01 2.470e+01 2.770e+01 7.680e+01, threshold=4.941e+01, percent-clipped=2.0 2024-08-18 17:17:59,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4009090.0, ans=0.125 2024-08-18 17:18:01,603 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 14300, loss[loss=0.09399, beats_loss=0.01001, ecapa_loss=0.0001451, whisper_loss=0.08253, over 18899.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001413, whisper_loss=0.09016, over 3918394.88 frames. ], batch size: 75, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:18:08,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4009190.0, ans=10.0 2024-08-18 17:18:18,466 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
22 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-18 17:18:36,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4009390.0, ans=0.0 2024-08-18 17:18:36,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4009390.0, ans=0.0 2024-08-18 17:18:48,608 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-18 17:18:50,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4009490.0, ans=0.0 2024-08-18 17:19:05,868 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 14350, loss[loss=0.09767, beats_loss=0.01303, ecapa_loss=0.0001437, whisper_loss=0.0832, over 21750.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01051, ecapa_loss=0.0001413, whisper_loss=0.09008, over 3913101.50 frames. ], batch size: 92, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:19:06,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4009690.0, ans=0.125 2024-08-18 17:19:10,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4009690.0, ans=0.0 2024-08-18 17:19:18,486 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 20 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-18 17:19:18,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4009790.0, ans=0.0 2024-08-18 17:19:29,962 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-18 17:19:31,199 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-18 17:19:36,903 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
21 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-18 17:19:51,772 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-18 17:20:04,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4010090.0, ans=0.125 2024-08-18 17:20:05,175 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.17 vs. limit=22.5 2024-08-18 17:20:05,971 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.040e+01 2.366e+01 2.598e+01 2.848e+01 4.928e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-18 17:20:13,170 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 14400, loss[loss=0.1107, beats_loss=0.008264, ecapa_loss=0.0001677, whisper_loss=0.1008, over 17218.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01039, ecapa_loss=0.0001431, whisper_loss=0.09075, over 3896170.65 frames. ], batch size: 70, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:20:14,428 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-18 17:20:21,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4010190.0, ans=0.125 2024-08-18 17:20:22,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4010190.0, ans=0.125 2024-08-18 17:20:25,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4010290.0, ans=0.0 2024-08-18 17:20:37,580 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
22 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-18 17:20:40,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4010390.0, ans=0.125 2024-08-18 17:20:49,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4010390.0, ans=0.125 2024-08-18 17:20:50,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4010390.0, ans=0.2 2024-08-18 17:21:00,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4010490.0, ans=0.1 2024-08-18 17:21:21,355 INFO [train_multi_KD3.py:1116] (0/4) Epoch 27, batch 14450, loss[loss=0.107, beats_loss=0.01029, ecapa_loss=0.0001523, whisper_loss=0.09516, over 22728.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01043, ecapa_loss=0.0001437, whisper_loss=0.09029, over 3862286.80 frames. ], batch size: 91, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:21:22,770 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-18 17:21:26,727 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.70 vs. limit=12.0 2024-08-18 17:21:35,698 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.06 vs. limit=12.0 2024-08-18 17:21:48,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4010890.0, ans=0.0 2024-08-18 17:21:52,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4010890.0, ans=0.125 2024-08-18 17:21:59,461 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
35 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-18 17:22:14,566 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-27.pt 2024-08-18 17:22:35,122 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 0, loss[loss=0.1289, beats_loss=0.008441, ecapa_loss=0.0001345, whisper_loss=0.1191, over 17401.00 frames. ], tot_loss[loss=0.1289, beats_loss=0.008441, ecapa_loss=0.0001345, whisper_loss=0.1191, over 17401.00 frames. ], batch size: 66, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:22:35,123 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-18 17:23:13,200 INFO [train_multi_KD3.py:1149] (0/4) Epoch 28, validation on ASR_libri: loss=0.253, beats_loss=0, ecapa_loss=0.000516, whisper_loss=0.2479, over 922467.00 frames. 2024-08-18 17:23:27,345 INFO [train_multi_KD3.py:1149] (0/4) Epoch 28, validation on SV_voxceleb1: loss=0.004085, beats_loss=0, ecapa_loss=0.0004085, whisper_loss=0, over 939242.00 frames. 2024-08-18 17:24:19,800 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.7192, 1.6447, 1.7849, 1.1996, 1.4377, 1.8639, 2.2964, 1.3513], device='cuda:0') 2024-08-18 17:25:15,827 INFO [train_multi_KD3.py:1149] (0/4) Epoch 28, validation on AT_audioset: loss=0.02306, beats_loss=0.02306, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
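The loss records above report a total plus three per-teacher components (beats, ecapa, whisper). Assuming the logged `tot_loss` is the weighted sum of those components using the scales from this run's config header (`beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`, `whisper_loss_scale=1.0`) — an assumption, but one the logged numbers reproduce — a minimal sketch:

```python
# Sketch (assumption, not the training script itself): recombine the
# per-teacher KD losses into the logged total using the configured scales.
BEATS_SCALE = 1.0    # beats_loss_scale from the config header
ECAPA_SCALE = 10.0   # ecapa_loss_scale from the config header
WHISPER_SCALE = 1.0  # whisper_loss_scale from the config header

def total_loss(beats: float, ecapa: float, whisper: float) -> float:
    """Weighted sum of the per-teacher distillation losses."""
    return BEATS_SCALE * beats + ECAPA_SCALE * ecapa + WHISPER_SCALE * whisper

# Values from the "Epoch 27, batch 14000" record in this log.
print(round(total_loss(0.01054, 0.0001426, 0.09052), 4))  # -> 0.1025
```

The same recombination matches the other loss records in this section (e.g. batch 14050: 0.009679 + 10×0.00015 + 0.09066 ≈ 0.1018), which supports the assumed weighting.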
2024-08-18 17:25:15,830 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-18 17:25:17,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4011080.0, ans=0.1 2024-08-18 17:25:30,064 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.404e+01 2.660e+01 3.020e+01 3.509e+02, threshold=5.320e+01, percent-clipped=1.0 2024-08-18 17:25:36,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4011080.0, ans=0.125 2024-08-18 17:26:10,361 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 13 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-18 17:27:01,287 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 17:27:14,535 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 50, loss[loss=0.09156, beats_loss=0.008028, ecapa_loss=0.0001186, whisper_loss=0.08235, over 16882.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.009172, ecapa_loss=0.0001491, whisper_loss=0.08948, over 890540.20 frames. ], batch size: 62, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:27:29,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4011580.0, ans=0.0 2024-08-18 17:27:31,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4011580.0, ans=0.125 2024-08-18 17:27:33,624 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
23 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-18 17:28:06,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=4011780.0, ans=10.0 2024-08-18 17:28:10,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4011780.0, ans=0.1 2024-08-18 17:28:13,582 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=15.0 2024-08-18 17:28:43,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4011980.0, ans=0.125 2024-08-18 17:29:03,709 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 100, loss[loss=0.09045, beats_loss=0.01042, ecapa_loss=0.0001437, whisper_loss=0.07859, over 23372.00 frames. ], tot_loss[loss=0.101, beats_loss=0.009168, ecapa_loss=0.0001484, whisper_loss=0.09036, over 1543141.05 frames. ], batch size: 94, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:29:11,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4012080.0, ans=0.0 2024-08-18 17:29:15,948 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.274e+01 2.549e+01 2.774e+01 3.166e+01 3.794e+01, threshold=5.547e+01, percent-clipped=0.0 2024-08-18 17:29:29,893 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-18 17:29:42,631 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-18 17:29:56,743 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-18 17:30:19,446 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
35 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 17:30:25,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4012480.0, ans=0.125 2024-08-18 17:30:38,270 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 17:30:41,429 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 150, loss[loss=0.103, beats_loss=0.008131, ecapa_loss=0.0001839, whisper_loss=0.09299, over 19298.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.009129, ecapa_loss=0.0001484, whisper_loss=0.09099, over 2054661.49 frames. ], batch size: 80, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:30:50,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4012580.0, ans=0.0 2024-08-18 17:30:56,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4012580.0, ans=0.0 2024-08-18 17:31:01,336 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.13 vs. limit=15.0 2024-08-18 17:31:11,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4012680.0, ans=0.1 2024-08-18 17:31:32,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4012880.0, ans=0.125 2024-08-18 17:31:35,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4012880.0, ans=0.0 2024-08-18 17:31:39,614 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
24 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-18 17:31:51,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4012980.0, ans=0.125 2024-08-18 17:31:55,255 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0 2024-08-18 17:31:59,235 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 200, loss[loss=0.08819, beats_loss=0.009582, ecapa_loss=0.0001602, whisper_loss=0.07701, over 18499.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.009351, ecapa_loss=0.000147, whisper_loss=0.09101, over 2430199.84 frames. ], batch size: 73, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:32:08,137 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.367e+01 2.619e+01 2.925e+01 1.442e+02, threshold=5.239e+01, percent-clipped=3.0 2024-08-18 17:32:49,109 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0 2024-08-18 17:32:49,823 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-18 17:33:00,014 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-18 17:33:02,893 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 28 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 17:33:03,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4013480.0, ans=0.0 2024-08-18 17:33:08,133 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 250, loss[loss=0.08846, beats_loss=0.01221, ecapa_loss=0.0001481, whisper_loss=0.07477, over 22482.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.009592, ecapa_loss=0.0001469, whisper_loss=0.09101, over 2760472.93 frames. 
], batch size: 91, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:33:11,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4013580.0, ans=0.0 2024-08-18 17:33:14,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4013580.0, ans=0.125 2024-08-18 17:33:19,671 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-18 17:33:34,155 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 20 from LS+wenet, 28 from Vox, 44 fro AS 2024-08-18 17:33:34,744 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.61 vs. limit=22.5 2024-08-18 17:33:36,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4013780.0, ans=0.0 2024-08-18 17:33:39,808 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-18 17:33:56,155 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
30 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-18 17:34:09,372 WARNING [optim.py:496] (0/4) Scaling gradients by 0.03464806452393532, model_norm_threshold=52.38976287841797 2024-08-18 17:34:09,537 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.19, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.434e+05, grad_sumsq=4.434e+05, orig_rms_sq=1.000e+00 2024-08-18 17:34:12,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4013980.0, ans=0.0 2024-08-18 17:34:13,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4013980.0, ans=0.125 2024-08-18 17:34:15,889 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 300, loss[loss=0.09414, beats_loss=0.01039, ecapa_loss=0.0001388, whisper_loss=0.08235, over 15358.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.009862, ecapa_loss=0.000146, whisper_loss=0.09014, over 3008181.36 frames. ], batch size: 59, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:34:23,787 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.275e+01 2.490e+01 2.772e+01 1.512e+03, threshold=4.979e+01, percent-clipped=1.0 2024-08-18 17:34:30,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4014180.0, ans=0.2 2024-08-18 17:34:38,817 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.65 vs. limit=22.5 2024-08-18 17:34:40,661 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
14 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-18 17:34:55,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4014380.0, ans=0.1 2024-08-18 17:35:08,956 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.24 vs. limit=22.5 2024-08-18 17:35:14,457 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-18 17:35:19,495 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 350, loss[loss=0.1028, beats_loss=0.01146, ecapa_loss=0.0001537, whisper_loss=0.08984, over 21595.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01002, ecapa_loss=0.0001457, whisper_loss=0.089, over 3182045.33 frames. ], batch size: 91, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:35:36,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4014680.0, ans=0.1 2024-08-18 17:35:49,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4014780.0, ans=0.125 2024-08-18 17:35:49,758 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.09 vs. limit=15.0 2024-08-18 17:35:52,817 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-18 17:35:56,718 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 17 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 17:35:59,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4014880.0, ans=0.125 2024-08-18 17:36:12,658 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
22 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 17:36:20,944 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 400, loss[loss=0.1213, beats_loss=0.008931, ecapa_loss=0.0001255, whisper_loss=0.1112, over 20281.00 frames. ], tot_loss[loss=0.09969, beats_loss=0.01018, ecapa_loss=0.0001456, whisper_loss=0.08805, over 3333835.53 frames. ], batch size: 76, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:36:28,115 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.258e+01 2.514e+01 2.865e+01 8.622e+01, threshold=5.028e+01, percent-clipped=3.0 2024-08-18 17:36:28,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4015080.0, ans=0.2 2024-08-18 17:36:32,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4015180.0, ans=0.1 2024-08-18 17:36:32,906 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.65 vs. limit=15.0 2024-08-18 17:36:42,687 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0 2024-08-18 17:36:42,720 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=15.0 2024-08-18 17:36:43,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4015180.0, ans=0.1 2024-08-18 17:36:48,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4015280.0, ans=0.1 2024-08-18 17:37:00,724 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 
30 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-18 17:37:01,890 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-18 17:37:04,125 WARNING [optim.py:496] (0/4) Scaling gradients by 0.06247842684388161, model_norm_threshold=50.280113220214844 2024-08-18 17:37:04,284 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.0.norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.000e+05, grad_sumsq=1.000e+05, orig_rms_sq=1.000e+00 2024-08-18 17:37:05,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4015380.0, ans=0.1 2024-08-18 17:37:08,310 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.839e+00 2024-08-18 17:37:10,912 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 17:37:11,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4015480.0, ans=0.5 2024-08-18 17:37:12,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=4015480.0, ans=0.95 2024-08-18 17:37:15,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4015480.0, ans=0.2 2024-08-18 17:37:21,360 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.04 vs. limit=15.0 2024-08-18 17:37:23,113 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 450, loss[loss=0.1083, beats_loss=0.009656, ecapa_loss=0.000137, whisper_loss=0.09723, over 16155.00 frames. ], tot_loss[loss=0.1, beats_loss=0.0103, ecapa_loss=0.0001454, whisper_loss=0.08826, over 3448072.61 frames. 
], batch size: 65, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:37:25,656 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 17:37:29,369 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 17:37:33,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4015580.0, ans=0.125 2024-08-18 17:37:40,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4015680.0, ans=0.125 2024-08-18 17:37:45,618 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-18 17:37:49,183 WARNING [optim.py:496] (0/4) Scaling gradients by 0.02706790715456009, model_norm_threshold=50.280113220214844 2024-08-18 17:37:49,344 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.902e+05, grad_sumsq=3.776e+07, orig_rms_sq=1.033e-02 2024-08-18 17:38:05,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4015880.0, ans=0.125 2024-08-18 17:38:06,317 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.33 vs. limit=15.0 2024-08-18 17:38:20,524 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-18 17:38:25,131 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 500, loss[loss=0.1099, beats_loss=0.01048, ecapa_loss=0.0001571, whisper_loss=0.09787, over 17328.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.0103, ecapa_loss=0.0001453, whisper_loss=0.08873, over 3517247.92 frames. 
], batch size: 73, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:38:25,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4016080.0, ans=0.125 2024-08-18 17:38:28,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4016080.0, ans=0.125 2024-08-18 17:38:31,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4016080.0, ans=0.0 2024-08-18 17:38:32,625 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.374e+01 2.639e+01 2.877e+01 1.858e+03, threshold=5.278e+01, percent-clipped=3.0 2024-08-18 17:38:39,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4016180.0, ans=0.2 2024-08-18 17:38:41,352 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-18 17:38:45,179 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.778e+00 2024-08-18 17:39:16,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4016480.0, ans=0.1 2024-08-18 17:39:27,548 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 550, loss[loss=0.07765, beats_loss=0.01179, ecapa_loss=0.0001457, whisper_loss=0.0644, over 17991.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01032, ecapa_loss=0.0001453, whisper_loss=0.08907, over 3605114.27 frames. ], batch size: 74, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:39:33,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4016580.0, ans=0.125 2024-08-18 17:39:40,000 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
24 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-18 17:39:47,749 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 27 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-18 17:39:48,972 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-18 17:39:59,011 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 17:40:00,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4016780.0, ans=0.125 2024-08-18 17:40:19,846 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 25 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-18 17:40:21,723 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.28 vs. limit=15.0 2024-08-18 17:40:27,239 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-18 17:40:29,760 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 600, loss[loss=0.07843, beats_loss=0.0124, ecapa_loss=0.0001158, whisper_loss=0.06487, over 14365.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01035, ecapa_loss=0.0001433, whisper_loss=0.0894, over 3667736.74 frames. ], batch size: 55, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:40:31,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4017080.0, ans=0.2 2024-08-18 17:40:33,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4017080.0, ans=0.125 2024-08-18 17:40:34,640 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
27 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-18 17:40:34,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4017080.0, ans=0.0 2024-08-18 17:40:36,910 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.365e+01 2.589e+01 2.843e+01 3.555e+01, threshold=5.178e+01, percent-clipped=0.0 2024-08-18 17:40:37,082 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-18 17:40:55,787 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-18 17:40:57,113 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 17:40:57,796 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.99 vs. limit=15.0 2024-08-18 17:40:58,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4017280.0, ans=0.2 2024-08-18 17:41:06,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4017380.0, ans=0.1 2024-08-18 17:41:31,741 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 650, loss[loss=0.09035, beats_loss=0.01179, ecapa_loss=0.0001334, whisper_loss=0.07722, over 19281.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01035, ecapa_loss=0.0001428, whisper_loss=0.08936, over 3714845.03 frames. ], batch size: 78, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:41:37,739 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-18 17:41:48,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4017680.0, ans=0.1 2024-08-18 17:42:04,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4017780.0, ans=0.125 2024-08-18 17:42:24,554 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-18 17:42:30,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4017980.0, ans=0.0 2024-08-18 17:42:34,214 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 700, loss[loss=0.1207, beats_loss=0.008636, ecapa_loss=0.0001661, whisper_loss=0.1104, over 24014.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01032, ecapa_loss=0.000143, whisper_loss=0.0894, over 3758496.13 frames. ], batch size: 92, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:42:40,139 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 17:42:40,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4018080.0, ans=0.125 2024-08-18 17:42:41,475 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.261e+01 2.566e+01 2.914e+01 5.332e+01, threshold=5.131e+01, percent-clipped=1.0 2024-08-18 17:42:49,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4018180.0, ans=0.125 2024-08-18 17:42:59,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4018280.0, ans=0.125 2024-08-18 17:43:16,561 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
27 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-18 17:43:26,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4018480.0, ans=0.0 2024-08-18 17:43:28,455 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.06 vs. limit=12.0 2024-08-18 17:43:36,039 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 750, loss[loss=0.09348, beats_loss=0.01216, ecapa_loss=0.0001325, whisper_loss=0.08, over 18467.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01046, ecapa_loss=0.0001419, whisper_loss=0.08848, over 3756801.22 frames. ], batch size: 76, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:43:37,936 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.62 vs. limit=22.5 2024-08-18 17:43:44,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4018580.0, ans=0.125 2024-08-18 17:43:47,449 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-18 17:43:47,715 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.040e+00 2024-08-18 17:43:48,051 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.48 vs. 
limit=10.0 2024-08-18 17:43:48,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4018680.0, ans=0.1 2024-08-18 17:44:05,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4018780.0, ans=0.125 2024-08-18 17:44:12,482 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 17:44:13,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=4018880.0, ans=0.5 2024-08-18 17:44:13,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4018880.0, ans=0.0 2024-08-18 17:44:22,109 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 28 from LS+wenet, 26 from Vox, 19 fro AS 2024-08-18 17:44:33,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4018980.0, ans=0.125 2024-08-18 17:44:38,373 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 800, loss[loss=0.09858, beats_loss=0.01036, ecapa_loss=0.0001218, whisper_loss=0.087, over 16541.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01041, ecapa_loss=0.0001427, whisper_loss=0.08884, over 3795881.49 frames. ], batch size: 62, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:44:39,029 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. 
limit=6.0 2024-08-18 17:44:45,669 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.732e+01 2.204e+01 2.473e+01 2.754e+01 3.605e+01, threshold=4.946e+01, percent-clipped=0.0 2024-08-18 17:44:48,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4019080.0, ans=0.125 2024-08-18 17:44:49,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4019180.0, ans=0.0 2024-08-18 17:44:49,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4019180.0, ans=0.0 2024-08-18 17:44:51,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4019180.0, ans=0.2 2024-08-18 17:45:05,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4019280.0, ans=0.125 2024-08-18 17:45:39,350 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.86 vs. limit=8.0 2024-08-18 17:45:39,603 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 12 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-18 17:45:40,590 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 850, loss[loss=0.06293, beats_loss=0.01091, ecapa_loss=0.0001268, whisper_loss=0.05075, over 17553.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01041, ecapa_loss=0.0001416, whisper_loss=0.08852, over 3816767.64 frames. ], batch size: 69, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:45:46,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4019580.0, ans=0.125 2024-08-18 17:46:04,526 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
25 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-18 17:46:10,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4019780.0, ans=0.0 2024-08-18 17:46:16,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4019780.0, ans=0.125 2024-08-18 17:46:17,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4019880.0, ans=0.125 2024-08-18 17:46:17,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4019880.0, ans=0.95 2024-08-18 17:46:19,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4019880.0, ans=0.125 2024-08-18 17:46:36,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4019980.0, ans=0.125 2024-08-18 17:46:37,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4019980.0, ans=0.125 2024-08-18 17:46:42,975 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 900, loss[loss=0.1096, beats_loss=0.01115, ecapa_loss=0.0001378, whisper_loss=0.09705, over 21357.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01036, ecapa_loss=0.0001416, whisper_loss=0.08894, over 3821402.90 frames. ], batch size: 85, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:46:50,212 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.258e+01 2.407e+01 2.605e+01 4.279e+01, threshold=4.815e+01, percent-clipped=0.0 2024-08-18 17:47:07,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4020280.0, ans=0.0 2024-08-18 17:47:31,596 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
19 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-18 17:47:42,978 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-18 17:47:45,247 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 950, loss[loss=0.1154, beats_loss=0.01073, ecapa_loss=0.0001564, whisper_loss=0.1031, over 18874.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01036, ecapa_loss=0.0001412, whisper_loss=0.08895, over 3823995.30 frames. ], batch size: 73, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:47:46,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4020580.0, ans=0.1 2024-08-18 17:48:00,374 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-18 17:48:06,611 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-18 17:48:19,476 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.69 vs. limit=15.0 2024-08-18 17:48:20,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4020780.0, ans=0.125 2024-08-18 17:48:47,222 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 1000, loss[loss=0.0834, beats_loss=0.01247, ecapa_loss=0.0001517, whisper_loss=0.06941, over 20393.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01035, ecapa_loss=0.0001414, whisper_loss=0.08965, over 3828204.07 frames. 
], batch size: 88, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:48:52,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4021080.0, ans=0.0 2024-08-18 17:48:54,811 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.179e+01 2.463e+01 2.751e+01 3.706e+01, threshold=4.926e+01, percent-clipped=0.0 2024-08-18 17:48:55,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4021080.0, ans=0.125 2024-08-18 17:49:05,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4021180.0, ans=0.0 2024-08-18 17:49:11,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4021280.0, ans=0.0 2024-08-18 17:49:13,308 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-18 17:49:20,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4021280.0, ans=0.125 2024-08-18 17:49:26,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4021380.0, ans=0.125 2024-08-18 17:49:27,303 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.16 vs. 
limit=22.5 2024-08-18 17:49:39,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4021480.0, ans=0.95 2024-08-18 17:49:46,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4021480.0, ans=0.1 2024-08-18 17:49:50,268 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 1050, loss[loss=0.07316, beats_loss=0.01225, ecapa_loss=0.0001188, whisper_loss=0.05971, over 15872.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01041, ecapa_loss=0.0001407, whisper_loss=0.08916, over 3806428.86 frames. ], batch size: 62, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:49:51,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4021580.0, ans=0.0 2024-08-18 17:50:02,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4021680.0, ans=0.0 2024-08-18 17:50:04,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4021680.0, ans=0.2 2024-08-18 17:50:15,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4021780.0, ans=0.2 2024-08-18 17:50:19,699 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.26 vs. limit=15.0 2024-08-18 17:50:25,585 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-18 17:50:31,655 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.88 vs. 
limit=15.0 2024-08-18 17:50:41,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4021980.0, ans=0.1 2024-08-18 17:50:53,998 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 1100, loss[loss=0.114, beats_loss=0.008607, ecapa_loss=0.000141, whisper_loss=0.104, over 23719.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01034, ecapa_loss=0.0001408, whisper_loss=0.08894, over 3788505.93 frames. ], batch size: 90, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:51:01,990 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.358e+01 2.562e+01 2.926e+01 4.573e+02, threshold=5.124e+01, percent-clipped=2.0 2024-08-18 17:51:02,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4022080.0, ans=0.125 2024-08-18 17:51:12,476 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 25 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-18 17:51:15,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4022180.0, ans=0.125 2024-08-18 17:51:42,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4022380.0, ans=0.1 2024-08-18 17:51:45,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4022480.0, ans=0.125 2024-08-18 17:51:48,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4022480.0, ans=0.09899494936611666 2024-08-18 17:51:58,586 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 1150, loss[loss=0.08639, beats_loss=0.01131, ecapa_loss=0.000147, whisper_loss=0.07361, over 16530.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01025, ecapa_loss=0.0001416, whisper_loss=0.08989, over 3785303.75 frames. ], batch size: 65, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:52:10,628 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=8.889e+00 2024-08-18 17:52:14,364 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2024-08-18 17:52:24,650 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 13 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-18 17:52:26,127 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-18 17:52:30,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4022780.0, ans=0.125 2024-08-18 17:52:35,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4022780.0, ans=0.125 2024-08-18 17:52:47,526 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-18 17:53:05,525 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 1200, loss[loss=0.09294, beats_loss=0.01274, ecapa_loss=0.0001454, whisper_loss=0.07875, over 17916.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01029, ecapa_loss=0.000142, whisper_loss=0.08924, over 3775666.29 frames. 
], batch size: 73, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:53:13,802 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.256e+01 2.483e+01 2.794e+01 3.745e+01, threshold=4.967e+01, percent-clipped=0.0 2024-08-18 17:53:18,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4023180.0, ans=10.0 2024-08-18 17:53:39,787 WARNING [optim.py:496] (0/4) Scaling gradients by 0.03793781250715256, model_norm_threshold=49.66889572143555 2024-08-18 17:53:39,953 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.273e+05, grad_sumsq=2.197e+07, orig_rms_sq=1.035e-02 2024-08-18 17:53:58,713 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 32 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 17:54:11,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4023480.0, ans=0.95 2024-08-18 17:54:14,825 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 1250, loss[loss=0.1119, beats_loss=0.009747, ecapa_loss=0.0001404, whisper_loss=0.1007, over 17592.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01045, ecapa_loss=0.0001412, whisper_loss=0.08917, over 3818747.26 frames. ], batch size: 68, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:54:16,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4023580.0, ans=0.1 2024-08-18 17:54:25,941 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-18 17:54:28,395 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.79 vs. 
limit=15.0 2024-08-18 17:54:33,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4023680.0, ans=0.2 2024-08-18 17:54:34,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4023680.0, ans=0.0 2024-08-18 17:54:42,000 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 17:54:51,710 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 27 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 17:54:53,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4023780.0, ans=0.0 2024-08-18 17:55:00,495 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-18 17:55:12,506 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.00 vs. limit=10.0 2024-08-18 17:55:16,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=4023980.0, ans=15.0 2024-08-18 17:55:29,671 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 1300, loss[loss=0.1164, beats_loss=0.01245, ecapa_loss=0.00011, whisper_loss=0.1028, over 14271.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01042, ecapa_loss=0.0001413, whisper_loss=0.08981, over 3826874.65 frames. 
], batch size: 54, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:55:31,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4024080.0, ans=0.125 2024-08-18 17:55:38,800 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.331e+01 2.596e+01 3.040e+01 1.309e+03, threshold=5.193e+01, percent-clipped=2.0 2024-08-18 17:55:39,002 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-18 17:55:47,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4024180.0, ans=0.125 2024-08-18 17:55:53,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4024180.0, ans=0.125 2024-08-18 17:55:54,868 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 17:56:04,903 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.44 vs. limit=15.0 2024-08-18 17:56:06,237 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 17:56:31,759 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.84 vs. limit=12.0 2024-08-18 17:56:42,450 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 1350, loss[loss=0.088, beats_loss=0.00886, ecapa_loss=0.0001273, whisper_loss=0.07786, over 14921.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01039, ecapa_loss=0.000141, whisper_loss=0.0901, over 3833184.26 frames. 
], batch size: 55, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:56:42,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4024580.0, ans=0.1 2024-08-18 17:56:58,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4024680.0, ans=0.125 2024-08-18 17:57:00,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4024680.0, ans=0.125 2024-08-18 17:57:02,290 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.85 vs. limit=15.0 2024-08-18 17:57:16,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4024780.0, ans=0.125 2024-08-18 17:57:34,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4024880.0, ans=0.125 2024-08-18 17:57:48,850 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 17:57:54,317 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 1400, loss[loss=0.1126, beats_loss=0.01076, ecapa_loss=0.0001338, whisper_loss=0.1005, over 22782.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01041, ecapa_loss=0.0001411, whisper_loss=0.08969, over 3809311.31 frames. 
], batch size: 92, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:57:56,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4025080.0, ans=0.125 2024-08-18 17:57:59,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4025080.0, ans=0.125 2024-08-18 17:58:02,909 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.705e+01 2.174e+01 2.386e+01 2.635e+01 4.112e+01, threshold=4.772e+01, percent-clipped=0.0 2024-08-18 17:58:04,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4025080.0, ans=0.2 2024-08-18 17:58:27,632 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 17:58:42,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4025380.0, ans=0.07 2024-08-18 17:59:06,901 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 1450, loss[loss=0.09665, beats_loss=0.00979, ecapa_loss=0.0001474, whisper_loss=0.08539, over 21789.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01046, ecapa_loss=0.0001411, whisper_loss=0.08909, over 3805954.41 frames. ], batch size: 87, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:59:41,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4025580.0, ans=0.125 2024-08-18 17:59:45,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4025580.0, ans=0.1 2024-08-18 17:59:46,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4025580.0, ans=0.1 2024-08-18 18:00:03,732 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
25 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 18:00:19,557 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.34 vs. limit=12.0 2024-08-18 18:00:22,607 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.81 vs. limit=10.0 2024-08-18 18:00:23,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4025880.0, ans=0.125 2024-08-18 18:00:31,354 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 31 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-18 18:00:34,725 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-18 18:00:46,944 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 1500, loss[loss=0.1053, beats_loss=0.009673, ecapa_loss=0.0001423, whisper_loss=0.09423, over 16224.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01039, ecapa_loss=0.0001412, whisper_loss=0.08916, over 3786966.85 frames. ], batch size: 63, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 18:00:54,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4026080.0, ans=0.125 2024-08-18 18:00:57,453 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.252e+01 2.528e+01 2.901e+01 4.004e+01, threshold=5.056e+01, percent-clipped=0.0 2024-08-18 18:01:00,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4026180.0, ans=0.125 2024-08-18 18:01:06,142 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
22 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-18 18:01:13,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4026180.0, ans=0.125 2024-08-18 18:01:24,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4026280.0, ans=0.125 2024-08-18 18:01:26,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.40 vs. limit=22.5 2024-08-18 18:01:46,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4026480.0, ans=0.125 2024-08-18 18:01:59,055 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 1550, loss[loss=0.1213, beats_loss=0.009314, ecapa_loss=0.0001277, whisper_loss=0.1108, over 20375.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01042, ecapa_loss=0.0001404, whisper_loss=0.08898, over 3774981.64 frames. ], batch size: 75, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 18:02:21,141 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-18 18:02:22,877 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2024-08-18 18:02:23,621 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-18 18:02:39,101 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 38 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-18 18:02:47,128 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
27 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-18 18:02:53,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4026880.0, ans=0.125 2024-08-18 18:03:08,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4027080.0, ans=0.125 2024-08-18 18:03:09,368 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 1600, loss[loss=0.09864, beats_loss=0.01008, ecapa_loss=0.0001208, whisper_loss=0.08735, over 17347.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01038, ecapa_loss=0.00014, whisper_loss=0.08966, over 3812735.19 frames. ], batch size: 66, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 18:03:19,054 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.241e+01 2.454e+01 2.842e+01 4.448e+01, threshold=4.908e+01, percent-clipped=0.0 2024-08-18 18:03:32,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4027180.0, ans=0.1 2024-08-18 18:03:38,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4027280.0, ans=0.0 2024-08-18 18:03:39,613 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-18 18:03:39,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4027280.0, ans=0.07 2024-08-18 18:03:52,038 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 15 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 18:03:53,581 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-18 18:03:57,757 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
22 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-18 18:04:12,497 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0 2024-08-18 18:04:18,835 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 1650, loss[loss=0.06352, beats_loss=0.01238, ecapa_loss=0.0001501, whisper_loss=0.04964, over 16439.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01041, ecapa_loss=0.0001405, whisper_loss=0.08933, over 3823398.96 frames. ], batch size: 72, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 18:04:26,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4027580.0, ans=0.0 2024-08-18 18:04:27,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4027580.0, ans=0.125 2024-08-18 18:04:47,659 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.26 vs. limit=15.0 2024-08-18 18:05:01,278 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-18 18:05:01,898 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.29 vs. limit=22.5 2024-08-18 18:05:13,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=15.0 2024-08-18 18:05:21,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4027980.0, ans=0.2 2024-08-18 18:05:23,009 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
28 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-18 18:05:24,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4027980.0, ans=0.1 2024-08-18 18:05:26,416 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 1700, loss[loss=0.09829, beats_loss=0.01018, ecapa_loss=0.0001181, whisper_loss=0.08693, over 21443.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01034, ecapa_loss=0.0001405, whisper_loss=0.08916, over 3801049.80 frames. ], batch size: 84, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:05:30,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4028080.0, ans=0.125 2024-08-18 18:05:37,768 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.383e+01 2.589e+01 2.926e+01 5.501e+01, threshold=5.178e+01, percent-clipped=1.0 2024-08-18 18:05:40,663 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 19 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-18 18:05:43,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4028180.0, ans=0.0 2024-08-18 18:05:46,430 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=15.0 2024-08-18 18:05:49,521 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
37 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-18 18:06:11,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4028380.0, ans=0.0 2024-08-18 18:06:11,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4028380.0, ans=0.5 2024-08-18 18:06:32,293 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 1750, loss[loss=0.1132, beats_loss=0.009885, ecapa_loss=0.0001447, whisper_loss=0.1019, over 23213.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01031, ecapa_loss=0.0001405, whisper_loss=0.08989, over 3792507.00 frames. ], batch size: 90, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:06:36,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4028580.0, ans=0.125 2024-08-18 18:06:52,948 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-18 18:07:16,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4028780.0, ans=0.125 2024-08-18 18:07:23,996 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-18 18:07:45,809 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-18 18:07:51,183 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 1800, loss[loss=0.08828, beats_loss=0.009971, ecapa_loss=0.0001481, whisper_loss=0.07682, over 20167.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01026, ecapa_loss=0.0001407, whisper_loss=0.08981, over 3777414.82 frames. 
], batch size: 82, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:08:03,005 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.191e+01 2.428e+01 2.696e+01 4.164e+01, threshold=4.856e+01, percent-clipped=0.0 2024-08-18 18:08:15,670 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.54 vs. limit=15.0 2024-08-18 18:08:17,945 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2024-08-18 18:08:25,651 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-18 18:08:33,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4029280.0, ans=0.125 2024-08-18 18:08:34,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4029380.0, ans=0.125 2024-08-18 18:08:41,839 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-18 18:08:49,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4029480.0, ans=0.0 2024-08-18 18:08:52,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4029480.0, ans=0.125 2024-08-18 18:08:53,340 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-18 18:08:56,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4029480.0, ans=0.0 2024-08-18 18:09:04,171 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 1850, loss[loss=0.1035, beats_loss=0.01047, ecapa_loss=0.0001247, whisper_loss=0.09181, over 15732.00 frames. 
], tot_loss[loss=0.101, beats_loss=0.0103, ecapa_loss=0.0001403, whisper_loss=0.08931, over 3790566.42 frames. ], batch size: 63, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:09:07,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4029580.0, ans=0.1 2024-08-18 18:09:13,346 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-18 18:09:14,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4029580.0, ans=0.0 2024-08-18 18:09:28,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4029680.0, ans=0.125 2024-08-18 18:09:55,786 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.09 vs. limit=12.0 2024-08-18 18:10:03,868 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-18 18:10:15,856 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 1900, loss[loss=0.1075, beats_loss=0.009695, ecapa_loss=0.0001471, whisper_loss=0.09635, over 22722.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01035, ecapa_loss=0.0001404, whisper_loss=0.08906, over 3810824.63 frames. ], batch size: 89, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:10:16,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4030080.0, ans=0.1 2024-08-18 18:10:20,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4030080.0, ans=0.125 2024-08-18 18:10:22,802 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
29 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-18 18:10:25,046 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.38 vs. limit=22.5 2024-08-18 18:10:27,310 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.264e+01 2.517e+01 2.852e+01 3.741e+01, threshold=5.034e+01, percent-clipped=0.0 2024-08-18 18:10:44,309 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-18 18:10:44,619 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 18:11:13,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4030480.0, ans=0.125 2024-08-18 18:11:27,276 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 1950, loss[loss=0.0964, beats_loss=0.01085, ecapa_loss=0.0001824, whisper_loss=0.08373, over 20602.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01036, ecapa_loss=0.000141, whisper_loss=0.08878, over 3805805.12 frames. ], batch size: 92, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:11:35,036 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-08-18 18:11:39,672 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-18 18:11:45,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4030680.0, ans=0.1 2024-08-18 18:11:53,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4030780.0, ans=0.125 2024-08-18 18:12:04,679 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
24 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 18:12:14,131 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.868e+01 2024-08-18 18:12:31,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4030980.0, ans=0.0 2024-08-18 18:12:38,237 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 2000, loss[loss=0.08796, beats_loss=0.01278, ecapa_loss=0.000106, whisper_loss=0.07412, over 16508.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01042, ecapa_loss=0.0001401, whisper_loss=0.08819, over 3796588.08 frames. ], batch size: 64, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:12:39,953 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-18 18:12:40,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4031080.0, ans=0.1 2024-08-18 18:12:48,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4031080.0, ans=0.125 2024-08-18 18:12:49,414 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.230e+01 2.583e+01 2.898e+01 3.757e+01, threshold=5.165e+01, percent-clipped=0.0 2024-08-18 18:13:03,271 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-18 18:13:05,668 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-18 18:13:10,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4031280.0, ans=0.125 2024-08-18 18:13:21,904 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-18 18:13:29,318 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
22 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 18:13:34,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4031480.0, ans=0.0 2024-08-18 18:13:36,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4031480.0, ans=0.125 2024-08-18 18:13:40,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4031480.0, ans=0.125 2024-08-18 18:13:42,429 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.91 vs. limit=22.5 2024-08-18 18:13:50,370 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 2050, loss[loss=0.08431, beats_loss=0.01322, ecapa_loss=0.0001587, whisper_loss=0.0695, over 21497.00 frames. ], tot_loss[loss=0.09961, beats_loss=0.01051, ecapa_loss=0.0001394, whisper_loss=0.08771, over 3811140.69 frames. ], batch size: 93, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:13:56,016 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-18 18:13:58,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4031580.0, ans=0.125 2024-08-18 18:14:01,341 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-18 18:14:06,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4031680.0, ans=0.125 2024-08-18 18:14:16,529 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-18 18:14:29,108 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
14 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-18 18:14:31,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4031880.0, ans=0.125 2024-08-18 18:14:32,586 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-18 18:14:33,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4031880.0, ans=0.0 2024-08-18 18:14:42,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4031880.0, ans=0.125 2024-08-18 18:14:59,855 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 2100, loss[loss=0.0917, beats_loss=0.009674, ecapa_loss=0.0001568, whisper_loss=0.08046, over 15486.00 frames. ], tot_loss[loss=0.09968, beats_loss=0.01058, ecapa_loss=0.000139, whisper_loss=0.08772, over 3810788.15 frames. ], batch size: 60, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:15:06,968 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-18 18:15:09,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4032080.0, ans=0.125 2024-08-18 18:15:11,528 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.315e+01 2.589e+01 2.844e+01 4.091e+01, threshold=5.179e+01, percent-clipped=0.0 2024-08-18 18:16:01,217 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
23 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-18 18:16:05,603 WARNING [optim.py:496] (0/4) Scaling gradients by 0.018764860928058624, model_norm_threshold=51.787418365478516 2024-08-18 18:16:05,769 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.26, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.003e+06, grad_sumsq=1.941e+08, orig_rms_sq=1.032e-02 2024-08-18 18:16:09,783 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 18:16:11,872 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.55 vs. limit=22.5 2024-08-18 18:16:12,685 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 2150, loss[loss=0.1178, beats_loss=0.0109, ecapa_loss=0.000126, whisper_loss=0.1056, over 24155.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01056, ecapa_loss=0.0001375, whisper_loss=0.08896, over 3837928.38 frames. ], batch size: 93, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:16:21,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4032580.0, ans=0.0 2024-08-18 18:16:24,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4032580.0, ans=0.1 2024-08-18 18:16:28,440 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 18 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-18 18:16:29,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4032680.0, ans=0.125 2024-08-18 18:16:30,927 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 18:16:36,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4032680.0, ans=0.07 2024-08-18 18:16:39,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4032780.0, ans=0.0 2024-08-18 18:16:42,167 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 18:17:13,517 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-18 18:17:17,419 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-18 18:17:23,047 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 2200, loss[loss=0.0975, beats_loss=0.01272, ecapa_loss=0.0001416, whisper_loss=0.08336, over 17242.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0106, ecapa_loss=0.0001383, whisper_loss=0.08929, over 3828861.16 frames. ], batch size: 72, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:17:25,526 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 18:17:34,267 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.249e+01 2.473e+01 2.891e+01 2.760e+03, threshold=4.945e+01, percent-clipped=3.0 2024-08-18 18:17:35,804 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 21 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-18 18:17:42,277 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 23 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-18 18:17:43,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4033180.0, ans=0.2 2024-08-18 18:17:44,884 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 18 from LS+wenet, 31 from Vox, 28 fro AS 2024-08-18 18:17:55,297 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
22 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 18:18:13,965 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 21 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-18 18:18:18,102 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-18 18:18:34,891 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 2250, loss[loss=0.1005, beats_loss=0.01006, ecapa_loss=0.000186, whisper_loss=0.08859, over 22110.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01053, ecapa_loss=0.0001399, whisper_loss=0.08966, over 3827496.27 frames. ], batch size: 90, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:18:43,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4033580.0, ans=0.0 2024-08-18 18:18:45,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4033580.0, ans=10.0 2024-08-18 18:18:48,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4033680.0, ans=0.0 2024-08-18 18:18:59,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4033680.0, ans=0.125 2024-08-18 18:19:05,499 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-18 18:19:05,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4033780.0, ans=0.05 2024-08-18 18:19:10,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4033780.0, ans=0.125 2024-08-18 18:19:26,301 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
21 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 18:19:28,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4033880.0, ans=0.125 2024-08-18 18:19:31,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4033980.0, ans=0.0 2024-08-18 18:19:44,270 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 2300, loss[loss=0.1134, beats_loss=0.009826, ecapa_loss=0.0001422, whisper_loss=0.1022, over 22160.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01056, ecapa_loss=0.0001396, whisper_loss=0.09002, over 3829012.39 frames. ], batch size: 90, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:19:47,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4034080.0, ans=0.0 2024-08-18 18:19:53,397 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-18 18:19:55,751 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.302e+01 2.462e+01 2.661e+01 7.808e+01, threshold=4.924e+01, percent-clipped=1.0 2024-08-18 18:20:03,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4034180.0, ans=0.1 2024-08-18 18:20:08,718 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.77 vs. limit=22.5 2024-08-18 18:20:09,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4034180.0, ans=0.2 2024-08-18 18:20:39,336 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
26 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-18 18:20:49,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4034480.0, ans=0.0 2024-08-18 18:20:49,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4034480.0, ans=0.2 2024-08-18 18:20:52,183 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 2350, loss[loss=0.1091, beats_loss=0.011, ecapa_loss=0.0001381, whisper_loss=0.09671, over 22172.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01051, ecapa_loss=0.0001404, whisper_loss=0.09023, over 3860145.15 frames. ], batch size: 87, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:21:02,075 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-18 18:21:08,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4034680.0, ans=0.2 2024-08-18 18:21:13,632 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.29 vs. limit=15.0 2024-08-18 18:21:29,836 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 18:21:34,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4034880.0, ans=0.015 2024-08-18 18:21:48,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4034980.0, ans=0.125 2024-08-18 18:22:01,329 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 2400, loss[loss=0.1105, beats_loss=0.01099, ecapa_loss=0.0001156, whisper_loss=0.09838, over 19912.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001406, whisper_loss=0.09036, over 3882659.55 frames. 
], batch size: 78, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:22:02,043 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.46 vs. limit=15.0 2024-08-18 18:22:04,338 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 18:22:06,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4035080.0, ans=0.2 2024-08-18 18:22:11,483 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.715e+01 2.282e+01 2.511e+01 2.769e+01 4.268e+01, threshold=5.023e+01, percent-clipped=0.0 2024-08-18 18:23:07,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4035480.0, ans=0.0 2024-08-18 18:23:09,682 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 2450, loss[loss=0.07722, beats_loss=0.0129, ecapa_loss=0.0001213, whisper_loss=0.0631, over 18304.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01049, ecapa_loss=0.0001403, whisper_loss=0.08934, over 3852700.10 frames. ], batch size: 74, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:23:18,144 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-18 18:23:20,516 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 18:23:40,497 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-18 18:23:49,592 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
14 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 18:23:58,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4035880.0, ans=0.0 2024-08-18 18:24:11,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4035980.0, ans=0.1 2024-08-18 18:24:24,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4035980.0, ans=0.05 2024-08-18 18:24:30,539 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 2500, loss[loss=0.1079, beats_loss=0.01136, ecapa_loss=0.0001127, whisper_loss=0.09543, over 16129.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0105, ecapa_loss=0.0001397, whisper_loss=0.08948, over 3851991.87 frames. ], batch size: 59, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:24:31,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4036080.0, ans=0.125 2024-08-18 18:24:40,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4036080.0, ans=0.125 2024-08-18 18:24:40,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4036080.0, ans=0.0 2024-08-18 18:24:44,634 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.285e+01 2.484e+01 2.880e+01 1.174e+02, threshold=4.969e+01, percent-clipped=1.0 2024-08-18 18:24:46,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4036180.0, ans=0.0 2024-08-18 18:24:54,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4036180.0, ans=0.2 2024-08-18 18:25:32,713 INFO [train_multi_KD3.py:844] (0/4) A 
total of 90 cuts. 39 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-18 18:25:40,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4036380.0, ans=0.1 2024-08-18 18:25:54,309 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 18:25:54,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4036480.0, ans=0.125 2024-08-18 18:26:03,564 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 2550, loss[loss=0.09673, beats_loss=0.00917, ecapa_loss=0.0001481, whisper_loss=0.08608, over 22957.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001407, whisper_loss=0.0899, over 3873923.37 frames. ], batch size: 92, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:26:03,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4036580.0, ans=0.2 2024-08-18 18:26:06,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4036580.0, ans=0.125 2024-08-18 18:26:18,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4036580.0, ans=0.2 2024-08-18 18:26:43,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4036780.0, ans=0.125 2024-08-18 18:26:47,314 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 19 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-18 18:27:09,724 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-18 18:27:26,362 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-18 18:27:29,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4036980.0, ans=0.5 2024-08-18 18:27:31,650 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 2600, loss[loss=0.08769, beats_loss=0.01412, ecapa_loss=0.0001282, whisper_loss=0.07229, over 13438.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0105, ecapa_loss=0.0001408, whisper_loss=0.08951, over 3861469.91 frames. ], batch size: 55, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:27:31,790 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 28 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-18 18:27:36,156 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 23 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-18 18:27:42,903 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 30 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-18 18:27:43,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4037080.0, ans=0.0 2024-08-18 18:27:43,996 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.385e+01 2.553e+01 2.816e+01 4.584e+01, threshold=5.105e+01, percent-clipped=0.0 2024-08-18 18:27:47,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4037180.0, ans=0.125 2024-08-18 18:27:54,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4037180.0, ans=0.1 2024-08-18 18:28:12,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4037280.0, ans=0.1 2024-08-18 18:28:30,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4037380.0, ans=0.2 2024-08-18 
18:28:45,505 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 29 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-18 18:28:47,389 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-18 18:28:57,644 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-18 18:28:59,083 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 2650, loss[loss=0.1175, beats_loss=0.009655, ecapa_loss=0.0001233, whisper_loss=0.1066, over 23408.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01041, ecapa_loss=0.0001419, whisper_loss=0.09072, over 3878057.25 frames. ], batch size: 90, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:29:01,046 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 15 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 18:29:05,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4037580.0, ans=0.125 2024-08-18 18:29:15,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4037580.0, ans=0.0 2024-08-18 18:29:16,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4037680.0, ans=0.2 2024-08-18 18:29:27,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4037680.0, ans=0.1 2024-08-18 18:30:03,776 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 12 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 18:30:16,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4037980.0, ans=0.125 2024-08-18 18:30:18,388 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-18 18:30:34,412 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
26 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 18:30:34,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4038080.0, ans=0.2 2024-08-18 18:30:35,765 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 2700, loss[loss=0.1079, beats_loss=0.01018, ecapa_loss=0.0001326, whisper_loss=0.09641, over 21269.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01039, ecapa_loss=0.0001416, whisper_loss=0.09024, over 3879375.72 frames. ], batch size: 81, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:30:40,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4038080.0, ans=0.125 2024-08-18 18:30:42,014 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.79 vs. limit=15.0 2024-08-18 18:30:48,666 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.289e+01 2.510e+01 2.864e+01 4.358e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-18 18:30:52,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4038180.0, ans=0.1 2024-08-18 18:31:08,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4038280.0, ans=0.0 2024-08-18 18:31:09,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.55 vs. limit=10.0 2024-08-18 18:31:32,270 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 23 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-18 18:31:49,081 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 2750, loss[loss=0.09207, beats_loss=0.01105, ecapa_loss=0.0001487, whisper_loss=0.07953, over 19837.00 frames. 
], tot_loss[loss=0.1013, beats_loss=0.01044, ecapa_loss=0.00014, whisper_loss=0.08948, over 3881210.62 frames. ], batch size: 84, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:31:52,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4038580.0, ans=0.125 2024-08-18 18:31:52,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4038580.0, ans=0.125 2024-08-18 18:31:54,758 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 18:32:24,491 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-18 18:32:27,893 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.81 vs. limit=15.0 2024-08-18 18:32:30,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4038880.0, ans=0.125 2024-08-18 18:32:38,462 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 37 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 18:32:48,661 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.83 vs. limit=15.0 2024-08-18 18:32:59,356 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.45 vs. limit=15.0 2024-08-18 18:33:01,158 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 2800, loss[loss=0.1202, beats_loss=0.009931, ecapa_loss=0.0001286, whisper_loss=0.109, over 23629.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001398, whisper_loss=0.09011, over 3892339.29 frames. 
], batch size: 91, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:33:13,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4039080.0, ans=0.0 2024-08-18 18:33:14,156 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.362e+01 2.601e+01 2.838e+01 4.412e+01, threshold=5.203e+01, percent-clipped=0.0 2024-08-18 18:33:15,850 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 11 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-18 18:33:27,568 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 18:33:41,254 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-18 18:33:55,279 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0 2024-08-18 18:33:58,624 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 18:34:12,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4039480.0, ans=0.1 2024-08-18 18:34:18,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4039480.0, ans=0.1 2024-08-18 18:34:27,144 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.53 vs. limit=10.0 2024-08-18 18:34:28,186 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 2850, loss[loss=0.1145, beats_loss=0.009447, ecapa_loss=0.0001778, whisper_loss=0.1033, over 21035.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001402, whisper_loss=0.09037, over 3898833.14 frames. 
], batch size: 89, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:34:32,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4039580.0, ans=0.0 2024-08-18 18:34:32,932 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.46 vs. limit=15.0 2024-08-18 18:34:41,185 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2024-08-18 18:35:10,815 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-18 18:35:12,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4039780.0, ans=0.125 2024-08-18 18:35:20,593 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-18 18:35:24,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4039880.0, ans=0.125 2024-08-18 18:35:45,983 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-404000.pt 2024-08-18 18:35:51,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4039980.0, ans=0.125 2024-08-18 18:36:05,261 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 2900, loss[loss=0.09164, beats_loss=0.01147, ecapa_loss=0.0001324, whisper_loss=0.07885, over 14354.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01049, ecapa_loss=0.0001408, whisper_loss=0.08948, over 3875065.47 frames. 
], batch size: 57, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:36:09,752 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.53 vs. limit=15.0 2024-08-18 18:36:11,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4040080.0, ans=0.125 2024-08-18 18:36:21,854 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.256e+01 2.587e+01 2.877e+01 4.773e+01, threshold=5.175e+01, percent-clipped=0.0 2024-08-18 18:36:23,706 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 18 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 18:36:36,573 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 18:36:42,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4040280.0, ans=0.0 2024-08-18 18:36:47,104 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 34 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 18:36:48,382 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-18 18:36:52,943 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 26 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-18 18:36:53,318 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.50 vs. limit=22.5 2024-08-18 18:37:04,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4040380.0, ans=0.2 2024-08-18 18:37:32,884 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 2950, loss[loss=0.1072, beats_loss=0.01064, ecapa_loss=0.0001372, whisper_loss=0.09522, over 21356.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.01052, ecapa_loss=0.000141, whisper_loss=0.08945, over 3887103.83 frames. ], batch size: 87, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:37:35,065 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 24 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-18 18:37:36,945 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 22 from LS+wenet, 31 from Vox, 41 fro AS 2024-08-18 18:37:52,304 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 21 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-18 18:38:18,133 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.61 vs. limit=22.5 2024-08-18 18:38:36,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4040880.0, ans=0.2 2024-08-18 18:38:49,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4040980.0, ans=0.125 2024-08-18 18:38:54,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4040980.0, ans=0.2 2024-08-18 18:39:11,659 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 3000, loss[loss=0.1162, beats_loss=0.01027, ecapa_loss=0.0001256, whisper_loss=0.1047, over 23775.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01053, ecapa_loss=0.0001409, whisper_loss=0.08995, over 3913132.54 frames. ], batch size: 91, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:39:11,660 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-18 18:39:56,882 INFO [train_multi_KD3.py:1149] (0/4) Epoch 28, validation on ASR_libri: loss=0.2533, beats_loss=0, ecapa_loss=0.0005204, whisper_loss=0.2481, over 922467.00 frames. 
2024-08-18 18:40:14,588 INFO [train_multi_KD3.py:1149] (0/4) Epoch 28, validation on SV_voxceleb1: loss=0.004036, beats_loss=0, ecapa_loss=0.0004036, whisper_loss=0, over 939242.00 frames. 2024-08-18 18:41:48,131 INFO [train_multi_KD3.py:1149] (0/4) Epoch 28, validation on AT_audioset: loss=0.02306, beats_loss=0.02306, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 18:41:48,135 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-18 18:41:58,249 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.290e+01 2.609e+01 2.861e+01 5.437e+01, threshold=5.217e+01, percent-clipped=0.0 2024-08-18 18:41:58,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4041080.0, ans=0.125 2024-08-18 18:42:02,859 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2024-08-18 18:42:09,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4041180.0, ans=0.5 2024-08-18 18:42:25,344 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-18 18:43:42,585 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 3050, loss[loss=0.09207, beats_loss=0.01245, ecapa_loss=0.0001073, whisper_loss=0.07855, over 15620.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001414, whisper_loss=0.09067, over 3912088.80 frames. ], batch size: 60, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:43:49,563 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
27 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-18 18:43:51,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4041580.0, ans=0.07 2024-08-18 18:44:20,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4041680.0, ans=0.0 2024-08-18 18:44:28,290 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 18 from LS+wenet, 26 from Vox, 49 fro AS 2024-08-18 18:44:49,971 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-18 18:44:59,540 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-18 18:45:24,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4041980.0, ans=0.0 2024-08-18 18:45:49,931 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 3100, loss[loss=0.1243, beats_loss=0.008739, ecapa_loss=0.0001785, whisper_loss=0.1138, over 23154.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0105, ecapa_loss=0.0001421, whisper_loss=0.0903, over 3880007.38 frames. ], batch size: 96, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:46:10,559 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.308e+01 2.550e+01 2.809e+01 3.973e+01, threshold=5.100e+01, percent-clipped=1.0 2024-08-18 18:46:13,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4042180.0, ans=0.2 2024-08-18 18:46:37,948 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.19 vs. limit=22.5 2024-08-18 18:46:49,336 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.69 vs. 
limit=15.0 2024-08-18 18:46:52,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4042280.0, ans=0.07 2024-08-18 18:47:00,558 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.45 vs. limit=22.5 2024-08-18 18:47:02,294 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-18 18:47:04,001 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 29 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 18:47:13,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4042380.0, ans=0.125 2024-08-18 18:47:40,848 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 3150, loss[loss=0.1013, beats_loss=0.01096, ecapa_loss=0.0001495, whisper_loss=0.08884, over 18479.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01049, ecapa_loss=0.0001434, whisper_loss=0.09083, over 3897143.64 frames. ], batch size: 74, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:47:51,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4042580.0, ans=0.04949747468305833 2024-08-18 18:47:51,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4042580.0, ans=0.2 2024-08-18 18:47:56,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4042580.0, ans=0.125 2024-08-18 18:47:59,076 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 20 from LS+wenet, 32 from Vox, 42 fro AS 2024-08-18 18:48:09,141 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 18:48:14,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=4042680.0, ans=0.1 2024-08-18 18:48:24,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4042780.0, ans=10.0 2024-08-18 18:48:46,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4042880.0, ans=0.125 2024-08-18 18:48:47,352 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.30 vs. limit=22.5 2024-08-18 18:48:51,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4042880.0, ans=0.0 2024-08-18 18:48:51,808 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.67 vs. limit=15.0 2024-08-18 18:48:54,749 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.78 vs. limit=22.5 2024-08-18 18:49:06,956 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 26 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-18 18:49:16,951 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 3200, loss[loss=0.1169, beats_loss=0.00899, ecapa_loss=0.000139, whisper_loss=0.1066, over 23479.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01043, ecapa_loss=0.0001445, whisper_loss=0.0912, over 3852944.10 frames. ], batch size: 91, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:49:17,085 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
30 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-18 18:49:24,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4043080.0, ans=0.125 2024-08-18 18:49:27,750 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-18 18:49:29,059 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.339e+01 2.552e+01 3.080e+01 4.481e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-18 18:49:38,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4043180.0, ans=0.0 2024-08-18 18:50:08,219 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-18 18:50:08,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4043380.0, ans=0.0 2024-08-18 18:50:27,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4043480.0, ans=0.0 2024-08-18 18:50:34,873 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 3250, loss[loss=0.1023, beats_loss=0.01072, ecapa_loss=0.0001376, whisper_loss=0.09015, over 18227.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01043, ecapa_loss=0.0001444, whisper_loss=0.09091, over 3847766.90 frames. ], batch size: 73, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:50:41,892 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 18:50:43,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4043580.0, ans=0.95 2024-08-18 18:50:45,273 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.44 vs. 
limit=22.5 2024-08-18 18:50:52,873 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-18 18:50:57,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4043680.0, ans=0.125 2024-08-18 18:51:07,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4043780.0, ans=0.0 2024-08-18 18:51:10,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4043780.0, ans=0.09899494936611666 2024-08-18 18:51:13,294 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 18:51:31,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4043880.0, ans=0.2 2024-08-18 18:51:38,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4043980.0, ans=0.025 2024-08-18 18:51:40,351 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-18 18:51:48,328 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-18 18:51:48,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4044080.0, ans=0.125 2024-08-18 18:51:50,247 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 3300, loss[loss=0.09826, beats_loss=0.009385, ecapa_loss=0.0001358, whisper_loss=0.08751, over 15142.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01038, ecapa_loss=0.0001446, whisper_loss=0.0916, over 3860310.51 frames. 
], batch size: 57, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:52:03,004 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.380e+01 2.621e+01 2.872e+01 4.395e+01, threshold=5.242e+01, percent-clipped=0.0 2024-08-18 18:52:13,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4044180.0, ans=0.0 2024-08-18 18:52:23,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4044280.0, ans=0.2 2024-08-18 18:52:52,938 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-18 18:52:56,766 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 24 from LS+wenet, 9 from Vox, 27 fro AS 2024-08-18 18:53:04,812 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 26 from LS+wenet, 11 from Vox, 20 fro AS 2024-08-18 18:53:11,114 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 3350, loss[loss=0.1028, beats_loss=0.007679, ecapa_loss=0.0001463, whisper_loss=0.09367, over 19508.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01034, ecapa_loss=0.0001442, whisper_loss=0.09201, over 3851240.48 frames. ], batch size: 75, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:53:18,184 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-18 18:53:52,390 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-18 18:53:57,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4044880.0, ans=0.125 2024-08-18 18:54:16,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4044980.0, ans=0.125 2024-08-18 18:54:22,058 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
17 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-18 18:54:28,103 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 3400, loss[loss=0.1061, beats_loss=0.01061, ecapa_loss=0.0001356, whisper_loss=0.09418, over 23316.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01036, ecapa_loss=0.0001443, whisper_loss=0.09147, over 3841001.31 frames. ], batch size: 94, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:54:32,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4045080.0, ans=0.0 2024-08-18 18:54:38,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4045080.0, ans=0.125 2024-08-18 18:54:39,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4045080.0, ans=0.0 2024-08-18 18:54:40,788 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.177e+01 2.415e+01 2.723e+01 4.499e+01, threshold=4.829e+01, percent-clipped=0.0 2024-08-18 18:54:56,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4045180.0, ans=0.125 2024-08-18 18:55:11,566 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.426e+00 2024-08-18 18:55:14,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4045280.0, ans=0.1 2024-08-18 18:55:16,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4045380.0, ans=0.125 2024-08-18 18:55:23,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4045380.0, ans=0.0 2024-08-18 18:55:39,264 INFO [scaling.py:1024] (0/4) Whitening: 
name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.94 vs. limit=15.0 2024-08-18 18:55:51,589 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 3450, loss[loss=0.1165, beats_loss=0.006153, ecapa_loss=0.0001636, whisper_loss=0.1087, over 20145.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01043, ecapa_loss=0.0001452, whisper_loss=0.09085, over 3829553.16 frames. ], batch size: 80, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:56:07,589 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 28 from LS+wenet, 16 from Vox, 33 from AS 2024-08-18 18:56:15,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4045680.0, ans=0.1 2024-08-18 18:56:33,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4045780.0, ans=0.025 2024-08-18 18:56:35,028 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 25 from LS+wenet, 17 from Vox, 27 from AS 2024-08-18 18:56:42,861 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.11 vs. limit=15.0 2024-08-18 18:56:57,094 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.30 vs. limit=15.0 2024-08-18 18:57:02,325 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.77 vs. limit=15.0 2024-08-18 18:57:11,329 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 3500, loss[loss=0.1111, beats_loss=0.009409, ecapa_loss=0.0001251, whisper_loss=0.1005, over 19281.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0104, ecapa_loss=0.0001447, whisper_loss=0.0914, over 3861255.20 frames.
], batch size: 75, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:57:18,857 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 from AS 2024-08-18 18:57:23,084 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.329e+01 2.522e+01 2.820e+01 3.952e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-18 18:57:26,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4046180.0, ans=0.125 2024-08-18 18:58:27,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4046480.0, ans=0.1 2024-08-18 18:58:32,229 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 3550, loss[loss=0.09956, beats_loss=0.008551, ecapa_loss=0.0001779, whisper_loss=0.08923, over 22107.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01045, ecapa_loss=0.0001448, whisper_loss=0.09046, over 3897094.29 frames. ], batch size: 93, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:58:56,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4046680.0, ans=0.125 2024-08-18 18:59:02,851 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts.
33 from LS+wenet, 22 from Vox, 32 from AS 2024-08-18 18:59:12,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4046780.0, ans=0.2 2024-08-18 18:59:48,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4046980.0, ans=0.2 2024-08-18 18:59:52,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4046980.0, ans=0.125 2024-08-18 18:59:57,116 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 3600, loss[loss=0.1207, beats_loss=0.009751, ecapa_loss=0.0001185, whisper_loss=0.1097, over 23281.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01041, ecapa_loss=0.0001448, whisper_loss=0.09109, over 3901705.54 frames. ], batch size: 88, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:59:57,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4047080.0, ans=0.2 2024-08-18 19:00:02,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4047080.0, ans=0.09899494936611666 2024-08-18 19:00:02,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4047080.0, ans=0.0 2024-08-18 19:00:08,870 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.393e+01 2.591e+01 2.982e+01 4.231e+01, threshold=5.182e+01, percent-clipped=0.0 2024-08-18 19:00:34,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4047280.0, ans=0.125 2024-08-18 19:01:11,819 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 3650, loss[loss=0.1013, beats_loss=0.01082, ecapa_loss=0.0001243, whisper_loss=0.08924, over 21600.00 frames.
], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001447, whisper_loss=0.09032, over 3863617.07 frames. ], batch size: 84, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 19:01:20,213 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 26 from Vox, 34 from AS 2024-08-18 19:01:22,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4047580.0, ans=0.125 2024-08-18 19:01:40,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4047680.0, ans=0.125 2024-08-18 19:01:40,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4047680.0, ans=0.125 2024-08-18 19:01:40,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4047680.0, ans=0.125 2024-08-18 19:02:02,819 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.03 vs. limit=6.0 2024-08-18 19:02:18,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4047980.0, ans=0.07 2024-08-18 19:02:35,148 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 3700, loss[loss=0.1022, beats_loss=0.00862, ecapa_loss=0.0001448, whisper_loss=0.09214, over 14511.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001443, whisper_loss=0.09041, over 3855036.48 frames. ], batch size: 56, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:02:43,564 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts.
17 from LS+wenet, 27 from Vox, 46 from AS 2024-08-18 19:02:44,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4048080.0, ans=0.125 2024-08-18 19:02:47,991 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.250e+01 2.400e+01 2.703e+01 3.510e+01, threshold=4.800e+01, percent-clipped=0.0 2024-08-18 19:02:53,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4048180.0, ans=0.0 2024-08-18 19:02:53,754 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.87 vs. limit=15.0 2024-08-18 19:02:55,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4048180.0, ans=0.125 2024-08-18 19:03:07,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4048280.0, ans=0.125 2024-08-18 19:03:13,907 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 20 from LS+wenet, 27 from Vox, 32 from AS 2024-08-18 19:03:16,143 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 19:03:23,857 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 21 from Vox, 44 from AS 2024-08-18 19:03:28,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4048380.0, ans=0.0 2024-08-18 19:03:53,226 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 3750, loss[loss=0.1013, beats_loss=0.009384, ecapa_loss=0.0001737, whisper_loss=0.09014, over 19560.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.000144, whisper_loss=0.09037, over 3877148.93 frames.
], batch size: 80, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:04:06,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4048580.0, ans=0.125 2024-08-18 19:04:17,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4048680.0, ans=0.2 2024-08-18 19:04:35,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4048780.0, ans=0.04949747468305833 2024-08-18 19:04:48,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4048880.0, ans=0.0 2024-08-18 19:04:53,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4048880.0, ans=0.0 2024-08-18 19:05:00,292 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 from AS 2024-08-18 19:05:02,326 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.80 vs. limit=12.0 2024-08-18 19:05:15,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4048980.0, ans=0.025 2024-08-18 19:05:18,252 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 3800, loss[loss=0.08547, beats_loss=0.01158, ecapa_loss=0.0001404, whisper_loss=0.07249, over 17729.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001451, whisper_loss=0.09027, over 3880205.13 frames.
], batch size: 69, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:05:23,714 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 19:05:31,267 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.387e+01 2.639e+01 2.992e+01 4.413e+01, threshold=5.277e+01, percent-clipped=0.0 2024-08-18 19:05:31,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4049080.0, ans=0.07 2024-08-18 19:05:34,091 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2024-08-18 19:05:40,309 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 19 from LS+wenet, 29 from Vox, 26 from AS 2024-08-18 19:05:42,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4049180.0, ans=0.5 2024-08-18 19:05:45,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4049180.0, ans=0.125 2024-08-18 19:05:52,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4049280.0, ans=0.125 2024-08-18 19:06:03,208 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.16 vs. limit=15.0 2024-08-18 19:06:16,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4049380.0, ans=0.2 2024-08-18 19:06:35,725 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts.
31 from LS+wenet, 23 from Vox, 31 from AS 2024-08-18 19:06:39,859 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 3850, loss[loss=0.1094, beats_loss=0.01176, ecapa_loss=0.0001457, whisper_loss=0.09615, over 19480.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001448, whisper_loss=0.09012, over 3886256.10 frames. ], batch size: 79, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:06:56,804 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 15 from Vox, 39 from AS 2024-08-18 19:07:01,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4049680.0, ans=0.2 2024-08-18 19:07:11,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4049780.0, ans=0.2 2024-08-18 19:07:29,038 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 17 from LS+wenet, 21 from Vox, 30 from AS 2024-08-18 19:07:31,605 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 16 from Vox, 27 from AS 2024-08-18 19:07:32,858 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 from AS 2024-08-18 19:07:37,024 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2024-08-18 19:07:43,204 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 17 from LS+wenet, 36 from Vox, 40 from AS 2024-08-18 19:07:45,744 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 3900, loss[loss=0.1055, beats_loss=0.009941, ecapa_loss=0.0001475, whisper_loss=0.09404, over 15839.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01059, ecapa_loss=0.0001449, whisper_loss=0.09004, over 3883810.06 frames.
], batch size: 63, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:07:52,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4050080.0, ans=0.125 2024-08-18 19:07:56,323 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.489e+01 2.786e+01 3.014e+01 3.884e+02, threshold=5.572e+01, percent-clipped=4.0 2024-08-18 19:08:07,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4050180.0, ans=0.2 2024-08-18 19:08:09,865 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 28 from Vox, 32 from AS 2024-08-18 19:08:12,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4050280.0, ans=0.2 2024-08-18 19:08:12,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4050280.0, ans=0.125 2024-08-18 19:08:16,265 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 16 from Vox, 37 from AS 2024-08-18 19:08:17,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4050280.0, ans=0.1 2024-08-18 19:08:51,722 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 3950, loss[loss=0.1086, beats_loss=0.01061, ecapa_loss=0.0001493, whisper_loss=0.09653, over 21584.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01053, ecapa_loss=0.0001455, whisper_loss=0.09107, over 3909451.17 frames. ], batch size: 88, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:08:53,507 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.75 vs. limit=15.0 2024-08-18 19:09:05,775 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts.
21 from LS+wenet, 18 from Vox, 25 from AS 2024-08-18 19:09:20,397 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 29 from LS+wenet, 18 from Vox, 31 from AS 2024-08-18 19:09:30,925 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 26 from Vox, 38 from AS 2024-08-18 19:09:34,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4050880.0, ans=0.1 2024-08-18 19:09:51,798 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 19:09:53,841 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 27 from LS+wenet, 20 from Vox, 33 from AS 2024-08-18 19:09:56,258 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 4000, loss[loss=0.07776, beats_loss=0.009, ecapa_loss=0.0001247, whisper_loss=0.06752, over 17316.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01038, ecapa_loss=0.0001477, whisper_loss=0.09147, over 3905039.12 frames. ], batch size: 66, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:09:56,507 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 from AS 2024-08-18 19:09:56,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4051080.0, ans=0.0 2024-08-18 19:09:57,744 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts.
23 from LS+wenet, 20 from Vox, 26 from AS 2024-08-18 19:09:58,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4051080.0, ans=0.125 2024-08-18 19:10:04,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4051080.0, ans=0.0 2024-08-18 19:10:06,939 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.265e+01 2.552e+01 2.868e+01 4.279e+01, threshold=5.105e+01, percent-clipped=0.0 2024-08-18 19:10:07,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4051080.0, ans=0.0 2024-08-18 19:10:12,358 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 26 from Vox, 28 from AS 2024-08-18 19:10:15,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4051180.0, ans=0.0 2024-08-18 19:10:20,078 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 from AS 2024-08-18 19:10:24,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4051280.0, ans=0.0 2024-08-18 19:10:28,220 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 29 from LS+wenet, 24 from Vox, 23 from AS 2024-08-18 19:10:38,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4051380.0, ans=0.125 2024-08-18 19:10:44,150 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 18 from Vox, 27 from AS 2024-08-18 19:10:44,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=4051380.0, ans=22.5 2024-08-18 19:10:51,014 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts.
19 from LS+wenet, 13 from Vox, 42 from AS 2024-08-18 19:10:57,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4051480.0, ans=0.5 2024-08-18 19:10:58,002 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0 2024-08-18 19:11:02,676 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 4050, loss[loss=0.1037, beats_loss=0.01195, ecapa_loss=0.0001374, whisper_loss=0.09037, over 23263.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01036, ecapa_loss=0.0001472, whisper_loss=0.09144, over 3899069.95 frames. ], batch size: 95, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:11:11,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4051580.0, ans=0.125 2024-08-18 19:11:15,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4051680.0, ans=0.2 2024-08-18 19:11:22,959 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 32 from LS+wenet, 17 from Vox, 29 from AS 2024-08-18 19:11:34,668 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts.
25 from LS+wenet, 16 from Vox, 42 from AS 2024-08-18 19:11:54,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4051980.0, ans=0.2 2024-08-18 19:12:00,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4051980.0, ans=0.05 2024-08-18 19:12:02,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4051980.0, ans=0.0 2024-08-18 19:12:03,260 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=12.0 2024-08-18 19:12:09,233 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 4100, loss[loss=0.1256, beats_loss=0.00868, ecapa_loss=0.0001677, whisper_loss=0.1153, over 23225.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01045, ecapa_loss=0.0001466, whisper_loss=0.09104, over 3882010.22 frames. ], batch size: 93, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:12:14,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4052080.0, ans=0.07 2024-08-18 19:12:19,683 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.351e+01 2.549e+01 2.874e+01 5.187e+01, threshold=5.098e+01, percent-clipped=1.0 2024-08-18 19:13:01,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4052480.0, ans=0.2 2024-08-18 19:13:01,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4052480.0, ans=0.0 2024-08-18 19:13:07,572 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts.
26 from LS+wenet, 23 from Vox, 34 from AS 2024-08-18 19:13:15,107 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 4150, loss[loss=0.1066, beats_loss=0.0114, ecapa_loss=0.0001551, whisper_loss=0.09363, over 21250.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01053, ecapa_loss=0.0001463, whisper_loss=0.09098, over 3952155.44 frames. ], batch size: 87, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:13:23,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4052580.0, ans=0.1 2024-08-18 19:13:28,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4052680.0, ans=0.1 2024-08-18 19:13:35,102 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 from AS 2024-08-18 19:13:36,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4052680.0, ans=0.125 2024-08-18 19:13:51,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4052780.0, ans=0.0 2024-08-18 19:14:05,301 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.75 vs. limit=15.0 2024-08-18 19:14:11,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4052980.0, ans=0.05 2024-08-18 19:14:13,091 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=15.0 2024-08-18 19:14:17,832 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts.
24 from LS+wenet, 12 from Vox, 32 from AS 2024-08-18 19:14:21,284 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 4200, loss[loss=0.08492, beats_loss=0.01339, ecapa_loss=0.0001807, whisper_loss=0.06972, over 13093.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01049, ecapa_loss=0.0001467, whisper_loss=0.09143, over 3944913.60 frames. ], batch size: 57, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:14:24,766 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.83 vs. limit=22.5 2024-08-18 19:14:32,300 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.283e+01 2.553e+01 2.911e+01 4.394e+01, threshold=5.106e+01, percent-clipped=0.0 2024-08-18 19:14:35,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4053180.0, ans=0.1 2024-08-18 19:14:36,691 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2024-08-18 19:14:45,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4053180.0, ans=0.125 2024-08-18 19:14:50,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4053280.0, ans=0.125 2024-08-18 19:15:22,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4053480.0, ans=0.125 2024-08-18 19:15:23,107 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 16 from Vox, 27 from AS 2024-08-18 19:15:27,042 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 4250, loss[loss=0.09236, beats_loss=0.01069, ecapa_loss=0.0001544, whisper_loss=0.08012, over 15276.00 frames.
], tot_loss[loss=0.1032, beats_loss=0.01045, ecapa_loss=0.000146, whisper_loss=0.09131, over 3911908.24 frames. ], batch size: 60, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:15:33,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4053580.0, ans=0.125 2024-08-18 19:15:33,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4053580.0, ans=0.125 2024-08-18 19:16:33,928 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 4300, loss[loss=0.1058, beats_loss=0.0109, ecapa_loss=0.0001267, whisper_loss=0.09368, over 22427.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01055, ecapa_loss=0.0001444, whisper_loss=0.09045, over 3921846.16 frames. ], batch size: 91, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:16:41,038 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 23 from LS+wenet, 19 from Vox, 16 from AS 2024-08-18 19:16:44,515 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.278e+01 2.525e+01 2.871e+01 4.782e+01, threshold=5.050e+01, percent-clipped=0.0 2024-08-18 19:16:44,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4054080.0, ans=0.1 2024-08-18 19:16:49,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4054180.0, ans=0.2 2024-08-18 19:16:50,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4054180.0, ans=0.125 2024-08-18 19:16:57,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4054180.0, ans=0.2 2024-08-18 19:16:57,298 INFO [scaling.py:214] (0/4) ScheduledFloat:
name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4054180.0, ans=0.0 2024-08-18 19:17:04,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4054280.0, ans=0.07 2024-08-18 19:17:06,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4054280.0, ans=0.125 2024-08-18 19:17:07,853 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-08-18 19:17:14,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4054380.0, ans=0.1 2024-08-18 19:17:24,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4054380.0, ans=0.125 2024-08-18 19:17:34,643 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.38 vs. limit=22.5 2024-08-18 19:17:40,321 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 4350, loss[loss=0.09198, beats_loss=0.007859, ecapa_loss=0.0001757, whisper_loss=0.08237, over 17821.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.0001461, whisper_loss=0.09013, over 3891687.64 frames. ], batch size: 71, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:17:40,551 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
22 from LS+wenet, 20 from Vox, 34 from AS 2024-08-18 19:17:45,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4054580.0, ans=0.1 2024-08-18 19:17:49,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4054580.0, ans=0.125 2024-08-18 19:18:02,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4054680.0, ans=0.5 2024-08-18 19:18:09,623 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 from AS 2024-08-18 19:18:17,460 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 16 from Vox, 36 from AS 2024-08-18 19:18:30,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4054880.0, ans=0.0 2024-08-18 19:18:31,182 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.75 vs. limit=15.0 2024-08-18 19:18:38,865 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0 2024-08-18 19:18:45,578 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 4400, loss[loss=0.0873, beats_loss=0.01155, ecapa_loss=0.0001243, whisper_loss=0.07451, over 14833.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01043, ecapa_loss=0.0001464, whisper_loss=0.08978, over 3869114.51 frames.
], batch size: 57, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:18:47,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4055080.0, ans=0.0 2024-08-18 19:18:48,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4055080.0, ans=0.125 2024-08-18 19:18:55,954 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.724e+01 2.281e+01 2.472e+01 2.660e+01 4.951e+01, threshold=4.945e+01, percent-clipped=0.0 2024-08-18 19:18:57,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4055180.0, ans=0.125 2024-08-18 19:19:05,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4055180.0, ans=0.125 2024-08-18 19:19:08,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4055180.0, ans=0.125 2024-08-18 19:19:24,486 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.24 vs. limit=22.5 2024-08-18 19:19:36,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4055380.0, ans=0.0 2024-08-18 19:19:37,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4055480.0, ans=0.125 2024-08-18 19:19:51,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4055580.0, ans=0.1 2024-08-18 19:19:52,146 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 4450, loss[loss=0.1143, beats_loss=0.01044, ecapa_loss=0.0001526, whisper_loss=0.1024, over 17281.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01041, ecapa_loss=0.0001457, whisper_loss=0.08986, over 3823777.86 frames. ], batch size: 70, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:19:59,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4055580.0, ans=0.0 2024-08-18 19:20:06,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4055680.0, ans=0.125 2024-08-18 19:20:13,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4055680.0, ans=0.0 2024-08-18 19:20:28,394 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 19:20:31,696 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 24 from LS+wenet, 13 from Vox, 17 fro AS 2024-08-18 19:20:40,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=4055880.0, ans=15.0 2024-08-18 19:20:59,952 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 4500, loss[loss=0.1043, beats_loss=0.01012, ecapa_loss=0.0001488, whisper_loss=0.09267, over 22203.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01047, ecapa_loss=0.0001449, whisper_loss=0.08933, over 3843481.15 frames. ], batch size: 90, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:21:00,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4056080.0, ans=0.04949747468305833 2024-08-18 19:21:04,801 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. 
limit=6.0 2024-08-18 19:21:05,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4056080.0, ans=0.125 2024-08-18 19:21:10,759 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.274e+01 2.537e+01 2.836e+01 4.716e+01, threshold=5.074e+01, percent-clipped=0.0 2024-08-18 19:21:20,312 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 19:21:23,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4056180.0, ans=0.125 2024-08-18 19:21:32,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4056280.0, ans=0.1 2024-08-18 19:21:34,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4056280.0, ans=0.125 2024-08-18 19:21:34,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4056280.0, ans=0.5 2024-08-18 19:21:52,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4056480.0, ans=0.0 2024-08-18 19:21:58,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4056480.0, ans=0.1 2024-08-18 19:22:06,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4056580.0, ans=0.07 2024-08-18 19:22:07,183 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 4550, loss[loss=0.09075, beats_loss=0.01173, ecapa_loss=0.0001212, whisper_loss=0.07781, over 20137.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01036, ecapa_loss=0.0001453, whisper_loss=0.09034, over 3877558.93 frames. 
], batch size: 81, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:22:09,880 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 21 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-18 19:22:25,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4056680.0, ans=0.125 2024-08-18 19:22:35,276 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 19:22:42,633 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-18 19:22:50,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4056880.0, ans=0.1 2024-08-18 19:22:53,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4056880.0, ans=0.0 2024-08-18 19:22:57,436 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.63 vs. limit=10.0 2024-08-18 19:23:01,290 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.13 vs. limit=10.0 2024-08-18 19:23:04,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4056980.0, ans=0.125 2024-08-18 19:23:14,102 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 4600, loss[loss=0.1061, beats_loss=0.01054, ecapa_loss=0.0001362, whisper_loss=0.09415, over 18854.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01045, ecapa_loss=0.0001441, whisper_loss=0.08999, over 3864177.49 frames. 
], batch size: 74, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:23:25,097 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.308e+01 2.504e+01 2.960e+01 4.674e+01, threshold=5.007e+01, percent-clipped=0.0 2024-08-18 19:23:58,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4057380.0, ans=0.125 2024-08-18 19:23:59,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4057380.0, ans=0.0 2024-08-18 19:24:04,749 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-18 19:24:05,240 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.84 vs. limit=15.0 2024-08-18 19:24:07,098 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.71 vs. limit=8.0 2024-08-18 19:24:08,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4057480.0, ans=0.125 2024-08-18 19:24:20,247 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 4650, loss[loss=0.07818, beats_loss=0.01168, ecapa_loss=0.0001314, whisper_loss=0.06518, over 14711.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01048, ecapa_loss=0.000144, whisper_loss=0.08978, over 3877452.97 frames. ], batch size: 58, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:24:24,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4057580.0, ans=0.125 2024-08-18 19:24:27,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4057580.0, ans=0.0 2024-08-18 19:24:40,052 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
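The `loss[...]` / `tot_loss[...]` entries above report three knowledge-distillation objectives (beats_loss, ecapa_loss, whisper_loss) plus a combined total. A minimal sketch of how the total appears to be composed, assuming the scale hyperparameters from the config header (`beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`, `whisper_loss_scale=1.0`) are applied as simple weights to the logged (unscaled) values — this is an inference from the numbers, not icefall's actual API, and the function name is illustrative:

```python
# Scales taken from the training config in the log header.
BEATS_SCALE, ECAPA_SCALE, WHISPER_SCALE = 1.0, 10.0, 1.0

def total_kd_loss(beats_loss: float, ecapa_loss: float, whisper_loss: float) -> float:
    """Weighted sum of the three distillation losses as logged per batch.

    Illustrative name; sketches how the logged 'loss' relates to its parts.
    """
    return (BEATS_SCALE * beats_loss
            + ECAPA_SCALE * ecapa_loss
            + WHISPER_SCALE * whisper_loss)

# Check against a logged batch: loss=0.1017, beats_loss=0.01043,
# ecapa_loss=0.0001464, whisper_loss=0.07451... tot_loss uses the same form.
print(round(total_kd_loss(0.01043, 0.0001464, 0.08978), 4))  # -> 0.1017
```

Note that ecapa_loss is logged unscaled but enters the total multiplied by 10, which is why the three logged components do not sum directly to the logged loss.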
17 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 19:24:44,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4057680.0, ans=0.0 2024-08-18 19:24:54,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.04 vs. limit=15.0 2024-08-18 19:24:55,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4057780.0, ans=0.125 2024-08-18 19:25:00,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4057880.0, ans=0.125 2024-08-18 19:25:04,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4057880.0, ans=0.0 2024-08-18 19:25:22,546 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-18 19:25:24,017 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-18 19:25:26,290 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 4700, loss[loss=0.08722, beats_loss=0.01088, ecapa_loss=0.000124, whisper_loss=0.0751, over 22629.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001442, whisper_loss=0.09032, over 3863796.89 frames. ], batch size: 92, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:25:29,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4058080.0, ans=0.0 2024-08-18 19:25:36,747 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.335e+01 2.621e+01 2.898e+01 4.887e+01, threshold=5.242e+01, percent-clipped=0.0 2024-08-18 19:25:56,871 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-18 19:25:58,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4058280.0, ans=0.125 2024-08-18 19:26:08,674 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-18 19:26:25,824 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 19:26:32,097 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 4750, loss[loss=0.096, beats_loss=0.01268, ecapa_loss=9.918e-05, whisper_loss=0.08233, over 18291.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01046, ecapa_loss=0.0001443, whisper_loss=0.08985, over 3907473.68 frames. ], batch size: 69, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:26:48,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4058680.0, ans=0.1 2024-08-18 19:26:55,234 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-18 19:27:01,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4058780.0, ans=0.0 2024-08-18 19:27:10,398 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0 2024-08-18 19:27:26,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4058980.0, ans=0.0 2024-08-18 19:27:27,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4058980.0, ans=0.09899494936611666 2024-08-18 19:27:38,348 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 4800, loss[loss=0.1088, beats_loss=0.01119, ecapa_loss=0.0001337, whisper_loss=0.09631, over 22324.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01049, ecapa_loss=0.0001444, whisper_loss=0.08979, over 3892163.18 frames. ], batch size: 90, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:27:49,123 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.300e+01 2.541e+01 2.799e+01 4.808e+02, threshold=5.082e+01, percent-clipped=2.0 2024-08-18 19:28:24,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4059380.0, ans=0.2 2024-08-18 19:28:26,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4059380.0, ans=0.2 2024-08-18 19:28:45,693 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 4850, loss[loss=0.08212, beats_loss=0.01199, ecapa_loss=0.0001507, whisper_loss=0.06863, over 21445.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0105, ecapa_loss=0.0001444, whisper_loss=0.08988, over 3906928.28 frames. ], batch size: 92, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:28:45,830 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 19:28:47,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4059580.0, ans=0.1 2024-08-18 19:29:01,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4059680.0, ans=0.0 2024-08-18 19:29:30,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4059880.0, ans=0.125 2024-08-18 19:29:50,971 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 4900, loss[loss=0.1152, beats_loss=0.01052, ecapa_loss=0.0001356, whisper_loss=0.1033, over 23590.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01043, ecapa_loss=0.000145, whisper_loss=0.09035, over 3910235.85 frames. 
], batch size: 93, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:29:52,558 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-18 19:29:59,038 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-18 19:30:01,412 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.212e+01 2.467e+01 2.752e+01 9.926e+01, threshold=4.934e+01, percent-clipped=3.0 2024-08-18 19:30:01,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4060080.0, ans=0.04949747468305833 2024-08-18 19:30:12,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4060180.0, ans=0.0 2024-08-18 19:30:15,225 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0 2024-08-18 19:30:21,810 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.12 vs. limit=10.0 2024-08-18 19:30:26,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4060280.0, ans=0.0 2024-08-18 19:30:28,278 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-18 19:30:37,716 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-18 19:30:39,724 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.76 vs. limit=15.0 2024-08-18 19:30:40,362 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
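The recurring `optim.py` lines report five quantiles (min, 25%, median, 75%, max) of recent gradient norms, a clipping threshold, and the fraction of batches clipped. The logged pairs are consistent with the threshold being `Clipping_scale` times the median (e.g. 2.0 × 2.467e+01 = 4.934e+01 in the line above). A sketch of those statistics under that assumption — names and details are illustrative, not icefall's actual optimizer code:

```python
import numpy as np

def grad_norm_stats(norms, clipping_scale=2.0):
    """Quantiles of recent grad norms, plus a median-based clip threshold.

    Mirrors the shape of the logged 'grad-norm quartiles ... threshold ...
    percent-clipped' lines; the real implementation may differ in detail.
    """
    qs = np.quantile(norms, [0.0, 0.25, 0.5, 0.75, 1.0])  # min, q1, med, q3, max
    threshold = clipping_scale * qs[2]                     # scale x median
    pct_clipped = 100.0 * np.mean(np.asarray(norms) > threshold)
    return qs, threshold, pct_clipped

qs, thr, pct = grad_norm_stats([10.0, 20.0, 30.0, 40.0, 100.0])
print(qs, thr, pct)  # one outlier norm (100.0) exceeds 2 x median = 60.0
```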
24 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-18 19:30:43,003 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 19:30:47,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4060480.0, ans=0.125 2024-08-18 19:30:57,402 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 4950, loss[loss=0.1001, beats_loss=0.009964, ecapa_loss=0.0001622, whisper_loss=0.08856, over 20502.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01046, ecapa_loss=0.0001445, whisper_loss=0.08989, over 3914196.25 frames. ], batch size: 84, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:31:08,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4060580.0, ans=0.0 2024-08-18 19:31:12,624 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-18 19:31:14,536 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=15.0 2024-08-18 19:31:32,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4060780.0, ans=0.0 2024-08-18 19:31:32,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4060780.0, ans=0.0 2024-08-18 19:31:45,244 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-18 19:31:47,074 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.78 vs. limit=22.5 2024-08-18 19:31:53,671 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
22 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-18 19:32:01,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4060980.0, ans=10.0 2024-08-18 19:32:03,839 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 5000, loss[loss=0.09008, beats_loss=0.01178, ecapa_loss=0.0001342, whisper_loss=0.07696, over 15837.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01044, ecapa_loss=0.0001454, whisper_loss=0.09013, over 3876625.39 frames. ], batch size: 64, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:32:05,293 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 41 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-18 19:32:10,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4061080.0, ans=0.2 2024-08-18 19:32:10,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4061080.0, ans=0.1 2024-08-18 19:32:14,148 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.344e+01 2.610e+01 2.936e+01 3.838e+01, threshold=5.220e+01, percent-clipped=0.0 2024-08-18 19:32:16,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4061180.0, ans=0.125 2024-08-18 19:32:17,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4061180.0, ans=0.0 2024-08-18 19:32:18,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4061180.0, ans=0.05 2024-08-18 19:32:21,778 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.46 vs. 
limit=22.5 2024-08-18 19:32:58,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4061480.0, ans=0.125 2024-08-18 19:33:09,161 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 5050, loss[loss=0.1197, beats_loss=0.009897, ecapa_loss=0.000126, whisper_loss=0.1086, over 23225.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01045, ecapa_loss=0.0001458, whisper_loss=0.09116, over 3905598.61 frames. ], batch size: 90, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:33:12,092 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 19:33:18,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4061580.0, ans=0.125 2024-08-18 19:33:25,244 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-08-18 19:33:31,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4061680.0, ans=0.2 2024-08-18 19:33:46,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.94 vs. limit=15.0 2024-08-18 19:33:51,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4061880.0, ans=0.125 2024-08-18 19:34:00,177 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.988e-01 2024-08-18 19:34:01,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4061980.0, ans=0.95 2024-08-18 19:34:02,910 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
24 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-18 19:34:03,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4061980.0, ans=0.125 2024-08-18 19:34:03,490 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.33 vs. limit=15.0 2024-08-18 19:34:09,305 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 31 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 19:34:09,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4061980.0, ans=0.1 2024-08-18 19:34:09,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4061980.0, ans=0.2 2024-08-18 19:34:14,212 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 5100, loss[loss=0.0992, beats_loss=0.01043, ecapa_loss=0.000147, whisper_loss=0.08729, over 22462.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01051, ecapa_loss=0.0001459, whisper_loss=0.09029, over 3904000.70 frames. ], batch size: 92, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:34:16,844 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-18 19:34:24,693 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.303e+01 2.611e+01 2.910e+01 2.012e+02, threshold=5.222e+01, percent-clipped=3.0 2024-08-18 19:34:39,639 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 19:34:41,851 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
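The many `scaling.py` ScheduledFloat lines print hyperparameters (dropout probabilities, skip rates, balancer bounds) whose values depend on `batch_count`. A minimal sketch of the underlying idea — a float scheduled by piecewise-linear interpolation between `(batch_count, value)` breakpoints, clamped at the ends. This is a simplified stand-in; icefall's actual `ScheduledFloat` class has additional machinery:

```python
import bisect

class PiecewiseSchedule:
    """Value that varies with batch_count via piecewise-linear interpolation.

    Simplified sketch of a ScheduledFloat-style schedule; breakpoints below
    are illustrative, not taken from the training run.
    """
    def __init__(self, *points):
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]            # clamp before first breakpoint
        if batch_count >= self.xs[-1]:
            return self.ys[-1]           # clamp after last breakpoint
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        t = (batch_count - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

# E.g. a dropout_p that anneals from 0.3 to 0.1 over the first 20k batches,
# then stays at 0.1 (as with the constant ans=0.1 values logged above).
dropout_p = PiecewiseSchedule((0, 0.3), (20000, 0.1))
```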
33 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 19:34:57,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4062380.0, ans=0.0 2024-08-18 19:35:00,580 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.12 vs. limit=15.0 2024-08-18 19:35:00,933 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.49 vs. limit=15.0 2024-08-18 19:35:07,902 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-18 19:35:19,335 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 5150, loss[loss=0.1086, beats_loss=0.00978, ecapa_loss=0.0001495, whisper_loss=0.09738, over 20693.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0105, ecapa_loss=0.0001442, whisper_loss=0.09108, over 3907648.53 frames. ], batch size: 83, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:35:20,715 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 14 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 19:35:21,162 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.67 vs. limit=15.0 2024-08-18 19:35:32,859 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 28 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 19:35:35,197 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-18 19:35:40,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4062680.0, ans=0.125 2024-08-18 19:35:48,292 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
24 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-18 19:35:54,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4062780.0, ans=0.125 2024-08-18 19:36:12,903 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-18 19:36:15,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4062980.0, ans=0.125 2024-08-18 19:36:17,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=4062980.0, ans=15.0 2024-08-18 19:36:20,906 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 19:36:23,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4063080.0, ans=0.125 2024-08-18 19:36:24,444 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 5200, loss[loss=0.0944, beats_loss=0.009404, ecapa_loss=0.0001402, whisper_loss=0.0836, over 17946.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0105, ecapa_loss=0.000144, whisper_loss=0.09049, over 3849051.64 frames. ], batch size: 72, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:36:27,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4063080.0, ans=0.0 2024-08-18 19:36:34,932 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.237e+01 2.499e+01 2.869e+01 3.918e+01, threshold=4.998e+01, percent-clipped=0.0 2024-08-18 19:36:44,090 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
27 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 19:36:56,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4063280.0, ans=0.0 2024-08-18 19:36:57,389 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.41 vs. limit=12.0 2024-08-18 19:37:08,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4063380.0, ans=0.125 2024-08-18 19:37:09,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4063380.0, ans=0.125 2024-08-18 19:37:15,251 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-18 19:37:27,927 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-18 19:37:29,312 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 5250, loss[loss=0.08638, beats_loss=0.01026, ecapa_loss=0.0001802, whisper_loss=0.07432, over 20644.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0104, ecapa_loss=0.0001443, whisper_loss=0.09055, over 3863090.37 frames. ], batch size: 90, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:37:33,683 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-18 19:37:38,622 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
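The recurring "A total of N cuts. X from LS+wenet, Y from Vox, Z fro AS" lines tally where each batch's cuts were drawn from ("fro" is a typo for "from" in the script's format string, reproduced verbatim in this log). A toy sketch of such a tally, assuming each cut carries an origin label; the tag names and function are illustrative, not the script's actual code:

```python
from collections import Counter

def describe_batch(origins):
    """Format a per-batch origin tally like the train_multi_KD3.py log lines
    (with the 'fro AS' typo corrected to 'from AS')."""
    c = Counter(origins)
    return (f"A total of {len(origins)} cuts. "
            f"{c['LS+wenet']} from LS+wenet, {c['Vox']} from Vox, "
            f"{c['AS']} from AS")

# Reproduce one logged batch: 92 cuts = 26 + 25 + 41.
print(describe_batch(["LS+wenet"] * 26 + ["Vox"] * 25 + ["AS"] * 41))
```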
26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 19:37:41,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4063680.0, ans=0.1 2024-08-18 19:38:09,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=4063880.0, ans=15.0 2024-08-18 19:38:18,796 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2024-08-18 19:38:34,949 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 5300, loss[loss=0.1009, beats_loss=0.01009, ecapa_loss=0.0001399, whisper_loss=0.08936, over 22402.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0105, ecapa_loss=0.0001433, whisper_loss=0.08979, over 3877223.21 frames. ], batch size: 89, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:38:45,249 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.271e+01 2.459e+01 2.862e+01 3.681e+01, threshold=4.918e+01, percent-clipped=0.0 2024-08-18 19:38:53,538 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 19:39:09,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4064280.0, ans=0.1 2024-08-18 19:39:09,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=4064280.0, ans=15.0 2024-08-18 19:39:26,138 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-18 19:39:29,916 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
22 from LS+wenet, 33 from Vox, 31 fro AS 2024-08-18 19:39:40,412 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 5350, loss[loss=0.09599, beats_loss=0.01059, ecapa_loss=0.0001796, whisper_loss=0.0836, over 14912.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0105, ecapa_loss=0.000143, whisper_loss=0.08952, over 3867757.84 frames. ], batch size: 61, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:39:56,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4064680.0, ans=0.125 2024-08-18 19:39:58,559 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-18 19:39:59,350 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2024-08-18 19:40:00,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4064680.0, ans=0.125 2024-08-18 19:40:36,164 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 25 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-18 19:40:45,369 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 5400, loss[loss=0.1082, beats_loss=0.006708, ecapa_loss=0.0001459, whisper_loss=0.1001, over 18612.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.0001423, whisper_loss=0.09006, over 3882240.42 frames. 
], batch size: 71, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:40:55,813 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.323e+01 2.486e+01 2.757e+01 7.615e+01, threshold=4.971e+01, percent-clipped=1.0 2024-08-18 19:40:57,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4065180.0, ans=0.0 2024-08-18 19:40:58,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4065180.0, ans=0.2 2024-08-18 19:41:08,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4065180.0, ans=0.0 2024-08-18 19:41:10,327 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=15.0 2024-08-18 19:41:25,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4065380.0, ans=0.125 2024-08-18 19:41:26,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4065380.0, ans=0.125 2024-08-18 19:41:29,373 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 20 from LS+wenet, 27 from Vox, 46 fro AS 2024-08-18 19:41:41,195 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-18 19:41:45,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4065480.0, ans=0.0 2024-08-18 19:41:50,147 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 5450, loss[loss=0.1012, beats_loss=0.01181, ecapa_loss=0.000123, whisper_loss=0.0882, over 22329.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01049, ecapa_loss=0.0001422, whisper_loss=0.08998, over 3915776.78 frames. 
], batch size: 89, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:41:51,501 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 25 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 19:41:53,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4065580.0, ans=0.125 2024-08-18 19:41:55,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4065580.0, ans=0.125 2024-08-18 19:42:04,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4065680.0, ans=0.0 2024-08-18 19:42:06,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4065680.0, ans=0.09899494936611666 2024-08-18 19:42:14,656 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-18 19:42:21,237 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-18 19:42:35,928 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.13 vs. limit=22.5 2024-08-18 19:42:38,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4065880.0, ans=0.0 2024-08-18 19:42:48,297 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 19:42:54,889 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 5500, loss[loss=0.1277, beats_loss=0.008443, ecapa_loss=0.0001685, whisper_loss=0.1176, over 21751.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001431, whisper_loss=0.08954, over 3908893.46 frames. 
], batch size: 88, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:42:55,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4066080.0, ans=0.125 2024-08-18 19:42:56,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4066080.0, ans=0.1 2024-08-18 19:43:02,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4066080.0, ans=0.125 2024-08-18 19:43:05,260 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.744e+01 2.282e+01 2.535e+01 2.838e+01 1.372e+02, threshold=5.070e+01, percent-clipped=2.0 2024-08-18 19:43:07,279 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.27 vs. limit=15.0 2024-08-18 19:43:07,997 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 19:43:15,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4066180.0, ans=0.125 2024-08-18 19:43:22,746 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.86 vs. 
limit=12.0 2024-08-18 19:43:26,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4066280.0, ans=0.2 2024-08-18 19:43:32,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4066280.0, ans=0.0 2024-08-18 19:43:33,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4066280.0, ans=0.0 2024-08-18 19:43:34,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4066380.0, ans=0.125 2024-08-18 19:43:35,031 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.85 vs. limit=6.0 2024-08-18 19:43:38,329 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.49 vs. limit=12.0 2024-08-18 19:43:50,638 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-18 19:44:02,115 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 5550, loss[loss=0.09235, beats_loss=0.0108, ecapa_loss=0.0001277, whisper_loss=0.08028, over 18068.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01048, ecapa_loss=0.0001435, whisper_loss=0.0902, over 3913650.61 frames. ], batch size: 71, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:44:04,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4066580.0, ans=0.125 2024-08-18 19:44:05,545 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. 
limit=6.0 2024-08-18 19:44:30,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4066780.0, ans=0.0 2024-08-18 19:44:48,481 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.21 vs. limit=10.0 2024-08-18 19:44:49,855 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.43 vs. limit=22.5 2024-08-18 19:44:51,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4066880.0, ans=0.0 2024-08-18 19:44:55,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4066880.0, ans=0.0 2024-08-18 19:44:58,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4066880.0, ans=0.0 2024-08-18 19:45:14,742 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 5600, loss[loss=0.1158, beats_loss=0.009657, ecapa_loss=0.0001594, whisper_loss=0.1045, over 23102.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01039, ecapa_loss=0.0001437, whisper_loss=0.09119, over 3921280.51 frames. ], batch size: 94, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:45:25,980 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.465e+01 2.692e+01 2.981e+01 3.503e+02, threshold=5.385e+01, percent-clipped=2.0 2024-08-18 19:45:31,847 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 29 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-18 19:45:38,304 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
35 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 19:45:41,299 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 19:45:43,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4067280.0, ans=0.125 2024-08-18 19:45:56,297 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.93 vs. limit=15.0 2024-08-18 19:45:58,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4067380.0, ans=0.0 2024-08-18 19:46:18,753 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.93 vs. limit=22.5 2024-08-18 19:46:23,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4067480.0, ans=0.125 2024-08-18 19:46:27,900 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-18 19:46:28,866 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 5650, loss[loss=0.1047, beats_loss=0.01081, ecapa_loss=0.0001237, whisper_loss=0.09267, over 23262.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01043, ecapa_loss=0.0001443, whisper_loss=0.09058, over 3932255.31 frames. ], batch size: 91, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:46:29,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4067580.0, ans=0.1 2024-08-18 19:46:34,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.20 vs. 
limit=22.5 2024-08-18 19:46:42,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4067680.0, ans=0.125 2024-08-18 19:46:51,926 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-18 19:46:57,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4067680.0, ans=0.125 2024-08-18 19:47:17,819 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 19:47:34,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4067980.0, ans=0.04949747468305833 2024-08-18 19:47:35,381 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-18 19:47:37,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4067980.0, ans=0.1 2024-08-18 19:47:45,261 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 5700, loss[loss=0.135, beats_loss=0.006877, ecapa_loss=0.0001953, whisper_loss=0.1262, over 18261.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0105, ecapa_loss=0.000144, whisper_loss=0.09028, over 3935701.64 frames. 
], batch size: 73, lr: 2.19e-03, grad_scale: 1.152921504606847e+18 2024-08-18 19:47:51,085 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.551e-02 2024-08-18 19:47:58,012 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.343e+01 2.551e+01 2.885e+01 3.907e+01, threshold=5.102e+01, percent-clipped=0.0 2024-08-18 19:48:00,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4068180.0, ans=0.0 2024-08-18 19:48:05,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4068180.0, ans=0.125 2024-08-18 19:48:11,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4068180.0, ans=0.125 2024-08-18 19:48:13,993 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.38 vs. limit=12.0 2024-08-18 19:48:26,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4068280.0, ans=0.125 2024-08-18 19:48:46,167 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.23 vs. limit=10.0 2024-08-18 19:48:53,905 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 19:48:54,846 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 8 from Vox, 29 fro AS 2024-08-18 19:49:00,667 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 5750, loss[loss=0.1034, beats_loss=0.01126, ecapa_loss=0.0001657, whisper_loss=0.09052, over 21910.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01058, ecapa_loss=0.0001435, whisper_loss=0.08937, over 3911457.78 frames. 
], batch size: 91, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:49:06,016 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-18 19:49:25,594 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-18 19:49:38,697 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-08-18 19:49:45,071 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-18 19:49:47,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4068880.0, ans=0.2 2024-08-18 19:49:54,179 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 19:49:54,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4068880.0, ans=0.1 2024-08-18 19:50:20,445 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-18 19:50:22,524 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 5800, loss[loss=0.1147, beats_loss=0.008374, ecapa_loss=0.0001624, whisper_loss=0.1047, over 23730.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01055, ecapa_loss=0.000143, whisper_loss=0.08989, over 3887047.30 frames. ], batch size: 94, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:50:28,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4069080.0, ans=0.125 2024-08-18 19:50:31,764 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
28 from LS+wenet, 12 from Vox, 42 fro AS 2024-08-18 19:50:35,680 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.304e+01 2.520e+01 2.839e+01 4.509e+01, threshold=5.039e+01, percent-clipped=0.0 2024-08-18 19:50:37,817 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-18 19:51:01,131 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-18 19:51:20,608 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-18 19:51:22,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4069480.0, ans=0.0 2024-08-18 19:51:29,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4069480.0, ans=0.0 2024-08-18 19:51:37,336 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 5850, loss[loss=0.124, beats_loss=0.009613, ecapa_loss=0.0001228, whisper_loss=0.1132, over 22270.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01059, ecapa_loss=0.0001422, whisper_loss=0.08946, over 3885904.58 frames. ], batch size: 88, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:51:40,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4069580.0, ans=0.0 2024-08-18 19:52:08,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4069780.0, ans=0.125 2024-08-18 19:52:09,399 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-18 19:52:18,593 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.44 vs. 
limit=10.0 2024-08-18 19:52:51,745 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 5900, loss[loss=0.08713, beats_loss=0.01241, ecapa_loss=0.0001574, whisper_loss=0.07314, over 17427.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.0107, ecapa_loss=0.0001425, whisper_loss=0.08817, over 3892336.49 frames. ], batch size: 72, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:52:59,625 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.671e-01 2024-08-18 19:53:03,416 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.303e+01 2.495e+01 2.775e+01 3.811e+01, threshold=4.989e+01, percent-clipped=0.0 2024-08-18 19:53:17,849 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2024-08-18 19:53:18,839 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-18 19:53:24,977 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.44 vs. limit=10.0 2024-08-18 19:53:34,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4070380.0, ans=0.125 2024-08-18 19:53:48,783 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 31 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-18 19:53:56,949 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.72 vs. limit=10.0 2024-08-18 19:53:57,788 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 19:53:58,793 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 5950, loss[loss=0.09702, beats_loss=0.00977, ecapa_loss=0.0001439, whisper_loss=0.08581, over 17269.00 frames. 
], tot_loss[loss=0.1013, beats_loss=0.01063, ecapa_loss=0.0001431, whisper_loss=0.08926, over 3881913.16 frames. ], batch size: 67, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:54:00,094 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 29 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-18 19:54:03,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4070580.0, ans=0.125 2024-08-18 19:54:13,110 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.19 vs. limit=15.0 2024-08-18 19:54:26,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4070780.0, ans=0.125 2024-08-18 19:54:30,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4070780.0, ans=0.1 2024-08-18 19:54:56,974 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-18 19:54:58,381 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 19:55:04,511 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 6000, loss[loss=0.0867, beats_loss=0.01117, ecapa_loss=0.0001552, whisper_loss=0.07398, over 20289.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01062, ecapa_loss=0.0001421, whisper_loss=0.0898, over 3892830.49 frames. ], batch size: 85, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:55:04,512 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-18 19:55:42,592 INFO [train_multi_KD3.py:1149] (0/4) Epoch 28, validation on ASR_libri: loss=0.2546, beats_loss=0, ecapa_loss=0.0005279, whisper_loss=0.2493, over 922467.00 frames. 
2024-08-18 19:55:59,715 INFO [train_multi_KD3.py:1149] (0/4) Epoch 28, validation on SV_voxceleb1: loss=0.003985, beats_loss=0, ecapa_loss=0.0003985, whisper_loss=0, over 939242.00 frames. 2024-08-18 19:57:44,491 INFO [train_multi_KD3.py:1149] (0/4) Epoch 28, validation on AT_audioset: loss=0.02306, beats_loss=0.02306, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 19:57:44,495 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-18 19:57:49,575 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-18 19:57:56,137 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.336e+01 2.599e+01 2.933e+01 4.741e+01, threshold=5.199e+01, percent-clipped=0.0 2024-08-18 19:58:03,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4071180.0, ans=0.0 2024-08-18 19:58:07,112 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 23 from LS+wenet, 31 from Vox, 41 fro AS 2024-08-18 19:58:10,127 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 19:58:12,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4071280.0, ans=0.0 2024-08-18 19:58:47,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4071480.0, ans=0.0 2024-08-18 19:58:47,825 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.55 vs. limit=15.0 2024-08-18 19:58:52,590 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 6050, loss[loss=0.1001, beats_loss=0.01133, ecapa_loss=0.0001103, whisper_loss=0.08768, over 23326.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01065, ecapa_loss=0.000142, whisper_loss=0.08953, over 3895926.17 frames. ], batch size: 93, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:58:58,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4071580.0, ans=0.125 2024-08-18 19:59:08,871 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 19:59:26,665 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=15.0 2024-08-18 19:59:28,303 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0 2024-08-18 19:59:50,444 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.56 vs. limit=22.5 2024-08-18 19:59:51,187 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-18 19:59:53,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4071980.0, ans=0.125 2024-08-18 19:59:59,691 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 6100, loss[loss=0.118, beats_loss=0.008496, ecapa_loss=0.0001377, whisper_loss=0.1082, over 23143.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01062, ecapa_loss=0.0001424, whisper_loss=0.09004, over 3884185.78 frames. ], batch size: 90, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:00:06,110 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.89 vs. 
limit=15.0 2024-08-18 20:00:06,239 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.34 vs. limit=10.0 2024-08-18 20:00:07,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4072080.0, ans=0.125 2024-08-18 20:00:12,352 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.299e+01 2.594e+01 2.933e+01 3.314e+02, threshold=5.188e+01, percent-clipped=1.0 2024-08-18 20:00:21,709 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 20:00:24,470 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 20:00:27,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4072280.0, ans=0.0 2024-08-18 20:01:06,114 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 6150, loss[loss=0.1206, beats_loss=0.009587, ecapa_loss=0.0001661, whisper_loss=0.1094, over 22211.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.0001431, whisper_loss=0.09026, over 3934745.58 frames. ], batch size: 90, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:01:07,780 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 27 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-18 20:01:13,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4072580.0, ans=0.125 2024-08-18 20:01:38,413 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-18 20:01:39,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4072780.0, ans=0.125 2024-08-18 20:01:47,911 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
15 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 20:01:56,946 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.56 vs. limit=15.0 2024-08-18 20:01:58,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2024-08-18 20:02:00,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4072980.0, ans=0.0 2024-08-18 20:02:08,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4072980.0, ans=0.2 2024-08-18 20:02:12,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4073080.0, ans=0.125 2024-08-18 20:02:13,476 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 6200, loss[loss=0.1176, beats_loss=0.007955, ecapa_loss=0.0001586, whisper_loss=0.1081, over 15934.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01066, ecapa_loss=0.0001427, whisper_loss=0.08974, over 3903061.17 frames. ], batch size: 61, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:02:20,110 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.86 vs. limit=6.0 2024-08-18 20:02:24,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4073080.0, ans=0.125 2024-08-18 20:02:26,335 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.303e+01 2.554e+01 2.870e+01 1.661e+02, threshold=5.109e+01, percent-clipped=2.0 2024-08-18 20:02:29,095 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
29 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-18 20:02:39,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4073180.0, ans=0.0 2024-08-18 20:02:49,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4073280.0, ans=0.125 2024-08-18 20:02:53,657 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.42 vs. limit=15.0 2024-08-18 20:03:02,108 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 20:03:10,892 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-18 20:03:24,500 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 6250, loss[loss=0.08424, beats_loss=0.01139, ecapa_loss=0.0001448, whisper_loss=0.0714, over 22436.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01064, ecapa_loss=0.000143, whisper_loss=0.08941, over 3898812.54 frames. ], batch size: 91, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:03:26,313 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 20:03:43,582 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-18 20:03:44,197 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.34 vs. limit=10.0 2024-08-18 20:03:59,033 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 26 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-18 20:04:04,465 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.35 vs. limit=10.0 2024-08-18 20:04:07,932 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
12 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-18 20:04:23,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4073980.0, ans=0.2 2024-08-18 20:04:29,489 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.29 vs. limit=10.0 2024-08-18 20:04:30,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4073980.0, ans=0.125 2024-08-18 20:04:30,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4073980.0, ans=0.125 2024-08-18 20:04:36,676 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 6300, loss[loss=0.09913, beats_loss=0.01095, ecapa_loss=0.0001594, whisper_loss=0.08659, over 22146.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01065, ecapa_loss=0.0001431, whisper_loss=0.08902, over 3889835.65 frames. ], batch size: 93, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:04:38,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4074080.0, ans=0.5 2024-08-18 20:04:46,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4074080.0, ans=0.2 2024-08-18 20:04:49,050 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.796e+01 2.375e+01 2.574e+01 2.890e+01 4.000e+02, threshold=5.149e+01, percent-clipped=1.0 2024-08-18 20:04:49,259 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 20:05:02,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4074180.0, ans=0.1 2024-08-18 20:05:03,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4074280.0, ans=0.0 2024-08-18 20:05:25,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4074380.0, ans=0.1 2024-08-18 20:05:32,782 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-08-18 20:05:36,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4074480.0, ans=0.5 2024-08-18 20:05:37,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4074480.0, ans=0.125 2024-08-18 20:05:45,067 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 6350, loss[loss=0.1037, beats_loss=0.01054, ecapa_loss=0.0001644, whisper_loss=0.09152, over 16640.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01061, ecapa_loss=0.000144, whisper_loss=0.08947, over 3906520.95 frames. ], batch size: 67, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:05:45,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4074580.0, ans=0.0 2024-08-18 20:05:48,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4074580.0, ans=0.125 2024-08-18 20:06:00,908 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.81 vs. 
limit=10.0 2024-08-18 20:06:04,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4074680.0, ans=0.0 2024-08-18 20:06:12,425 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0 2024-08-18 20:06:35,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4074880.0, ans=0.125 2024-08-18 20:06:50,354 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 6400, loss[loss=0.1279, beats_loss=0.008319, ecapa_loss=0.0001595, whisper_loss=0.1179, over 14631.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01052, ecapa_loss=0.0001439, whisper_loss=0.08983, over 3911710.57 frames. ], batch size: 58, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:06:54,914 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.57 vs. limit=15.0 2024-08-18 20:07:02,019 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.365e+01 2.555e+01 2.895e+01 7.791e+01, threshold=5.110e+01, percent-clipped=1.0 2024-08-18 20:07:02,248 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 18 from Vox, 37 from AS 2024-08-18 20:07:07,800 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. 
limit=6.0 2024-08-18 20:07:20,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4075280.0, ans=0.125 2024-08-18 20:07:24,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4075280.0, ans=0.1 2024-08-18 20:07:27,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4075380.0, ans=0.1 2024-08-18 20:07:45,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4075480.0, ans=0.2 2024-08-18 20:07:54,041 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 6450, loss[loss=0.1192, beats_loss=0.009772, ecapa_loss=0.0001551, whisper_loss=0.1079, over 22218.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001439, whisper_loss=0.09034, over 3931315.67 frames. ], batch size: 89, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:07:54,175 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 41 from LS+wenet, 21 from Vox, 24 from AS 2024-08-18 20:08:01,115 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.20 vs. limit=12.0 2024-08-18 20:08:08,088 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
35 from LS+wenet, 28 from Vox, 27 from AS 2024-08-18 20:08:10,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4075680.0, ans=0.2 2024-08-18 20:08:11,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4075680.0, ans=0.2 2024-08-18 20:08:14,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4075680.0, ans=0.1 2024-08-18 20:08:15,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4075680.0, ans=0.125 2024-08-18 20:08:17,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4075680.0, ans=0.125 2024-08-18 20:08:21,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4075780.0, ans=0.2 2024-08-18 20:08:30,454 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.53 vs. limit=10.0 2024-08-18 20:08:31,013 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 23 from Vox, 23 from AS 2024-08-18 20:08:33,011 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.91 vs. limit=12.0 2024-08-18 20:08:34,445 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.65 vs. limit=10.0 2024-08-18 20:08:36,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.90 vs. 
limit=22.5 2024-08-18 20:08:48,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4075980.0, ans=0.1 2024-08-18 20:08:55,062 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 25 from Vox, 43 from AS 2024-08-18 20:08:57,335 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 6500, loss[loss=0.113, beats_loss=0.00952, ecapa_loss=0.0001535, whisper_loss=0.102, over 22881.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.0001448, whisper_loss=0.09032, over 3968591.49 frames. ], batch size: 91, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:08:57,498 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 15 from Vox, 37 from AS 2024-08-18 20:08:59,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0 2024-08-18 20:09:08,801 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.274e+01 2.478e+01 2.663e+01 4.004e+01, threshold=4.956e+01, percent-clipped=0.0 2024-08-18 20:09:19,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4076180.0, ans=0.2 2024-08-18 20:09:28,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4076280.0, ans=0.0 2024-08-18 20:09:39,936 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
20 from LS+wenet, 20 from Vox, 31 from AS 2024-08-18 20:09:45,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4076380.0, ans=0.125 2024-08-18 20:09:58,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4076480.0, ans=0.5 2024-08-18 20:09:59,430 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 18 from Vox, 43 from AS 2024-08-18 20:10:01,924 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 6550, loss[loss=0.1063, beats_loss=0.01134, ecapa_loss=0.0001327, whisper_loss=0.09366, over 22613.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01056, ecapa_loss=0.0001435, whisper_loss=0.09019, over 3951408.76 frames. ], batch size: 92, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:10:19,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4076680.0, ans=0.0 2024-08-18 20:10:19,448 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.96 vs. limit=15.0 2024-08-18 20:10:32,952 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 23 from LS+wenet, 24 from Vox, 35 from AS 2024-08-18 20:10:40,104 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.02 vs. limit=22.5 2024-08-18 20:10:43,409 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 from AS 2024-08-18 20:10:50,055 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 from AS 2024-08-18 20:10:52,436 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 22 from Vox, 44 from AS 2024-08-18 20:11:04,824 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
31 from LS+wenet, 26 from Vox, 30 from AS 2024-08-18 20:11:06,264 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 6600, loss[loss=0.1094, beats_loss=0.009814, ecapa_loss=0.0001885, whisper_loss=0.09766, over 20403.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.0001442, whisper_loss=0.09055, over 3962438.45 frames. ], batch size: 87, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:11:06,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4077080.0, ans=0.1 2024-08-18 20:11:17,597 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.459e+01 2.687e+01 3.202e+01 5.546e+01, threshold=5.373e+01, percent-clipped=1.0 2024-08-18 20:11:20,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4077180.0, ans=0.125 2024-08-18 20:11:27,459 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.49 vs. limit=15.0 2024-08-18 20:11:33,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4077280.0, ans=0.125 2024-08-18 20:11:33,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4077280.0, ans=0.04949747468305833 2024-08-18 20:11:34,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4077280.0, ans=0.125 2024-08-18 20:11:41,946 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 28 from LS+wenet, 19 from Vox, 30 from AS 2024-08-18 20:11:43,207 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
27 from LS+wenet, 25 from Vox, 40 from AS 2024-08-18 20:11:58,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4077480.0, ans=0.125 2024-08-18 20:12:02,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4077480.0, ans=0.1 2024-08-18 20:12:10,159 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 6650, loss[loss=0.1138, beats_loss=0.01115, ecapa_loss=0.0001208, whisper_loss=0.1015, over 17705.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.0001442, whisper_loss=0.09055, over 3979312.51 frames. ], batch size: 66, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:12:10,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4077580.0, ans=0.1 2024-08-18 20:12:29,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4077680.0, ans=0.2 2024-08-18 20:12:30,034 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2024-08-18 20:12:31,399 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.78 vs. limit=22.5 2024-08-18 20:12:37,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4077780.0, ans=0.1 2024-08-18 20:12:42,494 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 10 from Vox, 50 from AS 2024-08-18 20:12:46,700 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
12 from LS+wenet, 16 from Vox, 26 from AS 2024-08-18 20:12:53,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4077880.0, ans=0.1 2024-08-18 20:12:59,271 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.08 vs. limit=15.0 2024-08-18 20:13:00,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4077980.0, ans=0.125 2024-08-18 20:13:03,938 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 13 from Vox, 32 from AS 2024-08-18 20:13:10,674 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 from AS 2024-08-18 20:13:14,511 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 6700, loss[loss=0.1121, beats_loss=0.00728, ecapa_loss=0.0001579, whisper_loss=0.1032, over 22524.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01052, ecapa_loss=0.0001444, whisper_loss=0.09066, over 3939770.94 frames. ], batch size: 89, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:13:18,295 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 17 from Vox, 31 from AS 2024-08-18 20:13:24,354 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.13 vs. limit=22.5 2024-08-18 20:13:26,209 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.353e+01 2.592e+01 3.066e+01 1.135e+02, threshold=5.185e+01, percent-clipped=5.0 2024-08-18 20:13:33,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4078180.0, ans=0.09899494936611666 2024-08-18 20:13:40,237 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
23 from LS+wenet, 20 from Vox, 46 from AS 2024-08-18 20:13:43,557 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.93 vs. limit=12.0 2024-08-18 20:13:54,703 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 17 from LS+wenet, 23 from Vox, 35 from AS 2024-08-18 20:13:55,015 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.791e+05 2024-08-18 20:14:04,090 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.46 vs. limit=10.0 2024-08-18 20:14:07,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4078480.0, ans=0.0 2024-08-18 20:14:11,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4078480.0, ans=0.0 2024-08-18 20:14:19,046 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 6750, loss[loss=0.09322, beats_loss=0.01013, ecapa_loss=0.0001194, whisper_loss=0.0819, over 20300.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01044, ecapa_loss=0.000145, whisper_loss=0.09015, over 3885187.63 frames. ], batch size: 79, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:14:29,896 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 19 from Vox, 25 from AS 2024-08-18 20:14:34,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=4078680.0, ans=12.0 2024-08-18 20:14:45,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4078780.0, ans=0.1 2024-08-18 20:14:48,003 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 19 from LS+wenet, 17 from Vox, 43 from AS 2024-08-18 20:14:53,151 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
24 from LS+wenet, 16 from Vox, 24 from AS 2024-08-18 20:15:13,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4078980.0, ans=0.125 2024-08-18 20:15:13,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4078980.0, ans=0.0 2024-08-18 20:15:24,193 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 6800, loss[loss=0.09202, beats_loss=0.01058, ecapa_loss=0.0001265, whisper_loss=0.08018, over 17371.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01042, ecapa_loss=0.000145, whisper_loss=0.08994, over 3873871.20 frames. ], batch size: 69, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:15:29,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4079080.0, ans=0.07 2024-08-18 20:15:35,430 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.259e+01 2.465e+01 2.807e+01 3.943e+01, threshold=4.931e+01, percent-clipped=0.0 2024-08-18 20:15:44,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4079180.0, ans=0.1 2024-08-18 20:15:52,603 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 18 from Vox, 21 from AS 2024-08-18 20:15:57,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4079280.0, ans=0.125 2024-08-18 20:15:59,716 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.44 vs. limit=15.0 2024-08-18 20:16:05,897 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 11 from LS+wenet, 20 from Vox, 26 from AS 2024-08-18 20:16:07,128 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 
20 from LS+wenet, 24 from Vox, 52 from AS 2024-08-18 20:16:13,352 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 24 from Vox, 33 from AS 2024-08-18 20:16:20,579 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. limit=6.0 2024-08-18 20:16:28,601 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 6850, loss[loss=0.1081, beats_loss=0.0102, ecapa_loss=0.0001502, whisper_loss=0.09638, over 21273.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01048, ecapa_loss=0.000145, whisper_loss=0.08899, over 3870212.17 frames. ], batch size: 87, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:16:31,667 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2024-08-18 20:16:40,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4079680.0, ans=0.2 2024-08-18 20:16:44,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4079680.0, ans=0.125 2024-08-18 20:16:48,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4079680.0, ans=0.125 2024-08-18 20:16:52,406 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.51 vs. limit=22.5 2024-08-18 20:17:11,220 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 27 from LS+wenet, 13 from Vox, 25 from AS 2024-08-18 20:17:20,850 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.74 vs. 
limit=12.0 2024-08-18 20:17:21,450 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-408000.pt 2024-08-18 20:17:32,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4079980.0, ans=0.125 2024-08-18 20:17:34,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4080080.0, ans=0.125 2024-08-18 20:17:34,997 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 6900, loss[loss=0.08875, beats_loss=0.01127, ecapa_loss=0.0001363, whisper_loss=0.07612, over 20579.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01055, ecapa_loss=0.0001449, whisper_loss=0.08919, over 3852503.45 frames. ], batch size: 84, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:17:35,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4080080.0, ans=0.125 2024-08-18 20:17:35,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4080080.0, ans=0.125 2024-08-18 20:17:38,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4080080.0, ans=0.2 2024-08-18 20:17:42,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.81 vs. limit=22.5 2024-08-18 20:17:45,758 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
13 from LS+wenet, 17 from Vox, 26 from AS 2024-08-18 20:17:46,852 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.343e+01 2.661e+01 3.031e+01 5.071e+01, threshold=5.321e+01, percent-clipped=1.0 2024-08-18 20:17:52,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4080180.0, ans=0.125 2024-08-18 20:18:01,518 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2024-08-18 20:18:06,291 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 from AS 2024-08-18 20:18:08,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4080280.0, ans=0.125 2024-08-18 20:18:10,055 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 from AS 2024-08-18 20:18:11,971 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.86 vs. limit=12.0 2024-08-18 20:18:22,689 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 23 from Vox, 23 from AS 2024-08-18 20:18:38,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4080580.0, ans=0.2 2024-08-18 20:18:38,915 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 6950, loss[loss=0.08232, beats_loss=0.01193, ecapa_loss=0.0001679, whisper_loss=0.06871, over 22052.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01065, ecapa_loss=0.000144, whisper_loss=0.08865, over 3847826.90 frames. 
], batch size: 91, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:18:54,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4080680.0, ans=0.125 2024-08-18 20:18:55,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4080680.0, ans=0.05 2024-08-18 20:18:57,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4080680.0, ans=0.125 2024-08-18 20:19:06,066 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 18 from Vox, 38 from AS 2024-08-18 20:19:07,619 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.305e-02 2024-08-18 20:19:10,927 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 25 from Vox, 36 from AS 2024-08-18 20:19:11,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4080780.0, ans=0.125 2024-08-18 20:19:12,153 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 28 from Vox, 31 from AS 2024-08-18 20:19:13,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4080780.0, ans=0.1 2024-08-18 20:19:43,050 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 7000, loss[loss=0.1122, beats_loss=0.006627, ecapa_loss=0.0001793, whisper_loss=0.1038, over 15126.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01059, ecapa_loss=0.0001445, whisper_loss=0.08899, over 3855083.60 frames. ], batch size: 59, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:19:52,735 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.89 vs. 
limit=15.0 2024-08-18 20:19:54,717 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.235e+01 2.482e+01 2.791e+01 3.681e+01, threshold=4.964e+01, percent-clipped=0.0 2024-08-18 20:20:14,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4081280.0, ans=0.0 2024-08-18 20:20:18,407 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.83 vs. limit=22.5 2024-08-18 20:20:39,872 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 26 from LS+wenet, 20 from Vox, 23 from AS 2024-08-18 20:20:45,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4081480.0, ans=0.07 2024-08-18 20:20:47,845 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 7050, loss[loss=0.1029, beats_loss=0.00939, ecapa_loss=0.0001479, whisper_loss=0.09204, over 21341.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01054, ecapa_loss=0.0001447, whisper_loss=0.08966, over 3886216.96 frames. ], batch size: 85, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:20:57,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4081580.0, ans=0.2 2024-08-18 20:21:00,227 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.47 vs. limit=15.0 2024-08-18 20:21:00,722 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 from AS 2024-08-18 20:21:12,232 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 20 from Vox, 24 from AS 2024-08-18 20:21:16,338 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
17 from LS+wenet, 14 from Vox, 38 from AS 2024-08-18 20:21:17,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4081780.0, ans=0.0 2024-08-18 20:21:28,690 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-08-18 20:21:35,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4081880.0, ans=0.125 2024-08-18 20:21:39,275 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 33 from LS+wenet, 17 from Vox, 34 from AS 2024-08-18 20:21:52,369 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 7100, loss[loss=0.1075, beats_loss=0.01074, ecapa_loss=0.0001313, whisper_loss=0.09546, over 20787.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0106, ecapa_loss=0.000143, whisper_loss=0.08935, over 3873037.35 frames. ], batch size: 82, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:22:04,118 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.686e+01 2.313e+01 2.533e+01 2.792e+01 3.997e+01, threshold=5.065e+01, percent-clipped=0.0 2024-08-18 20:22:21,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4082280.0, ans=0.2 2024-08-18 20:22:24,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4082280.0, ans=0.0 2024-08-18 20:22:37,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4082380.0, ans=0.0 2024-08-18 20:22:51,756 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
37 from LS+wenet, 20 from Vox, 18 from AS 2024-08-18 20:22:57,943 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 7150, loss[loss=0.09342, beats_loss=0.009985, ecapa_loss=0.0001745, whisper_loss=0.08169, over 15902.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01054, ecapa_loss=0.0001433, whisper_loss=0.09022, over 3851202.96 frames. ], batch size: 62, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:22:59,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4082580.0, ans=0.2 2024-08-18 20:23:00,102 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.64 vs. limit=12.0 2024-08-18 20:23:01,817 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 14 from LS+wenet, 16 from Vox, 31 from AS 2024-08-18 20:23:13,418 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 from AS 2024-08-18 20:23:29,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4082780.0, ans=0.0 2024-08-18 20:23:37,396 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 from AS 2024-08-18 20:23:58,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4082980.0, ans=0.125 2024-08-18 20:24:03,592 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 7200, loss[loss=0.1039, beats_loss=0.009443, ecapa_loss=0.0001692, whisper_loss=0.09279, over 20046.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001429, whisper_loss=0.09013, over 3872347.63 frames. ], batch size: 83, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:24:05,126 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
21 from LS+wenet, 16 from Vox, 30 from AS 2024-08-18 20:24:05,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4083080.0, ans=0.125 2024-08-18 20:24:09,575 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.73 vs. limit=22.5 2024-08-18 20:24:14,862 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.258e+01 2.559e+01 2.767e+01 6.438e+01, threshold=5.118e+01, percent-clipped=2.0 2024-08-18 20:24:17,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4083180.0, ans=0.2 2024-08-18 20:24:17,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4083180.0, ans=0.125 2024-08-18 20:24:19,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.81 vs. limit=15.0 2024-08-18 20:24:27,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4083180.0, ans=0.2 2024-08-18 20:24:55,379 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 24 from Vox, 32 from AS 2024-08-18 20:25:05,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4083480.0, ans=0.125 2024-08-18 20:25:08,625 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 7250, loss[loss=0.09001, beats_loss=0.01245, ecapa_loss=0.0001111, whisper_loss=0.07645, over 16341.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01056, ecapa_loss=0.0001425, whisper_loss=0.08982, over 3853198.47 frames. 
], batch size: 64, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:25:26,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4083680.0, ans=0.125 2024-08-18 20:25:35,878 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-08-18 20:25:36,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4083780.0, ans=0.1 2024-08-18 20:25:38,806 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-18 20:25:42,915 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-18 20:25:43,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4083780.0, ans=0.125 2024-08-18 20:25:50,776 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-18 20:25:53,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4083880.0, ans=0.2 2024-08-18 20:26:16,589 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 7300, loss[loss=0.08409, beats_loss=0.01032, ecapa_loss=0.000156, whisper_loss=0.07221, over 16591.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001433, whisper_loss=0.09, over 3821528.07 frames. 
], batch size: 67, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:26:32,303 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.408e+01 2.606e+01 2.923e+01 5.019e+01, threshold=5.211e+01, percent-clipped=0.0 2024-08-18 20:26:46,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4084180.0, ans=0.125 2024-08-18 20:26:55,351 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 13 from Vox, 46 fro AS 2024-08-18 20:27:04,547 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-18 20:27:17,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4084480.0, ans=0.0 2024-08-18 20:27:33,130 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 7350, loss[loss=0.08126, beats_loss=0.01264, ecapa_loss=0.0001233, whisper_loss=0.06739, over 14863.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01055, ecapa_loss=0.0001431, whisper_loss=0.08927, over 3813625.86 frames. ], batch size: 62, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:27:34,540 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 20 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-18 20:27:39,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4084580.0, ans=0.125 2024-08-18 20:27:44,150 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-18 20:27:55,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4084680.0, ans=0.0 2024-08-18 20:28:28,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4084880.0, ans=0.0 2024-08-18 20:28:32,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4084880.0, ans=0.125 2024-08-18 20:28:38,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4084880.0, ans=0.05 2024-08-18 20:28:40,268 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-18 20:28:50,911 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-18 20:28:58,252 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.07 vs. limit=22.5 2024-08-18 20:29:06,963 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 7400, loss[loss=0.07572, beats_loss=0.0106, ecapa_loss=0.0001543, whisper_loss=0.06358, over 17376.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01055, ecapa_loss=0.0001431, whisper_loss=0.08956, over 3839962.86 frames. ], batch size: 71, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:29:25,007 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.347e+01 2.572e+01 2.832e+01 4.744e+01, threshold=5.145e+01, percent-clipped=0.0 2024-08-18 20:29:27,357 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.99 vs. 
limit=15.0 2024-08-18 20:29:33,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4085180.0, ans=0.0 2024-08-18 20:29:34,655 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-18 20:29:36,514 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.49 vs. limit=22.5 2024-08-18 20:29:37,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4085180.0, ans=0.0 2024-08-18 20:29:43,558 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.56 vs. limit=10.0 2024-08-18 20:30:01,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4085380.0, ans=0.1 2024-08-18 20:30:01,423 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2024-08-18 20:30:03,074 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. 
limit=6.0 2024-08-18 20:30:08,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4085380.0, ans=0.2 2024-08-18 20:30:16,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4085480.0, ans=0.025 2024-08-18 20:30:29,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4085480.0, ans=0.0 2024-08-18 20:30:38,993 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 7450, loss[loss=0.08457, beats_loss=0.01129, ecapa_loss=0.0001484, whisper_loss=0.0718, over 19171.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01055, ecapa_loss=0.0001435, whisper_loss=0.08919, over 3851151.90 frames. ], batch size: 81, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:31:08,060 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 20:31:47,830 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-18 20:32:15,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4085980.0, ans=0.0 2024-08-18 20:32:22,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4085980.0, ans=0.07 2024-08-18 20:32:27,621 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 7500, loss[loss=0.08903, beats_loss=0.01124, ecapa_loss=0.0001211, whisper_loss=0.07658, over 20041.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01051, ecapa_loss=0.000144, whisper_loss=0.08937, over 3836386.56 frames. ], batch size: 78, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:32:36,649 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.82 vs. 
limit=12.0 2024-08-18 20:32:47,086 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.294e+01 2.519e+01 2.774e+01 4.079e+01, threshold=5.039e+01, percent-clipped=0.0 2024-08-18 20:33:00,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4086180.0, ans=0.1 2024-08-18 20:33:29,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4086280.0, ans=0.0 2024-08-18 20:33:42,884 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 21 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-18 20:33:59,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4086480.0, ans=0.125 2024-08-18 20:34:02,644 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=15.0 2024-08-18 20:34:25,849 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 7550, loss[loss=0.08644, beats_loss=0.009412, ecapa_loss=0.0001632, whisper_loss=0.07539, over 17354.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01054, ecapa_loss=0.0001447, whisper_loss=0.08938, over 3820660.09 frames. ], batch size: 69, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:34:26,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4086580.0, ans=10.0 2024-08-18 20:34:27,856 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.64 vs. limit=15.0 2024-08-18 20:34:28,001 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.54 vs. 
limit=22.5 2024-08-18 20:34:30,955 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 20:35:10,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4086780.0, ans=0.0 2024-08-18 20:35:14,849 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-18 20:35:16,328 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 29 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-18 20:35:20,964 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 16 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-18 20:35:27,886 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-18 20:35:45,201 WARNING [optim.py:496] (0/4) Scaling gradients by 0.029811669141054153, model_norm_threshold=50.385860443115234 2024-08-18 20:35:45,370 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.709e+05, grad_sumsq=5.709e+05, orig_rms_sq=1.000e+00 2024-08-18 20:35:51,139 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 7600, loss[loss=0.1248, beats_loss=0.009387, ecapa_loss=0.0001349, whisper_loss=0.1141, over 23832.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01058, ecapa_loss=0.0001441, whisper_loss=0.08999, over 3849087.59 frames. ], batch size: 90, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:36:04,040 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.298e+01 2.604e+01 3.012e+01 1.690e+03, threshold=5.209e+01, percent-clipped=1.0 2024-08-18 20:36:08,653 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 20:36:10,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4087180.0, ans=0.1 2024-08-18 20:36:32,125 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 20:36:33,622 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 38 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-18 20:36:49,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4087380.0, ans=0.0 2024-08-18 20:36:54,719 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-18 20:37:04,044 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.66 vs. limit=22.5 2024-08-18 20:37:05,865 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 7650, loss[loss=0.07084, beats_loss=0.01228, ecapa_loss=0.0001565, whisper_loss=0.057, over 13798.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01057, ecapa_loss=0.0001449, whisper_loss=0.08957, over 3851497.86 frames. ], batch size: 58, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:37:13,357 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.08 vs. limit=22.5 2024-08-18 20:37:18,852 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.24 vs. 
limit=15.0 2024-08-18 20:37:41,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4087780.0, ans=0.125 2024-08-18 20:38:07,963 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.15 vs. limit=15.0 2024-08-18 20:38:21,642 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 7700, loss[loss=0.1018, beats_loss=0.01057, ecapa_loss=0.0001362, whisper_loss=0.08986, over 16048.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01057, ecapa_loss=0.0001449, whisper_loss=0.08924, over 3830222.18 frames. ], batch size: 66, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:38:22,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4088080.0, ans=0.2 2024-08-18 20:38:34,623 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.256e+01 2.493e+01 2.776e+01 3.819e+01, threshold=4.986e+01, percent-clipped=0.0 2024-08-18 20:38:56,912 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 14 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-18 20:39:10,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4088380.0, ans=0.0 2024-08-18 20:39:13,824 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2024-08-18 20:39:35,766 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 7750, loss[loss=0.1044, beats_loss=0.00616, ecapa_loss=0.0001814, whisper_loss=0.09647, over 17172.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01048, ecapa_loss=0.0001453, whisper_loss=0.08975, over 3834362.12 frames. 
], batch size: 70, lr: 2.19e-03, grad_scale: 1.152921504606847e+18 2024-08-18 20:39:55,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4088680.0, ans=0.0 2024-08-18 20:40:05,722 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.916e+00 2024-08-18 20:40:11,404 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-18 20:40:49,641 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.65 vs. limit=6.0 2024-08-18 20:40:50,125 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 7800, loss[loss=0.1025, beats_loss=0.01161, ecapa_loss=0.0001103, whisper_loss=0.08981, over 23511.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01055, ecapa_loss=0.0001442, whisper_loss=0.08921, over 3860026.33 frames. ], batch size: 90, lr: 2.18e-03, grad_scale: 1.152921504606847e+18 2024-08-18 20:41:01,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4089080.0, ans=0.0 2024-08-18 20:41:02,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4089080.0, ans=0.125 2024-08-18 20:41:02,868 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.709e+01 2.370e+01 2.620e+01 3.018e+01 4.706e+01, threshold=5.240e+01, percent-clipped=0.0 2024-08-18 20:41:07,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4089180.0, ans=0.125 2024-08-18 20:41:10,704 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.72 vs. 
limit=22.5 2024-08-18 20:41:24,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4089280.0, ans=0.2 2024-08-18 20:41:37,221 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-18 20:41:39,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4089380.0, ans=0.1 2024-08-18 20:41:39,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4089380.0, ans=0.09899494936611666 2024-08-18 20:41:46,628 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 20:42:04,523 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 7850, loss[loss=0.0993, beats_loss=0.01122, ecapa_loss=0.000132, whisper_loss=0.08676, over 22854.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0106, ecapa_loss=0.0001437, whisper_loss=0.08933, over 3876956.07 frames. ], batch size: 92, lr: 2.18e-03, grad_scale: 1.152921504606847e+18 2024-08-18 20:42:07,973 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 19 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 20:42:13,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4089580.0, ans=0.125 2024-08-18 20:42:33,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4089780.0, ans=0.0 2024-08-18 20:42:38,551 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-18 20:42:43,674 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 35 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-18 20:43:09,095 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
22 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-18 20:43:15,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.32 vs. limit=10.0 2024-08-18 20:43:17,757 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 7900, loss[loss=0.09778, beats_loss=0.01157, ecapa_loss=0.0001095, whisper_loss=0.08511, over 22892.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01058, ecapa_loss=0.0001442, whisper_loss=0.08985, over 3871987.44 frames. ], batch size: 91, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:43:21,900 INFO [train_multi_KD3.py:844] (0/4) A total of 97 cuts. 24 from LS+wenet, 27 from Vox, 46 fro AS 2024-08-18 20:43:22,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=4090080.0, ans=12.0 2024-08-18 20:43:32,447 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.368e+01 2.642e+01 2.985e+01 1.655e+02, threshold=5.283e+01, percent-clipped=2.0 2024-08-18 20:43:32,643 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-18 20:43:34,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4090180.0, ans=0.1 2024-08-18 20:43:44,175 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-18 20:43:51,323 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-18 20:44:13,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4090380.0, ans=0.125 2024-08-18 20:44:17,325 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.45 vs. 
limit=22.5 2024-08-18 20:44:23,301 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.87 vs. limit=22.5 2024-08-18 20:44:29,982 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 7950, loss[loss=0.1146, beats_loss=0.009845, ecapa_loss=0.0001223, whisper_loss=0.1035, over 21970.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01052, ecapa_loss=0.000144, whisper_loss=0.09071, over 3875402.13 frames. ], batch size: 87, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:44:30,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.32 vs. limit=10.0 2024-08-18 20:44:34,170 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-18 20:44:36,836 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.78 vs. limit=5.0 2024-08-18 20:44:37,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4090580.0, ans=0.0 2024-08-18 20:44:38,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4090580.0, ans=0.125 2024-08-18 20:44:46,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4090680.0, ans=0.125 2024-08-18 20:44:59,904 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
31 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-18 20:45:00,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4090780.0, ans=0.125 2024-08-18 20:45:04,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4090780.0, ans=0.125 2024-08-18 20:45:07,331 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 20 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-18 20:45:11,623 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 30 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-18 20:45:32,893 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.96 vs. limit=15.0 2024-08-18 20:45:40,976 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 8000, loss[loss=0.07106, beats_loss=0.01605, ecapa_loss=9.459e-05, whisper_loss=0.05406, over 14227.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01054, ecapa_loss=0.0001429, whisper_loss=0.09036, over 3892863.70 frames. ], batch size: 58, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:45:52,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4091080.0, ans=0.0 2024-08-18 20:45:56,014 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.375e+01 2.587e+01 2.855e+01 4.354e+01, threshold=5.173e+01, percent-clipped=0.0 2024-08-18 20:46:01,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4091180.0, ans=0.0 2024-08-18 20:46:05,479 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
21 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-18 20:46:17,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4091280.0, ans=0.2 2024-08-18 20:46:52,646 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 8050, loss[loss=0.09144, beats_loss=0.00894, ecapa_loss=0.0001997, whisper_loss=0.0805, over 14425.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01046, ecapa_loss=0.0001435, whisper_loss=0.09104, over 3898374.37 frames. ], batch size: 61, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:46:53,342 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=15.0 2024-08-18 20:46:55,967 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.95 vs. limit=15.0 2024-08-18 20:47:36,511 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-18 20:48:00,116 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 8100, loss[loss=0.08461, beats_loss=0.01197, ecapa_loss=0.0001875, whisper_loss=0.07076, over 19253.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0104, ecapa_loss=0.0001449, whisper_loss=0.09087, over 3875582.29 frames. ], batch size: 84, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:48:14,188 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.377e+01 2.582e+01 2.787e+01 4.982e+01, threshold=5.165e+01, percent-clipped=0.0 2024-08-18 20:48:15,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4092180.0, ans=0.05 2024-08-18 20:48:16,143 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.90 vs. 
limit=10.0 2024-08-18 20:48:31,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4092280.0, ans=0.0 2024-08-18 20:48:40,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4092380.0, ans=0.025 2024-08-18 20:48:43,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4092380.0, ans=0.125 2024-08-18 20:48:45,887 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 24 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-18 20:49:10,731 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 8150, loss[loss=0.1146, beats_loss=0.008969, ecapa_loss=0.0001652, whisper_loss=0.104, over 21892.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01044, ecapa_loss=0.0001447, whisper_loss=0.09047, over 3870337.21 frames. ], batch size: 86, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:49:22,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4092580.0, ans=0.04949747468305833 2024-08-18 20:49:33,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4092680.0, ans=0.2 2024-08-18 20:50:10,256 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-18 20:50:10,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4092980.0, ans=0.0 2024-08-18 20:50:22,415 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 8200, loss[loss=0.1079, beats_loss=0.008633, ecapa_loss=0.0001218, whisper_loss=0.09804, over 17913.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01046, ecapa_loss=0.0001438, whisper_loss=0.08961, over 3877291.90 frames. 
], batch size: 68, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:50:34,831 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.16 vs. limit=15.0 2024-08-18 20:50:35,268 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.361e+01 2.593e+01 2.842e+01 4.964e+01, threshold=5.187e+01, percent-clipped=0.0 2024-08-18 20:50:41,115 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0 2024-08-18 20:50:42,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4093180.0, ans=0.2 2024-08-18 20:50:42,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4093180.0, ans=0.04949747468305833 2024-08-18 20:50:44,478 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 37 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-18 20:51:00,978 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.161e+05 2024-08-18 20:51:06,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4093380.0, ans=0.1 2024-08-18 20:51:11,014 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 36 from LS+wenet, 13 from Vox, 44 fro AS 2024-08-18 20:51:14,900 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 20:51:23,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4093480.0, ans=0.0 2024-08-18 20:51:29,651 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 8250, loss[loss=0.1027, beats_loss=0.009791, ecapa_loss=0.0001153, whisper_loss=0.09178, over 17193.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.0001434, whisper_loss=0.08971, over 3854184.67 frames. ], batch size: 65, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:51:36,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4093580.0, ans=0.125 2024-08-18 20:51:42,122 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 36 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-18 20:51:50,455 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 20:51:51,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4093680.0, ans=0.125 2024-08-18 20:51:52,295 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0 2024-08-18 20:52:00,308 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-18 20:52:04,331 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-18 20:52:04,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4093780.0, ans=0.0 2024-08-18 20:52:12,328 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 22 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-18 20:52:19,348 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-18 20:52:28,194 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
23 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-18 20:52:34,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4093980.0, ans=0.0 2024-08-18 20:52:39,851 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 8300, loss[loss=0.1253, beats_loss=0.008146, ecapa_loss=0.0001573, whisper_loss=0.1156, over 17406.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01056, ecapa_loss=0.0001426, whisper_loss=0.08927, over 3853198.75 frames. ], batch size: 66, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:52:53,831 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.411e+01 2.666e+01 2.982e+01 3.666e+02, threshold=5.332e+01, percent-clipped=2.0 2024-08-18 20:52:55,215 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-18 20:53:01,493 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.87 vs. limit=10.0 2024-08-18 20:53:10,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4094280.0, ans=0.1 2024-08-18 20:53:11,230 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.96 vs. limit=15.0 2024-08-18 20:53:14,146 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-18 20:53:20,868 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.317e+00 2024-08-18 20:53:27,175 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 20:53:28,773 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
21 from LS+wenet, 22 from Vox, 50 fro AS 2024-08-18 20:53:33,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4094480.0, ans=0.07 2024-08-18 20:53:39,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4094480.0, ans=0.125 2024-08-18 20:53:48,433 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 8350, loss[loss=0.1002, beats_loss=0.00957, ecapa_loss=0.0001701, whisper_loss=0.08891, over 21700.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01064, ecapa_loss=0.0001426, whisper_loss=0.08872, over 3849080.13 frames. ], batch size: 91, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:54:02,245 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2024-08-18 20:54:20,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4094780.0, ans=0.0 2024-08-18 20:54:25,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4094780.0, ans=0.025 2024-08-18 20:54:47,870 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-18 20:54:49,649 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.71 vs. limit=15.0 2024-08-18 20:54:50,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4094980.0, ans=0.1 2024-08-18 20:54:55,475 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 8400, loss[loss=0.1134, beats_loss=0.01013, ecapa_loss=0.000141, whisper_loss=0.1019, over 23008.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01053, ecapa_loss=0.0001429, whisper_loss=0.08975, over 3856860.17 frames. ], batch size: 90, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:55:02,041 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 25 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-18 20:55:06,492 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 20:55:09,026 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.021e+01 2.413e+01 2.574e+01 2.874e+01 4.308e+01, threshold=5.147e+01, percent-clipped=0.0 2024-08-18 20:55:12,069 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.017e+01 2024-08-18 20:55:15,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4095180.0, ans=0.125 2024-08-18 20:55:16,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4095180.0, ans=0.125 2024-08-18 20:55:16,680 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0 2024-08-18 20:55:19,141 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 24 from LS+wenet, 17 from Vox, 16 fro AS 2024-08-18 20:55:19,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4095180.0, ans=0.125 2024-08-18 20:55:29,539 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0 2024-08-18 20:55:38,056 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.90 vs. 
limit=15.0 2024-08-18 20:55:56,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4095480.0, ans=0.2 2024-08-18 20:56:05,826 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 8450, loss[loss=0.1072, beats_loss=0.01213, ecapa_loss=0.0001346, whisper_loss=0.09368, over 22582.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0105, ecapa_loss=0.0001431, whisper_loss=0.08978, over 3877404.98 frames. ], batch size: 92, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:56:19,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4095680.0, ans=0.0 2024-08-18 20:56:25,687 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-18 20:56:31,081 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-18 20:56:44,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4095780.0, ans=0.2 2024-08-18 20:56:45,217 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-18 20:56:53,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4095880.0, ans=0.0 2024-08-18 20:57:15,196 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 25 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-18 20:57:16,809 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 8500, loss[loss=0.1115, beats_loss=0.00977, ecapa_loss=0.0001438, whisper_loss=0.1003, over 16905.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01058, ecapa_loss=0.0001423, whisper_loss=0.08948, over 3901863.03 frames. 
], batch size: 66, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:57:20,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4096080.0, ans=0.125 2024-08-18 20:57:30,711 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.28 vs. limit=22.5 2024-08-18 20:57:35,424 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.290e+01 2.484e+01 2.745e+01 4.794e+01, threshold=4.968e+01, percent-clipped=0.0 2024-08-18 20:57:37,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4096180.0, ans=0.0 2024-08-18 20:57:48,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4096280.0, ans=0.125 2024-08-18 20:57:53,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4096280.0, ans=0.125 2024-08-18 20:58:16,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4096380.0, ans=0.1 2024-08-18 20:58:24,387 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-18 20:58:33,925 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 8550, loss[loss=0.1185, beats_loss=0.008438, ecapa_loss=0.0001603, whisper_loss=0.1084, over 22592.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01061, ecapa_loss=0.0001417, whisper_loss=0.08896, over 3880826.33 frames. ], batch size: 92, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:58:38,986 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
20 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 20:58:45,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4096580.0, ans=0.0 2024-08-18 20:59:09,257 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-18 20:59:24,185 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 26 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-18 20:59:32,393 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 21 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 20:59:37,685 WARNING [optim.py:496] (0/4) Scaling gradients by 0.05524347350001335, model_norm_threshold=49.67615509033203 2024-08-18 20:59:37,853 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.096e+05, grad_sumsq=1.096e+05, orig_rms_sq=1.000e+00 2024-08-18 20:59:40,997 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-18 20:59:47,828 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 8600, loss[loss=0.1141, beats_loss=0.008462, ecapa_loss=0.0001966, whisper_loss=0.1037, over 18389.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0106, ecapa_loss=0.0001421, whisper_loss=0.08936, over 3898017.38 frames. ], batch size: 79, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:59:59,536 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 21:00:02,266 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.313e+01 2.617e+01 3.011e+01 8.992e+02, threshold=5.234e+01, percent-clipped=3.0 2024-08-18 21:00:02,852 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.08 vs. 
limit=22.5 2024-08-18 21:00:13,175 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 30 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-18 21:00:15,111 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-18 21:00:17,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4097280.0, ans=0.125 2024-08-18 21:00:18,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4097280.0, ans=0.0 2024-08-18 21:00:22,631 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.62 vs. limit=22.5 2024-08-18 21:00:24,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4097280.0, ans=0.015 2024-08-18 21:00:37,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4097380.0, ans=0.0 2024-08-18 21:00:46,959 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 21:00:51,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4097480.0, ans=0.125 2024-08-18 21:00:53,517 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-18 21:00:55,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4097480.0, ans=0.0 2024-08-18 21:00:57,472 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 8650, loss[loss=0.1122, beats_loss=0.01231, ecapa_loss=0.0001453, whisper_loss=0.09848, over 19003.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01061, ecapa_loss=0.0001422, whisper_loss=0.08943, over 3903859.65 frames. 
], batch size: 78, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:00:57,570 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-18 21:01:03,251 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.63 vs. limit=15.0 2024-08-18 21:01:20,510 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 26 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-18 21:01:26,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4097780.0, ans=0.125 2024-08-18 21:01:30,685 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.21 vs. limit=15.0 2024-08-18 21:01:37,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4097780.0, ans=0.0 2024-08-18 21:01:57,588 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-18 21:01:59,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4097980.0, ans=0.2 2024-08-18 21:02:12,060 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 8700, loss[loss=0.114, beats_loss=0.01025, ecapa_loss=0.0001356, whisper_loss=0.1024, over 21768.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01061, ecapa_loss=0.0001413, whisper_loss=0.08974, over 3884716.29 frames. ], batch size: 88, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:02:27,006 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.217e+01 2.441e+01 2.789e+01 4.170e+01, threshold=4.882e+01, percent-clipped=0.0 2024-08-18 21:02:31,994 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
22 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-18 21:03:10,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4098480.0, ans=0.07 2024-08-18 21:03:11,167 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.08 vs. limit=15.0 2024-08-18 21:03:16,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4098480.0, ans=0.1 2024-08-18 21:03:17,628 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-18 21:03:24,116 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 8750, loss[loss=0.08543, beats_loss=0.01229, ecapa_loss=0.0001535, whisper_loss=0.0716, over 17053.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01054, ecapa_loss=0.0001422, whisper_loss=0.09034, over 3907829.04 frames. ], batch size: 73, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:03:29,502 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-18 21:03:36,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4098580.0, ans=0.0 2024-08-18 21:03:43,526 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-18 21:03:49,214 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-18 21:03:51,083 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.92 vs. limit=15.0 2024-08-18 21:04:03,546 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.37 vs. 
limit=15.0 2024-08-18 21:04:08,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4098780.0, ans=0.95 2024-08-18 21:04:22,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4098880.0, ans=0.125 2024-08-18 21:04:25,407 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-18 21:04:28,269 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 21:04:36,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4098980.0, ans=0.125 2024-08-18 21:04:37,571 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 21:04:41,095 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 8800, loss[loss=0.0852, beats_loss=0.0126, ecapa_loss=0.0001311, whisper_loss=0.07129, over 13234.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.0001411, whisper_loss=0.09035, over 3893355.40 frames. ], batch size: 54, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:04:48,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4099080.0, ans=0.07 2024-08-18 21:04:54,933 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
21 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-18 21:04:56,558 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.312e+01 2.589e+01 2.893e+01 4.195e+01, threshold=5.178e+01, percent-clipped=0.0 2024-08-18 21:05:12,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4099280.0, ans=0.125 2024-08-18 21:05:12,450 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.88 vs. limit=12.0 2024-08-18 21:05:22,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4099280.0, ans=0.1 2024-08-18 21:05:32,828 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 15 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-18 21:05:53,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4099480.0, ans=0.125 2024-08-18 21:05:58,905 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 8850, loss[loss=0.06995, beats_loss=0.01045, ecapa_loss=0.0001811, whisper_loss=0.05769, over 15612.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01056, ecapa_loss=0.0001418, whisper_loss=0.0901, over 3885352.55 frames. ], batch size: 69, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:06:00,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4099580.0, ans=0.1 2024-08-18 21:06:04,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4099580.0, ans=0.125 2024-08-18 21:06:06,313 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
28 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-18 21:06:15,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4099680.0, ans=0.0 2024-08-18 21:06:31,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4099780.0, ans=0.125 2024-08-18 21:06:34,034 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-18 21:07:16,733 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 8900, loss[loss=0.09876, beats_loss=0.01103, ecapa_loss=0.0001418, whisper_loss=0.08632, over 19462.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01063, ecapa_loss=0.0001416, whisper_loss=0.08974, over 3882271.29 frames. ], batch size: 75, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:07:33,459 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.732e+01 2.315e+01 2.485e+01 2.808e+01 3.547e+01, threshold=4.971e+01, percent-clipped=0.0 2024-08-18 21:08:02,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4100280.0, ans=0.1 2024-08-18 21:08:04,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4100280.0, ans=0.0 2024-08-18 21:08:33,629 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.96 vs. limit=15.0 2024-08-18 21:08:37,300 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 22 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 21:08:38,883 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 8950, loss[loss=0.09117, beats_loss=0.01137, ecapa_loss=0.0001216, whisper_loss=0.07858, over 20554.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01064, ecapa_loss=0.0001415, whisper_loss=0.08918, over 3862979.73 frames. 
], batch size: 82, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:08:45,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4100580.0, ans=0.125 2024-08-18 21:08:47,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4100580.0, ans=0.1 2024-08-18 21:08:48,049 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.27 vs. limit=15.0 2024-08-18 21:09:27,150 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-18 21:09:35,145 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-18 21:09:41,176 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 36 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-18 21:09:51,916 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 9000, loss[loss=0.1145, beats_loss=0.007438, ecapa_loss=0.0001701, whisper_loss=0.1053, over 16691.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01053, ecapa_loss=0.0001431, whisper_loss=0.08975, over 3845563.57 frames. ], batch size: 66, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:09:51,918 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-18 21:10:26,969 INFO [train_multi_KD3.py:1149] (0/4) Epoch 28, validation on ASR_libri: loss=0.2539, beats_loss=0, ecapa_loss=0.0005164, whisper_loss=0.2487, over 922467.00 frames. 2024-08-18 21:10:44,147 INFO [train_multi_KD3.py:1149] (0/4) Epoch 28, validation on SV_voxceleb1: loss=0.004068, beats_loss=0, ecapa_loss=0.0004068, whisper_loss=0, over 939242.00 frames. 2024-08-18 21:12:26,359 INFO [train_multi_KD3.py:1149] (0/4) Epoch 28, validation on AT_audioset: loss=0.0231, beats_loss=0.0231, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-18 21:12:26,363 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-18 21:12:40,841 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.331e+01 2.657e+01 3.077e+01 4.248e+01, threshold=5.315e+01, percent-clipped=0.0 2024-08-18 21:12:57,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4101280.0, ans=0.125 2024-08-18 21:13:06,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4101280.0, ans=0.125 2024-08-18 21:13:09,971 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.62 vs. limit=6.0 2024-08-18 21:13:37,780 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-18 21:13:38,883 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 9050, loss[loss=0.09249, beats_loss=0.01168, ecapa_loss=0.0001166, whisper_loss=0.07965, over 15072.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0105, ecapa_loss=0.0001436, whisper_loss=0.08981, over 3838460.42 frames. ], batch size: 60, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:13:46,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4101580.0, ans=0.125 2024-08-18 21:14:12,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4101780.0, ans=0.0 2024-08-18 21:14:19,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4101780.0, ans=0.1 2024-08-18 21:14:34,261 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. 
limit=6.0 2024-08-18 21:14:39,097 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 21:14:48,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4101980.0, ans=0.09899494936611666 2024-08-18 21:14:49,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4101980.0, ans=0.125 2024-08-18 21:14:52,427 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 9100, loss[loss=0.09721, beats_loss=0.009304, ecapa_loss=0.0001623, whisper_loss=0.08628, over 22774.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.0001437, whisper_loss=0.08955, over 3849871.10 frames. ], batch size: 92, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:15:04,210 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 21 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-18 21:15:06,453 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.448e+01 2.687e+01 2.999e+01 3.130e+02, threshold=5.374e+01, percent-clipped=2.0 2024-08-18 21:15:11,131 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 18 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-18 21:15:11,652 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.58 vs. limit=15.0 2024-08-18 21:15:35,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4102380.0, ans=0.2 2024-08-18 21:15:37,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4102380.0, ans=0.125 2024-08-18 21:15:59,806 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.15 vs. 
limit=22.5 2024-08-18 21:16:05,382 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 9150, loss[loss=0.1201, beats_loss=0.01034, ecapa_loss=0.0001384, whisper_loss=0.1084, over 18195.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.0001442, whisper_loss=0.0903, over 3846881.82 frames. ], batch size: 70, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:16:18,551 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 21 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-18 21:16:25,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4102680.0, ans=0.1 2024-08-18 21:16:41,744 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 23 from LS+wenet, 19 from Vox, 16 fro AS 2024-08-18 21:16:49,595 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-18 21:16:52,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4102880.0, ans=0.0 2024-08-18 21:17:13,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4103080.0, ans=0.0 2024-08-18 21:17:15,280 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 9200, loss[loss=0.1116, beats_loss=0.009328, ecapa_loss=0.0001492, whisper_loss=0.1008, over 22878.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01046, ecapa_loss=0.0001439, whisper_loss=0.09048, over 3862521.29 frames. 
], batch size: 92, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:17:21,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=4103080.0, ans=15.0 2024-08-18 21:17:24,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4103080.0, ans=0.125 2024-08-18 21:17:27,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4103080.0, ans=0.2 2024-08-18 21:17:29,683 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.309e+01 2.617e+01 2.863e+01 2.175e+02, threshold=5.234e+01, percent-clipped=2.0 2024-08-18 21:17:30,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4103180.0, ans=0.0 2024-08-18 21:17:36,157 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-18 21:17:40,431 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-18 21:17:53,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4103280.0, ans=0.1 2024-08-18 21:18:09,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4103380.0, ans=0.0 2024-08-18 21:18:26,849 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 9250, loss[loss=0.1006, beats_loss=0.01002, ecapa_loss=0.0001876, whisper_loss=0.0887, over 20330.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.0001446, whisper_loss=0.09035, over 3860555.90 frames. 
], batch size: 88, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:18:29,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4103580.0, ans=0.1 2024-08-18 21:18:52,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4103680.0, ans=0.125 2024-08-18 21:18:56,070 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 21:18:58,955 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-18 21:19:31,629 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 26 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-18 21:19:35,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.27 vs. limit=10.0 2024-08-18 21:19:40,569 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 9300, loss[loss=0.1002, beats_loss=0.01005, ecapa_loss=0.0001445, whisper_loss=0.08874, over 18634.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01041, ecapa_loss=0.0001456, whisper_loss=0.09063, over 3880089.21 frames. ], batch size: 73, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:19:47,458 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 21:19:53,832 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.381e+01 2.641e+01 3.038e+01 1.790e+02, threshold=5.283e+01, percent-clipped=1.0 2024-08-18 21:20:05,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4104180.0, ans=0.2 2024-08-18 21:20:46,091 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
26 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-18 21:20:46,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4104480.0, ans=0.125 2024-08-18 21:20:46,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4104480.0, ans=0.125 2024-08-18 21:20:46,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4104480.0, ans=0.125 2024-08-18 21:20:51,660 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 9350, loss[loss=0.1036, beats_loss=0.009655, ecapa_loss=0.0001266, whisper_loss=0.09271, over 18066.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01031, ecapa_loss=0.0001451, whisper_loss=0.0918, over 3871583.21 frames. ], batch size: 67, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:20:55,506 WARNING [optim.py:496] (0/4) Scaling gradients by 0.06869103014469147, model_norm_threshold=52.8264274597168 2024-08-18 21:20:55,672 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.260e+05, grad_sumsq=1.216e+07, orig_rms_sq=1.036e-02 2024-08-18 21:21:06,557 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-18 21:21:12,511 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=9.336e+00 2024-08-18 21:21:22,429 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.40 vs. limit=15.0 2024-08-18 21:21:28,523 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.086e+00 2024-08-18 21:21:54,673 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 21:22:01,318 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 9400, loss[loss=0.1321, beats_loss=0.008111, ecapa_loss=0.0001791, whisper_loss=0.1222, over 15805.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01023, ecapa_loss=0.0001445, whisper_loss=0.09255, over 3851334.80 frames. ], batch size: 65, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:22:09,494 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 21:22:16,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4105180.0, ans=0.2 2024-08-18 21:22:17,195 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.616e+01 2.374e+01 2.623e+01 3.026e+01 7.690e+02, threshold=5.245e+01, percent-clipped=3.0 2024-08-18 21:22:30,378 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 7 from Vox, 31 fro AS 2024-08-18 21:22:31,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4105280.0, ans=0.125 2024-08-18 21:22:34,025 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 15 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-18 21:22:36,258 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.65 vs. limit=6.0 2024-08-18 21:22:36,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4105280.0, ans=0.0 2024-08-18 21:22:48,451 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
20 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 21:22:55,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4105380.0, ans=0.1 2024-08-18 21:22:58,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4105480.0, ans=0.125 2024-08-18 21:23:01,059 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-18 21:23:02,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4105480.0, ans=0.0 2024-08-18 21:23:07,870 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.837e+00 2024-08-18 21:23:12,776 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 9450, loss[loss=0.09083, beats_loss=0.01041, ecapa_loss=0.0001305, whisper_loss=0.07912, over 17461.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01039, ecapa_loss=0.0001436, whisper_loss=0.09127, over 3850329.01 frames. ], batch size: 68, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:23:17,484 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.96 vs. limit=15.0 2024-08-18 21:23:18,395 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 16 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-18 21:23:32,557 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-18 21:23:32,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4105680.0, ans=0.1 2024-08-18 21:23:44,676 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
31 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-18 21:23:51,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4105780.0, ans=0.025 2024-08-18 21:23:54,588 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-18 21:23:54,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4105880.0, ans=0.125 2024-08-18 21:24:04,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4105880.0, ans=0.1 2024-08-18 21:24:07,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4105880.0, ans=0.0 2024-08-18 21:24:16,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4105980.0, ans=0.1 2024-08-18 21:24:19,903 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-18 21:24:25,740 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 9500, loss[loss=0.106, beats_loss=0.009577, ecapa_loss=0.000146, whisper_loss=0.09494, over 23350.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01046, ecapa_loss=0.000143, whisper_loss=0.0905, over 3848905.69 frames. ], batch size: 91, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:24:33,373 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.90 vs. 
limit=15.0 2024-08-18 21:24:42,300 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.355e+01 2.580e+01 2.944e+01 6.232e+01, threshold=5.161e+01, percent-clipped=0.0 2024-08-18 21:24:44,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4106180.0, ans=0.125 2024-08-18 21:24:57,214 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0 2024-08-18 21:25:08,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4106280.0, ans=0.125 2024-08-18 21:25:40,309 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 9550, loss[loss=0.1056, beats_loss=0.009613, ecapa_loss=0.000157, whisper_loss=0.09447, over 23044.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01047, ecapa_loss=0.0001444, whisper_loss=0.09027, over 3854037.03 frames. ], batch size: 91, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:25:57,431 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 21:26:26,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4106880.0, ans=0.1 2024-08-18 21:26:34,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4106880.0, ans=0.1 2024-08-18 21:26:36,484 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.45 vs. limit=12.0 2024-08-18 21:26:41,040 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. 
limit=15.0 2024-08-18 21:26:47,020 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.90 vs. limit=10.0 2024-08-18 21:26:51,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-08-18 21:26:53,578 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 9600, loss[loss=0.1167, beats_loss=0.01, ecapa_loss=0.0001474, whisper_loss=0.1052, over 22359.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0104, ecapa_loss=0.0001443, whisper_loss=0.09069, over 3846110.06 frames. ], batch size: 90, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:27:03,393 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-18 21:27:03,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4107080.0, ans=0.0 2024-08-18 21:27:05,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4107080.0, ans=0.1 2024-08-18 21:27:07,899 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.422e+01 2.704e+01 3.075e+01 1.101e+02, threshold=5.409e+01, percent-clipped=2.0 2024-08-18 21:27:10,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4107180.0, ans=0.125 2024-08-18 21:27:12,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4107180.0, ans=0.125 2024-08-18 21:27:18,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4107180.0, ans=0.125 2024-08-18 21:27:49,872 INFO [scaling.py:214] (0/4) ScheduledFloat: 
name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4107380.0, ans=0.0 2024-08-18 21:28:04,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4107480.0, ans=0.125 2024-08-18 21:28:06,496 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 9650, loss[loss=0.1121, beats_loss=0.009784, ecapa_loss=0.0001743, whisper_loss=0.1006, over 16834.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01037, ecapa_loss=0.0001449, whisper_loss=0.09098, over 3856545.60 frames. ], batch size: 70, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:28:16,080 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-18 21:28:20,546 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-18 21:28:23,930 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.91 vs. limit=15.0 2024-08-18 21:28:39,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4107780.0, ans=0.2 2024-08-18 21:28:39,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4107780.0, ans=0.125 2024-08-18 21:28:42,007 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-18 21:28:59,208 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 12 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-18 21:29:12,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4108080.0, ans=0.0 2024-08-18 21:29:13,841 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 9700, loss[loss=0.08191, beats_loss=0.01105, ecapa_loss=0.0001154, whisper_loss=0.0697, over 15453.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01031, ecapa_loss=0.0001469, whisper_loss=0.09093, over 3849491.57 frames. ], batch size: 59, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:29:19,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4108080.0, ans=0.125 2024-08-18 21:29:27,638 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.335e+01 2.610e+01 2.872e+01 4.548e+01, threshold=5.221e+01, percent-clipped=0.0 2024-08-18 21:29:31,821 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 20 from LS+wenet, 31 from Vox, 28 fro AS 2024-08-18 21:29:33,454 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-18 21:29:33,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4108180.0, ans=0.125 2024-08-18 21:29:47,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4108280.0, ans=0.0 2024-08-18 21:29:57,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4108380.0, ans=0.0 2024-08-18 21:29:59,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4108380.0, ans=0.125 2024-08-18 21:30:05,554 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
24 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-18 21:30:05,785 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-18 21:30:17,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4108480.0, ans=0.2 2024-08-18 21:30:19,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.80 vs. limit=15.0 2024-08-18 21:30:22,459 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 9750, loss[loss=0.09837, beats_loss=0.01015, ecapa_loss=0.0001532, whisper_loss=0.08668, over 20137.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01031, ecapa_loss=0.0001466, whisper_loss=0.09019, over 3844025.84 frames. ], batch size: 82, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:30:22,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4108580.0, ans=0.125 2024-08-18 21:30:31,611 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-18 21:30:44,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4108680.0, ans=0.025 2024-08-18 21:30:54,076 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
30 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-18 21:30:55,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4108780.0, ans=0.2 2024-08-18 21:31:02,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4108880.0, ans=0.125 2024-08-18 21:31:08,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4108880.0, ans=0.0 2024-08-18 21:31:09,885 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 30 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-18 21:31:26,932 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=12.0 2024-08-18 21:31:30,471 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 9800, loss[loss=0.08406, beats_loss=0.01349, ecapa_loss=0.0001324, whisper_loss=0.06924, over 17266.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01036, ecapa_loss=0.0001459, whisper_loss=0.09014, over 3850392.71 frames. ], batch size: 71, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:31:37,504 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 21:31:38,769 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-18 21:31:43,473 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.279e+01 2.411e+01 2.686e+01 7.087e+01, threshold=4.821e+01, percent-clipped=1.0 2024-08-18 21:31:48,631 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-18 21:31:50,266 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.50 vs. 
limit=12.0 2024-08-18 21:32:11,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4109380.0, ans=0.0 2024-08-18 21:32:13,380 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 33 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-18 21:32:19,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4109380.0, ans=0.2 2024-08-18 21:32:21,816 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-18 21:32:36,149 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 9850, loss[loss=0.09776, beats_loss=0.007347, ecapa_loss=0.0001728, whisper_loss=0.08868, over 16777.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001444, whisper_loss=0.08998, over 3841538.50 frames. ], batch size: 67, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:32:40,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4109580.0, ans=0.125 2024-08-18 21:32:47,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4109580.0, ans=0.125 2024-08-18 21:32:51,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4109680.0, ans=0.0 2024-08-18 21:32:57,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=4109680.0, ans=0.025 2024-08-18 21:33:16,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4109780.0, ans=0.1 2024-08-18 21:33:18,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4109880.0, ans=0.1 2024-08-18 21:33:48,374 INFO 
[train_multi_KD3.py:1116] (0/4) Epoch 28, batch 9900, loss[loss=0.1108, beats_loss=0.009524, ecapa_loss=0.0001526, whisper_loss=0.09975, over 21328.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01057, ecapa_loss=0.0001452, whisper_loss=0.08956, over 3872981.02 frames. ], batch size: 87, lr: 2.18e-03, grad_scale: 1.152921504606847e+18 2024-08-18 21:33:53,963 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 21 from LS+wenet, 36 from Vox, 36 fro AS 2024-08-18 21:33:57,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4110080.0, ans=0.0 2024-08-18 21:34:02,277 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.316e+01 2.554e+01 2.819e+01 9.199e+01, threshold=5.108e+01, percent-clipped=2.0 2024-08-18 21:34:20,353 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-18 21:34:24,647 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-18 21:34:26,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4110280.0, ans=0.0 2024-08-18 21:34:27,321 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
31 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-18 21:34:34,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4110380.0, ans=0.07 2024-08-18 21:34:49,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4110480.0, ans=0.125 2024-08-18 21:34:50,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4110480.0, ans=0.125 2024-08-18 21:34:59,338 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 9950, loss[loss=0.09409, beats_loss=0.01264, ecapa_loss=0.0001327, whisper_loss=0.08012, over 21946.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01054, ecapa_loss=0.0001448, whisper_loss=0.09002, over 3870083.03 frames. ], batch size: 89, lr: 2.18e-03, grad_scale: 1.152921504606847e+18 2024-08-18 21:35:04,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4110580.0, ans=0.0 2024-08-18 21:35:08,309 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=22.5 2024-08-18 21:35:32,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4110780.0, ans=0.2 2024-08-18 21:35:44,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4110880.0, ans=0.0 2024-08-18 21:35:46,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4110880.0, ans=0.0 2024-08-18 21:35:51,146 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.99 vs. 
limit=15.0 2024-08-18 21:35:58,306 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 21:36:01,260 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.60 vs. limit=15.0 2024-08-18 21:36:02,241 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-18 21:36:07,199 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 10000, loss[loss=0.1193, beats_loss=0.008221, ecapa_loss=0.0001253, whisper_loss=0.1098, over 19977.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01051, ecapa_loss=0.000144, whisper_loss=0.09086, over 3910989.60 frames. ], batch size: 76, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:36:10,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4111080.0, ans=0.125 2024-08-18 21:36:19,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4111080.0, ans=0.04949747468305833 2024-08-18 21:36:23,694 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.302e+01 2.583e+01 2.911e+01 1.277e+02, threshold=5.165e+01, percent-clipped=1.0 2024-08-18 21:36:31,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4111180.0, ans=0.125 2024-08-18 21:36:33,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4111180.0, ans=0.125 2024-08-18 21:36:34,945 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 21:36:51,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4111380.0, ans=0.0 2024-08-18 
21:36:56,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4111380.0, ans=0.2 2024-08-18 21:37:03,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4111380.0, ans=0.125 2024-08-18 21:37:06,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.17 vs. limit=15.0 2024-08-18 21:37:09,278 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 25 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-18 21:37:21,242 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 10050, loss[loss=0.08455, beats_loss=0.01259, ecapa_loss=0.0001494, whisper_loss=0.07046, over 18127.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001433, whisper_loss=0.09045, over 3878987.61 frames. ], batch size: 76, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:37:41,868 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 21:37:41,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4111680.0, ans=0.125 2024-08-18 21:37:45,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4111680.0, ans=0.125 2024-08-18 21:37:47,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4111680.0, ans=0.125 2024-08-18 21:37:55,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4111780.0, ans=0.2 2024-08-18 21:37:58,888 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
19 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-18 21:38:08,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4111880.0, ans=0.125 2024-08-18 21:38:17,566 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.20 vs. limit=15.0 2024-08-18 21:38:19,191 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.50 vs. limit=12.0 2024-08-18 21:38:29,387 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.44 vs. limit=5.0 2024-08-18 21:38:30,890 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 10100, loss[loss=0.08703, beats_loss=0.01239, ecapa_loss=0.0001556, whisper_loss=0.07309, over 20746.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01058, ecapa_loss=0.0001442, whisper_loss=0.0899, over 3891899.09 frames. ], batch size: 90, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:38:35,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4112080.0, ans=0.2 2024-08-18 21:38:40,307 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.10 vs. limit=12.0 2024-08-18 21:38:44,100 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
19 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-18 21:38:46,460 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.328e+01 2.605e+01 3.008e+01 2.431e+02, threshold=5.209e+01, percent-clipped=1.0 2024-08-18 21:38:52,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4112180.0, ans=0.07 2024-08-18 21:39:02,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4112280.0, ans=0.0 2024-08-18 21:39:05,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4112280.0, ans=0.0 2024-08-18 21:39:06,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4112280.0, ans=0.125 2024-08-18 21:39:36,999 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 10150, loss[loss=0.1084, beats_loss=0.007027, ecapa_loss=0.00019, whisper_loss=0.09949, over 14646.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01053, ecapa_loss=0.0001443, whisper_loss=0.09016, over 3882352.82 frames. ], batch size: 57, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:39:48,042 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 21:39:53,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4112680.0, ans=0.0 2024-08-18 21:39:55,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0 2024-08-18 21:39:55,340 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.35 vs. 
limit=15.0 2024-08-18 21:40:03,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4112780.0, ans=0.0 2024-08-18 21:40:05,155 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.38 vs. limit=22.5 2024-08-18 21:40:07,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4112780.0, ans=0.1 2024-08-18 21:40:07,471 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-18 21:40:09,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4112780.0, ans=0.0 2024-08-18 21:40:12,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4112780.0, ans=0.125 2024-08-18 21:40:22,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4112880.0, ans=0.07 2024-08-18 21:40:26,647 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.73 vs. limit=10.0 2024-08-18 21:40:27,305 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 21:40:28,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4112980.0, ans=0.0 2024-08-18 21:40:38,749 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 21:40:39,258 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.25 vs. 
limit=12.0 2024-08-18 21:40:39,353 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.94 vs. limit=15.0 2024-08-18 21:40:42,575 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 10200, loss[loss=0.1063, beats_loss=0.01117, ecapa_loss=0.0001327, whisper_loss=0.09383, over 19000.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001436, whisper_loss=0.0904, over 3865156.00 frames. ], batch size: 77, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:40:54,985 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-18 21:40:57,268 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.349e+01 2.556e+01 2.919e+01 5.340e+01, threshold=5.112e+01, percent-clipped=2.0 2024-08-18 21:41:19,223 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-18 21:41:21,572 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-18 21:41:27,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4113380.0, ans=0.2 2024-08-18 21:41:36,217 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 31 from Vox, 27 fro AS 2024-08-18 21:41:40,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4113480.0, ans=0.125 2024-08-18 21:41:41,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4113480.0, ans=10.0 2024-08-18 21:41:42,992 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
23 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-18 21:41:44,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4113480.0, ans=0.125 2024-08-18 21:41:49,161 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 10250, loss[loss=0.1292, beats_loss=0.008234, ecapa_loss=0.0001462, whisper_loss=0.1195, over 15962.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01047, ecapa_loss=0.0001441, whisper_loss=0.09085, over 3865992.54 frames. ], batch size: 57, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:41:51,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4113580.0, ans=0.1 2024-08-18 21:41:57,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4113580.0, ans=0.0 2024-08-18 21:42:06,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4113680.0, ans=0.0 2024-08-18 21:42:11,148 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 37 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 21:42:27,073 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 33 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-18 21:42:30,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4113880.0, ans=0.07 2024-08-18 21:42:47,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4113980.0, ans=0.1 2024-08-18 21:42:54,312 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 10300, loss[loss=0.08502, beats_loss=0.01383, ecapa_loss=0.0001243, whisper_loss=0.06995, over 22515.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01049, ecapa_loss=0.0001447, whisper_loss=0.09046, over 3868489.16 frames. 
], batch size: 96, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:42:57,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4114080.0, ans=0.1 2024-08-18 21:43:08,360 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.322e+01 2.569e+01 2.887e+01 6.847e+01, threshold=5.137e+01, percent-clipped=1.0 2024-08-18 21:43:12,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4114180.0, ans=0.125 2024-08-18 21:43:15,191 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.97 vs. limit=12.0 2024-08-18 21:43:17,298 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 21:43:26,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4114280.0, ans=0.125 2024-08-18 21:43:32,434 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 29 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-18 21:43:51,720 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-18 21:43:57,773 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 10350, loss[loss=0.1127, beats_loss=0.009759, ecapa_loss=0.0001563, whisper_loss=0.1013, over 18794.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0105, ecapa_loss=0.0001438, whisper_loss=0.09063, over 3882243.53 frames. ], batch size: 76, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:44:09,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4114680.0, ans=0.05 2024-08-18 21:44:13,603 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
29 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 21:44:23,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4114780.0, ans=0.0 2024-08-18 21:44:33,270 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-18 21:44:41,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4114880.0, ans=0.125 2024-08-18 21:45:03,376 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 10400, loss[loss=0.09139, beats_loss=0.0122, ecapa_loss=0.0001644, whisper_loss=0.07754, over 14443.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01051, ecapa_loss=0.0001444, whisper_loss=0.09011, over 3836201.84 frames. ], batch size: 59, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:45:10,974 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-18 21:45:16,436 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 14 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-18 21:45:17,374 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.312e+01 2.463e+01 2.686e+01 5.077e+01, threshold=4.927e+01, percent-clipped=0.0 2024-08-18 21:45:22,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=4115180.0, ans=15.0 2024-08-18 21:45:32,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4115280.0, ans=0.125 2024-08-18 21:45:49,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4115380.0, ans=0.125 2024-08-18 21:45:54,582 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
23 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-18 21:45:57,210 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 27 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 21:45:59,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4115480.0, ans=0.035 2024-08-18 21:46:08,134 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 10450, loss[loss=0.126, beats_loss=0.007669, ecapa_loss=0.0001278, whisper_loss=0.1171, over 16205.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001451, whisper_loss=0.0903, over 3863849.70 frames. ], batch size: 61, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:46:19,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4115680.0, ans=0.1 2024-08-18 21:46:26,076 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 21:46:26,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=4115680.0, ans=15.0 2024-08-18 21:46:33,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4115780.0, ans=0.125 2024-08-18 21:46:46,856 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-18 21:46:48,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4115880.0, ans=0.1 2024-08-18 21:46:50,840 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-18 21:46:57,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4115880.0, ans=0.2 2024-08-18 21:46:59,020 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
31 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 21:47:03,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4115980.0, ans=0.07 2024-08-18 21:47:14,638 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 10500, loss[loss=0.09815, beats_loss=0.01335, ecapa_loss=0.0001407, whisper_loss=0.08339, over 22880.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01042, ecapa_loss=0.0001448, whisper_loss=0.09025, over 3838106.95 frames. ], batch size: 93, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:47:24,594 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.91 vs. limit=10.0 2024-08-18 21:47:28,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4116180.0, ans=0.1 2024-08-18 21:47:29,474 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.302e+01 2.513e+01 2.856e+01 4.600e+01, threshold=5.027e+01, percent-clipped=0.0 2024-08-18 21:47:29,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4116180.0, ans=0.025 2024-08-18 21:47:34,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4116180.0, ans=0.2 2024-08-18 21:47:40,552 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-18 21:47:54,872 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.18 vs. limit=15.0 2024-08-18 21:47:56,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4116380.0, ans=0.0 2024-08-18 21:48:14,427 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
27 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 21:48:25,065 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 10550, loss[loss=0.1006, beats_loss=0.009782, ecapa_loss=0.0001284, whisper_loss=0.08952, over 17220.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01035, ecapa_loss=0.0001468, whisper_loss=0.08994, over 3844405.88 frames. ], batch size: 66, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:48:43,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4116680.0, ans=0.2 2024-08-18 21:49:11,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4116880.0, ans=0.1 2024-08-18 21:49:25,006 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-18 21:49:26,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4116980.0, ans=0.04949747468305833 2024-08-18 21:49:29,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4116980.0, ans=0.0 2024-08-18 21:49:35,580 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 10600, loss[loss=0.105, beats_loss=0.01172, ecapa_loss=0.0001138, whisper_loss=0.09216, over 15693.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.0001465, whisper_loss=0.08979, over 3848770.04 frames. ], batch size: 60, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:49:50,613 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.667e+01 2.337e+01 2.542e+01 2.897e+01 3.687e+01, threshold=5.085e+01, percent-clipped=0.0 2024-08-18 21:49:51,256 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. 
limit=6.0 2024-08-18 21:49:52,241 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-18 21:49:52,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4117180.0, ans=0.05 2024-08-18 21:50:08,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4117280.0, ans=0.125 2024-08-18 21:50:23,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4117380.0, ans=0.0 2024-08-18 21:50:31,128 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.88 vs. limit=12.0 2024-08-18 21:50:32,188 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.05 vs. limit=15.0 2024-08-18 21:50:43,764 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 10650, loss[loss=0.1006, beats_loss=0.01079, ecapa_loss=9.011e-05, whisper_loss=0.08886, over 14967.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01033, ecapa_loss=0.0001456, whisper_loss=0.09057, over 3853775.38 frames. ], batch size: 55, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:50:50,668 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-18 21:50:52,746 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=19.21 vs. limit=22.5 2024-08-18 21:50:55,950 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
21 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-18 21:51:13,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4117780.0, ans=0.125 2024-08-18 21:51:13,126 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.864e+01 2024-08-18 21:51:14,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4117780.0, ans=0.125 2024-08-18 21:51:27,083 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-18 21:51:40,701 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.11 vs. limit=22.5 2024-08-18 21:51:46,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4117980.0, ans=0.0 2024-08-18 21:51:51,056 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 10700, loss[loss=0.1071, beats_loss=0.01125, ecapa_loss=0.0001419, whisper_loss=0.09444, over 19317.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01036, ecapa_loss=0.0001453, whisper_loss=0.09091, over 3861689.36 frames. ], batch size: 77, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:52:05,510 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.628e+01 2.357e+01 2.585e+01 2.828e+01 1.514e+02, threshold=5.170e+01, percent-clipped=1.0 2024-08-18 21:52:12,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4118180.0, ans=0.2 2024-08-18 21:52:15,594 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-18 21:52:16,803 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
31 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 21:52:21,468 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=11.03 vs. limit=12.0 2024-08-18 21:52:28,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4118380.0, ans=0.015 2024-08-18 21:52:34,128 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-18 21:52:35,725 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0 2024-08-18 21:52:51,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4118480.0, ans=0.1 2024-08-18 21:52:52,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4118480.0, ans=0.125 2024-08-18 21:52:56,091 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 10750, loss[loss=0.09142, beats_loss=0.01184, ecapa_loss=0.0001296, whisper_loss=0.07828, over 21774.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01039, ecapa_loss=0.0001439, whisper_loss=0.0917, over 3885461.62 frames. ], batch size: 88, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:53:04,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4118580.0, ans=0.0 2024-08-18 21:53:06,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4118580.0, ans=0.125 2024-08-18 21:53:06,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4118580.0, ans=0.2 2024-08-18 21:53:07,080 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
38 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 21:53:07,586 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.04 vs. limit=10.0 2024-08-18 21:53:10,797 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 37 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-18 21:53:11,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4118680.0, ans=0.0 2024-08-18 21:53:19,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4118680.0, ans=0.2 2024-08-18 21:53:26,072 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 16 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-18 21:53:28,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4118780.0, ans=0.2 2024-08-18 21:53:28,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4118780.0, ans=0.125 2024-08-18 21:53:35,028 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 27 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-18 21:53:38,330 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 21:53:42,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4118880.0, ans=0.0 2024-08-18 21:53:42,384 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.67 vs. limit=15.0 2024-08-18 21:54:01,142 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 10800, loss[loss=0.09606, beats_loss=0.01184, ecapa_loss=0.0001128, whisper_loss=0.08309, over 18657.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.0104, ecapa_loss=0.0001443, whisper_loss=0.09177, over 3884635.22 frames. ], batch size: 71, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:54:08,654 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.57 vs. limit=22.5 2024-08-18 21:54:15,505 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.266e+01 2.523e+01 2.854e+01 3.753e+01, threshold=5.046e+01, percent-clipped=0.0 2024-08-18 21:54:17,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4119180.0, ans=0.0 2024-08-18 21:54:22,519 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.81 vs. limit=22.5 2024-08-18 21:54:24,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4119180.0, ans=0.07 2024-08-18 21:54:49,546 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 25 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-18 21:54:56,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4119480.0, ans=0.125 2024-08-18 21:54:59,923 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-18 21:55:03,784 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 22 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-18 21:55:06,183 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 10850, loss[loss=0.08968, beats_loss=0.01316, ecapa_loss=0.0001057, whisper_loss=0.07546, over 15188.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01044, ecapa_loss=0.0001435, whisper_loss=0.09171, over 3911250.68 frames. 
], batch size: 59, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:55:39,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4119780.0, ans=0.0 2024-08-18 21:55:39,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4119780.0, ans=0.0 2024-08-18 21:55:40,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4119780.0, ans=0.0 2024-08-18 21:55:52,519 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.88 vs. limit=15.0 2024-08-18 21:55:57,410 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-18 21:55:59,929 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-412000.pt 2024-08-18 21:56:08,068 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=12.0 2024-08-18 21:56:13,960 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 10900, loss[loss=0.1179, beats_loss=0.009795, ecapa_loss=0.000161, whisper_loss=0.1065, over 18242.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01049, ecapa_loss=0.0001424, whisper_loss=0.09078, over 3902476.05 frames. ], batch size: 73, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:56:27,484 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.04 vs. 
limit=15.0 2024-08-18 21:56:27,995 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.367e+01 2.602e+01 2.908e+01 4.089e+01, threshold=5.204e+01, percent-clipped=0.0 2024-08-18 21:56:34,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4120180.0, ans=0.07 2024-08-18 21:56:37,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4120180.0, ans=0.125 2024-08-18 21:57:01,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=4120380.0, ans=0.05 2024-08-18 21:57:01,423 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.71 vs. limit=22.5 2024-08-18 21:57:19,090 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 10950, loss[loss=0.09226, beats_loss=0.01157, ecapa_loss=0.0001334, whisper_loss=0.07935, over 20669.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0105, ecapa_loss=0.0001423, whisper_loss=0.09089, over 3907167.63 frames. ], batch size: 83, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:57:25,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4120580.0, ans=0.125 2024-08-18 21:57:36,033 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 31 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-18 21:57:53,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4120780.0, ans=0.125 2024-08-18 21:58:04,515 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
21 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-18 21:58:16,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4120980.0, ans=0.125 2024-08-18 21:58:18,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4120980.0, ans=0.2 2024-08-18 21:58:18,245 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.21 vs. limit=15.0 2024-08-18 21:58:21,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4120980.0, ans=0.1 2024-08-18 21:58:23,757 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 11000, loss[loss=0.1075, beats_loss=0.009214, ecapa_loss=0.0001553, whisper_loss=0.09675, over 19038.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01044, ecapa_loss=0.000143, whisper_loss=0.09122, over 3934539.43 frames. ], batch size: 77, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:58:38,414 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.293e+01 2.499e+01 2.865e+01 3.776e+01, threshold=4.999e+01, percent-clipped=0.0 2024-08-18 21:58:55,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4121280.0, ans=0.125 2024-08-18 21:59:16,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4121480.0, ans=0.125 2024-08-18 21:59:28,276 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 11050, loss[loss=0.1283, beats_loss=0.007618, ecapa_loss=0.0001439, whisper_loss=0.1192, over 23006.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01038, ecapa_loss=0.0001429, whisper_loss=0.09094, over 3904580.93 frames. 
], batch size: 88, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:59:30,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4121580.0, ans=0.1 2024-08-18 21:59:32,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4121580.0, ans=0.2 2024-08-18 21:59:44,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4121680.0, ans=0.0 2024-08-18 21:59:49,483 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 21:59:54,663 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 33 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-18 21:59:57,243 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 34 from Vox, 32 fro AS 2024-08-18 22:00:04,092 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.65 vs. limit=15.0 2024-08-18 22:00:08,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4121880.0, ans=0.0 2024-08-18 22:00:14,066 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.59 vs. limit=22.5 2024-08-18 22:00:21,955 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-18 22:00:28,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4121980.0, ans=0.2 2024-08-18 22:00:33,541 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 11100, loss[loss=0.1144, beats_loss=0.008041, ecapa_loss=0.0001813, whisper_loss=0.1046, over 19526.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01036, ecapa_loss=0.0001426, whisper_loss=0.0911, over 3899242.76 frames. ], batch size: 81, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:00:35,213 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-18 22:00:36,578 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 22:00:47,895 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.366e+01 2.564e+01 2.884e+01 4.228e+01, threshold=5.128e+01, percent-clipped=0.0 2024-08-18 22:00:54,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4122180.0, ans=0.125 2024-08-18 22:01:16,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4122380.0, ans=0.1 2024-08-18 22:01:36,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.01 vs. limit=15.0 2024-08-18 22:01:39,083 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 11150, loss[loss=0.1197, beats_loss=0.008498, ecapa_loss=0.0001494, whisper_loss=0.1097, over 20520.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0103, ecapa_loss=0.0001422, whisper_loss=0.09115, over 3886980.37 frames. ], batch size: 78, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:01:57,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4122680.0, ans=0.0 2024-08-18 22:02:15,545 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
21 from LS+wenet, 10 from Vox, 23 fro AS 2024-08-18 22:02:15,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4122780.0, ans=0.125 2024-08-18 22:02:19,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4122880.0, ans=0.125 2024-08-18 22:02:27,541 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 22:02:35,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4122980.0, ans=0.0 2024-08-18 22:02:43,781 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 11200, loss[loss=0.08517, beats_loss=0.01213, ecapa_loss=0.0001432, whisper_loss=0.0716, over 13077.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01033, ecapa_loss=0.0001421, whisper_loss=0.09081, over 3860379.52 frames. ], batch size: 54, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:02:44,781 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.64 vs. limit=15.0 2024-08-18 22:02:58,231 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.383e+01 2.625e+01 2.814e+01 6.266e+01, threshold=5.250e+01, percent-clipped=1.0 2024-08-18 22:03:25,389 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-18 22:03:38,003 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.08 vs. limit=15.0 2024-08-18 22:03:41,155 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
16 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-18 22:03:48,698 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 11250, loss[loss=0.08704, beats_loss=0.01292, ecapa_loss=0.0001241, whisper_loss=0.07288, over 20523.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01035, ecapa_loss=0.0001418, whisper_loss=0.09084, over 3860669.59 frames. ], batch size: 83, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:04:08,875 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.27 vs. limit=15.0 2024-08-18 22:04:11,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4123680.0, ans=0.0 2024-08-18 22:04:15,056 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-18 22:04:20,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4123780.0, ans=0.125 2024-08-18 22:04:42,374 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-18 22:04:46,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4123980.0, ans=0.125 2024-08-18 22:04:50,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4123980.0, ans=0.125 2024-08-18 22:04:53,729 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 11300, loss[loss=0.1099, beats_loss=0.0101, ecapa_loss=0.0001405, whisper_loss=0.09838, over 20878.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01034, ecapa_loss=0.0001414, whisper_loss=0.09119, over 3888973.35 frames. 
], batch size: 85, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:05:02,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=4124080.0, ans=0.2 2024-08-18 22:05:06,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4124180.0, ans=0.1 2024-08-18 22:05:07,792 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.238e+01 2.501e+01 2.783e+01 2.394e+02, threshold=5.001e+01, percent-clipped=1.0 2024-08-18 22:05:12,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4124180.0, ans=0.025 2024-08-18 22:05:17,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4124180.0, ans=0.125 2024-08-18 22:05:17,651 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.02 vs. limit=22.5 2024-08-18 22:05:18,246 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-18 22:05:28,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4124280.0, ans=0.125 2024-08-18 22:05:49,367 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-18 22:05:52,648 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.08 vs. limit=6.0 2024-08-18 22:05:58,561 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 11350, loss[loss=0.1012, beats_loss=0.011, ecapa_loss=0.0001407, whisper_loss=0.0888, over 22954.00 frames. 
], tot_loss[loss=0.1038, beats_loss=0.01022, ecapa_loss=0.0001417, whisper_loss=0.09218, over 3891289.33 frames. ], batch size: 94, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:06:01,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4124580.0, ans=0.125 2024-08-18 22:06:02,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4124580.0, ans=0.1 2024-08-18 22:06:06,495 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 22:06:08,388 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.22 vs. limit=15.0 2024-08-18 22:06:11,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4124680.0, ans=0.0 2024-08-18 22:06:25,155 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-18 22:06:26,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4124780.0, ans=0.125 2024-08-18 22:06:31,558 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 22:06:45,414 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.765e+05 2024-08-18 22:06:50,057 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-18 22:07:04,821 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 11400, loss[loss=0.1058, beats_loss=0.009483, ecapa_loss=0.000173, whisper_loss=0.09461, over 20061.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01024, ecapa_loss=0.0001421, whisper_loss=0.09213, over 3891821.38 frames. 
], batch size: 85, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:07:12,435 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 18 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-18 22:07:18,933 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.399e+01 2.610e+01 2.996e+01 4.711e+01, threshold=5.221e+01, percent-clipped=0.0 2024-08-18 22:07:19,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4125180.0, ans=0.125 2024-08-18 22:07:33,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=4125280.0, ans=0.2 2024-08-18 22:07:34,022 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.10 vs. limit=15.0 2024-08-18 22:07:35,068 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 22:07:55,602 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-18 22:08:09,661 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 11450, loss[loss=0.09861, beats_loss=0.007064, ecapa_loss=0.0001435, whisper_loss=0.09012, over 18642.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01028, ecapa_loss=0.0001428, whisper_loss=0.09178, over 3857270.75 frames. 
], batch size: 72, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:08:16,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4125580.0, ans=0.09899494936611666 2024-08-18 22:08:22,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4125680.0, ans=0.125 2024-08-18 22:08:37,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4125780.0, ans=0.1 2024-08-18 22:08:40,533 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.45 vs. limit=15.0 2024-08-18 22:08:48,975 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-18 22:08:50,292 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 22:08:51,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4125880.0, ans=0.125 2024-08-18 22:09:08,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4125980.0, ans=0.125 2024-08-18 22:09:11,732 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-08-18 22:09:13,748 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-18 22:09:15,161 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 11500, loss[loss=0.1072, beats_loss=0.01038, ecapa_loss=0.0001388, whisper_loss=0.09542, over 23239.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01025, ecapa_loss=0.0001418, whisper_loss=0.09245, over 3863044.97 frames. 
], batch size: 91, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:09:24,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4126080.0, ans=0.1 2024-08-18 22:09:27,696 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.11 vs. limit=15.0 2024-08-18 22:09:29,383 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.337e+01 2.548e+01 2.823e+01 1.618e+02, threshold=5.097e+01, percent-clipped=1.0 2024-08-18 22:09:30,837 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 25 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-18 22:09:33,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4126180.0, ans=0.125 2024-08-18 22:09:37,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4126180.0, ans=0.0 2024-08-18 22:09:44,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4126280.0, ans=0.0 2024-08-18 22:09:48,208 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 31 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 22:09:49,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4126280.0, ans=0.125 2024-08-18 22:10:07,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4126480.0, ans=0.125 2024-08-18 22:10:10,301 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 22:10:13,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4126480.0, ans=0.1 2024-08-18 22:10:21,113 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 11550, loss[loss=0.1055, beats_loss=0.008975, ecapa_loss=0.0001514, whisper_loss=0.09501, over 19200.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01021, ecapa_loss=0.0001428, whisper_loss=0.09298, over 3909705.88 frames. ], batch size: 73, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:10:44,390 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-18 22:10:53,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4126780.0, ans=0.125 2024-08-18 22:10:59,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4126780.0, ans=0.125 2024-08-18 22:11:16,891 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-18 22:11:21,548 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 22:11:23,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4126980.0, ans=0.0 2024-08-18 22:11:23,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4126980.0, ans=0.1 2024-08-18 22:11:29,450 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 11600, loss[loss=0.09305, beats_loss=0.01229, ecapa_loss=0.0001701, whisper_loss=0.07906, over 17590.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01033, ecapa_loss=0.0001434, whisper_loss=0.09189, over 3901821.89 frames. 
], batch size: 76, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:11:34,074 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.29 vs. limit=15.0 2024-08-18 22:11:36,763 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-18 22:11:39,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4127080.0, ans=0.1 2024-08-18 22:11:44,869 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.392e+01 2.631e+01 2.895e+01 1.114e+02, threshold=5.261e+01, percent-clipped=3.0 2024-08-18 22:11:46,654 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.76 vs. limit=22.5 2024-08-18 22:11:53,705 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 32 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 22:11:58,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4127280.0, ans=0.1 2024-08-18 22:12:02,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4127280.0, ans=0.0 2024-08-18 22:12:04,701 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 22:12:08,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.43 vs. limit=6.0 2024-08-18 22:12:23,292 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 22:12:25,743 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-18 22:12:28,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4127480.0, ans=0.05 2024-08-18 22:12:39,334 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 11650, loss[loss=0.1158, beats_loss=0.00948, ecapa_loss=0.0001225, whisper_loss=0.105, over 22878.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01041, ecapa_loss=0.0001437, whisper_loss=0.09146, over 3926602.10 frames. ], batch size: 88, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:12:43,594 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 14 from Vox, 47 fro AS 2024-08-18 22:12:52,823 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 22:13:12,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4127780.0, ans=0.125 2024-08-18 22:13:21,143 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-18 22:13:25,292 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 22:13:37,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4127980.0, ans=0.125 2024-08-18 22:13:41,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4127980.0, ans=0.2 2024-08-18 22:13:43,928 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 11700, loss[loss=0.1137, beats_loss=0.0111, ecapa_loss=0.0001213, whisper_loss=0.1014, over 16054.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01047, ecapa_loss=0.0001437, whisper_loss=0.09158, over 3969314.83 frames. 
], batch size: 62, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:13:48,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4128080.0, ans=0.0 2024-08-18 22:13:53,229 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-18 22:13:58,080 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.372e+01 2.671e+01 2.866e+01 2.586e+02, threshold=5.342e+01, percent-clipped=1.0 2024-08-18 22:13:58,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4128180.0, ans=0.025 2024-08-18 22:14:05,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4128180.0, ans=0.0 2024-08-18 22:14:11,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4128280.0, ans=0.2 2024-08-18 22:14:19,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4128280.0, ans=0.0 2024-08-18 22:14:20,317 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 32 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 22:14:28,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4128380.0, ans=0.125 2024-08-18 22:14:28,325 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.71 vs. 
limit=10.0 2024-08-18 22:14:32,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4128380.0, ans=0.125 2024-08-18 22:14:43,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4128480.0, ans=0.0 2024-08-18 22:14:47,900 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 11750, loss[loss=0.1048, beats_loss=0.01114, ecapa_loss=0.0001411, whisper_loss=0.0922, over 22854.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01063, ecapa_loss=0.0001433, whisper_loss=0.09045, over 3976078.74 frames. ], batch size: 94, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:14:53,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4128580.0, ans=0.1 2024-08-18 22:14:58,029 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-18 22:15:07,606 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-18 22:15:08,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4128680.0, ans=0.2 2024-08-18 22:15:10,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4128680.0, ans=0.0 2024-08-18 22:15:21,250 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.29 vs. limit=15.0 2024-08-18 22:15:33,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4128880.0, ans=0.1 2024-08-18 22:15:43,236 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
29 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-18 22:15:50,774 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 11800, loss[loss=0.1, beats_loss=0.01212, ecapa_loss=0.0001335, whisper_loss=0.08657, over 20216.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0106, ecapa_loss=0.0001433, whisper_loss=0.09076, over 3994952.86 frames. ], batch size: 82, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:15:52,448 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-18 22:15:57,563 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 22:16:00,055 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-18 22:16:05,133 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 22:16:06,191 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.286e+01 2.543e+01 2.807e+01 3.749e+01, threshold=5.086e+01, percent-clipped=0.0 2024-08-18 22:16:06,420 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 32 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-18 22:16:18,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4129280.0, ans=0.125 2024-08-18 22:16:25,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4129280.0, ans=0.04949747468305833 2024-08-18 22:16:31,172 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.20 vs. 
limit=22.5 2024-08-18 22:16:33,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4129380.0, ans=0.125 2024-08-18 22:16:38,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4129380.0, ans=0.125 2024-08-18 22:16:55,182 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 11850, loss[loss=0.08852, beats_loss=0.01261, ecapa_loss=0.0001463, whisper_loss=0.07445, over 20730.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01063, ecapa_loss=0.0001431, whisper_loss=0.09052, over 3950677.83 frames. ], batch size: 87, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:17:00,210 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 16 from LS+wenet, 30 from Vox, 44 fro AS 2024-08-18 22:17:08,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4129680.0, ans=0.2 2024-08-18 22:17:24,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4129780.0, ans=0.1 2024-08-18 22:17:35,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4129880.0, ans=0.1 2024-08-18 22:17:38,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4129880.0, ans=0.125 2024-08-18 22:17:41,055 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.16 vs. 
limit=10.0 2024-08-18 22:17:47,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4129980.0, ans=0.125 2024-08-18 22:17:47,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4129980.0, ans=0.125 2024-08-18 22:17:59,273 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 11900, loss[loss=0.108, beats_loss=0.01054, ecapa_loss=0.0001434, whisper_loss=0.096, over 21283.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01061, ecapa_loss=0.0001432, whisper_loss=0.09066, over 3954662.71 frames. ], batch size: 84, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:18:00,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4130080.0, ans=0.125 2024-08-18 22:18:05,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4130080.0, ans=0.125 2024-08-18 22:18:14,414 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.305e+01 2.478e+01 2.772e+01 3.965e+01, threshold=4.956e+01, percent-clipped=0.0 2024-08-18 22:18:32,825 INFO [train_multi_KD3.py:844] (0/4) A total of 97 cuts. 31 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-18 22:18:43,579 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.39 vs. limit=15.0 2024-08-18 22:18:54,099 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.06 vs. 
limit=15.0 2024-08-18 22:18:59,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4130480.0, ans=0.125 2024-08-18 22:19:02,585 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.02 vs. limit=15.0 2024-08-18 22:19:03,304 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 11950, loss[loss=0.08939, beats_loss=0.01022, ecapa_loss=0.0001661, whisper_loss=0.07752, over 18511.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01058, ecapa_loss=0.000142, whisper_loss=0.09124, over 3962962.63 frames. ], batch size: 75, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:19:10,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4130580.0, ans=0.0 2024-08-18 22:19:14,240 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-18 22:19:20,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4130680.0, ans=0.025 2024-08-18 22:19:29,203 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 22:19:39,404 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-18 22:19:43,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4130880.0, ans=0.035 2024-08-18 22:19:49,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.63 vs. limit=10.0 2024-08-18 22:19:55,532 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
23 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-18 22:20:06,934 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 12000, loss[loss=0.1141, beats_loss=0.01181, ecapa_loss=0.0001367, whisper_loss=0.1009, over 19866.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01053, ecapa_loss=0.0001422, whisper_loss=0.09104, over 3939580.04 frames. ], batch size: 78, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:20:06,935 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-18 22:20:47,546 INFO [train_multi_KD3.py:1149] (0/4) Epoch 28, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.0005126, whisper_loss=0.2484, over 922467.00 frames. 2024-08-18 22:21:06,712 INFO [train_multi_KD3.py:1149] (0/4) Epoch 28, validation on SV_voxceleb1: loss=0.004051, beats_loss=0, ecapa_loss=0.0004051, whisper_loss=0, over 939242.00 frames. 2024-08-18 22:22:55,501 INFO [train_multi_KD3.py:1149] (0/4) Epoch 28, validation on AT_audioset: loss=0.02313, beats_loss=0.02313, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 22:22:55,506 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-18 22:23:06,204 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.27 vs. limit=10.0 2024-08-18 22:23:11,593 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.291e+01 2.547e+01 2.883e+01 4.329e+01, threshold=5.094e+01, percent-clipped=0.0 2024-08-18 22:23:36,584 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
13 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 22:23:43,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4131380.0, ans=0.09899494936611666 2024-08-18 22:23:46,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4131480.0, ans=0.0 2024-08-18 22:23:51,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4131480.0, ans=0.1 2024-08-18 22:23:59,410 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 12050, loss[loss=0.1193, beats_loss=0.01085, ecapa_loss=0.0001285, whisper_loss=0.1072, over 20620.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01056, ecapa_loss=0.0001423, whisper_loss=0.09, over 3899885.07 frames. ], batch size: 77, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:24:10,078 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-18 22:24:35,239 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-18 22:24:38,505 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.81 vs. limit=15.0 2024-08-18 22:24:45,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4131880.0, ans=0.07 2024-08-18 22:24:48,010 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 15 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-18 22:25:03,145 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 12100, loss[loss=0.1067, beats_loss=0.009399, ecapa_loss=0.0001337, whisper_loss=0.09593, over 21834.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01059, ecapa_loss=0.0001421, whisper_loss=0.08946, over 3911565.01 frames. 
], batch size: 88, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:25:07,461 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.10 vs. limit=15.0 2024-08-18 22:25:17,901 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.763e+01 2.242e+01 2.477e+01 2.714e+01 3.521e+01, threshold=4.955e+01, percent-clipped=0.0 2024-08-18 22:25:19,210 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-18 22:25:22,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4132180.0, ans=0.0 2024-08-18 22:25:37,737 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.78 vs. limit=15.0 2024-08-18 22:25:48,306 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-18 22:25:54,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4132480.0, ans=0.125 2024-08-18 22:26:09,541 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 12150, loss[loss=0.1155, beats_loss=0.01147, ecapa_loss=0.0001358, whisper_loss=0.1027, over 21599.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.0001429, whisper_loss=0.0901, over 3895525.12 frames. ], batch size: 87, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:26:18,513 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 22:26:27,001 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
23 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-18 22:26:32,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4132680.0, ans=0.125 2024-08-18 22:26:55,484 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-18 22:26:55,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4132780.0, ans=0.2 2024-08-18 22:27:29,609 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.441e-01 2024-08-18 22:27:31,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4132980.0, ans=0.0 2024-08-18 22:27:40,520 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 12200, loss[loss=0.1164, beats_loss=0.008756, ecapa_loss=0.0001383, whisper_loss=0.1063, over 20685.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01046, ecapa_loss=0.0001433, whisper_loss=0.09053, over 3873671.05 frames. ], batch size: 82, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:27:41,938 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-18 22:27:59,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4133080.0, ans=0.1 2024-08-18 22:28:02,389 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.12 vs. limit=15.0 2024-08-18 22:28:07,544 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.413e+01 2.660e+01 2.941e+01 4.822e+01, threshold=5.320e+01, percent-clipped=0.0 2024-08-18 22:28:10,425 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
17 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-18 22:29:02,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4133380.0, ans=0.1 2024-08-18 22:29:08,457 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.86 vs. limit=10.0 2024-08-18 22:29:20,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=15.0 2024-08-18 22:29:28,663 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 12250, loss[loss=0.1079, beats_loss=0.01037, ecapa_loss=0.0001466, whisper_loss=0.09604, over 20782.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01041, ecapa_loss=0.0001429, whisper_loss=0.09095, over 3869254.65 frames. ], batch size: 83, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:29:51,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4133680.0, ans=0.2 2024-08-18 22:30:03,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4133680.0, ans=0.125 2024-08-18 22:30:11,707 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.13 vs. limit=12.0 2024-08-18 22:30:22,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4133780.0, ans=0.125 2024-08-18 22:30:41,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4133880.0, ans=0.2 2024-08-18 22:31:02,379 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 12300, loss[loss=0.1111, beats_loss=0.007619, ecapa_loss=0.0001806, whisper_loss=0.1016, over 18598.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01035, ecapa_loss=0.0001434, whisper_loss=0.09137, over 3897784.36 frames. ], batch size: 76, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:31:19,462 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.309e+01 2.558e+01 2.931e+01 4.229e+01, threshold=5.115e+01, percent-clipped=0.0 2024-08-18 22:31:31,336 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.00 vs. limit=22.5 2024-08-18 22:31:32,140 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 22:31:37,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4134280.0, ans=0.0 2024-08-18 22:31:49,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4134380.0, ans=0.125 2024-08-18 22:31:59,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4134380.0, ans=0.0 2024-08-18 22:32:08,643 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2024-08-18 22:32:16,006 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 12350, loss[loss=0.07027, beats_loss=0.01205, ecapa_loss=0.0001524, whisper_loss=0.05669, over 14227.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01028, ecapa_loss=0.0001449, whisper_loss=0.09169, over 3902857.00 frames. ], batch size: 57, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:32:16,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4134580.0, ans=0.0 2024-08-18 22:32:29,962 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
32 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 22:32:31,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4134680.0, ans=0.125 2024-08-18 22:32:32,798 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-18 22:32:37,716 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 26 from Vox, 22 fro AS 2024-08-18 22:32:39,029 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-18 22:32:41,808 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 22:32:46,282 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-18 22:33:10,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4134880.0, ans=0.125 2024-08-18 22:33:14,238 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 26 from Vox, 20 fro AS 2024-08-18 22:33:17,730 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-18 22:33:27,344 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 14 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-18 22:33:33,577 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 12400, loss[loss=0.08635, beats_loss=0.01237, ecapa_loss=0.0001256, whisper_loss=0.07272, over 22774.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01027, ecapa_loss=0.0001449, whisper_loss=0.09173, over 3920697.56 frames. ], batch size: 93, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:33:49,888 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.20 vs. 
limit=15.0 2024-08-18 22:33:52,686 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.330e+01 2.647e+01 2.859e+01 4.171e+01, threshold=5.294e+01, percent-clipped=0.0 2024-08-18 22:34:07,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4135280.0, ans=0.125 2024-08-18 22:34:10,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4135280.0, ans=0.0 2024-08-18 22:34:13,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4135280.0, ans=0.0 2024-08-18 22:34:50,929 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 12450, loss[loss=0.1053, beats_loss=0.008595, ecapa_loss=0.0001742, whisper_loss=0.09495, over 17462.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01024, ecapa_loss=0.000145, whisper_loss=0.09177, over 3934802.41 frames. ], batch size: 71, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:34:59,955 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-18 22:35:00,526 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.67 vs. limit=15.0 2024-08-18 22:35:10,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4135680.0, ans=0.1 2024-08-18 22:35:13,643 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 35 from Vox, 30 fro AS 2024-08-18 22:35:13,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4135680.0, ans=0.125 2024-08-18 22:35:20,032 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.11 vs. 
limit=15.0 2024-08-18 22:35:26,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4135780.0, ans=0.2 2024-08-18 22:35:43,802 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 22:36:07,680 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 12500, loss[loss=0.07424, beats_loss=0.01003, ecapa_loss=0.0001814, whisper_loss=0.0624, over 16943.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0103, ecapa_loss=0.0001439, whisper_loss=0.09086, over 3902279.09 frames. ], batch size: 72, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:36:08,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4136080.0, ans=0.2 2024-08-18 22:36:14,338 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-18 22:36:25,088 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.340e+01 2.580e+01 2.958e+01 4.069e+01, threshold=5.161e+01, percent-clipped=0.0 2024-08-18 22:36:33,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4136180.0, ans=0.125 2024-08-18 22:36:39,392 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 22:36:47,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4136280.0, ans=0.0 2024-08-18 22:36:51,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4136380.0, ans=0.0 2024-08-18 22:36:53,105 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.49 vs. 
limit=22.5 2024-08-18 22:36:54,114 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 22:36:59,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4136380.0, ans=0.125 2024-08-18 22:37:20,194 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 13 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-18 22:37:21,024 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.24 vs. limit=15.0 2024-08-18 22:37:21,433 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 12550, loss[loss=0.08033, beats_loss=0.01096, ecapa_loss=0.000108, whisper_loss=0.06829, over 15569.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01033, ecapa_loss=0.0001444, whisper_loss=0.09061, over 3917648.56 frames. ], batch size: 59, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:37:23,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4136580.0, ans=0.0 2024-08-18 22:37:27,266 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 22:37:49,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4136780.0, ans=0.125 2024-08-18 22:37:56,234 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 22:38:05,758 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 28 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-18 22:38:17,885 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
24 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 22:38:20,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4136980.0, ans=0.125 2024-08-18 22:38:30,552 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 12600, loss[loss=0.1129, beats_loss=0.008155, ecapa_loss=0.000144, whisper_loss=0.1033, over 21777.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01037, ecapa_loss=0.0001453, whisper_loss=0.09089, over 3917603.27 frames. ], batch size: 85, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:38:46,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4137180.0, ans=0.125 2024-08-18 22:38:47,430 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.167e+01 2.472e+01 2.650e+01 1.085e+02, threshold=4.945e+01, percent-clipped=2.0 2024-08-18 22:38:47,941 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.331e+05 2024-08-18 22:38:51,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4137180.0, ans=0.1 2024-08-18 22:39:11,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4137380.0, ans=0.0 2024-08-18 22:39:16,733 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.64 vs. limit=15.0 2024-08-18 22:39:20,548 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-18 22:39:23,800 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.43 vs. limit=22.5 2024-08-18 22:39:25,468 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
28 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-18 22:39:37,669 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 12650, loss[loss=0.1065, beats_loss=0.01123, ecapa_loss=0.0001345, whisper_loss=0.09396, over 22944.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001443, whisper_loss=0.09016, over 3939437.57 frames. ], batch size: 92, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:39:44,663 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-18 22:39:57,528 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 37 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-18 22:40:19,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4137880.0, ans=0.125 2024-08-18 22:40:25,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4137880.0, ans=0.95 2024-08-18 22:40:31,510 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.14 vs. limit=15.0 2024-08-18 22:40:44,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4138080.0, ans=0.2 2024-08-18 22:40:45,687 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 12700, loss[loss=0.0979, beats_loss=0.01314, ecapa_loss=0.0001303, whisper_loss=0.08346, over 22151.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01058, ecapa_loss=0.0001436, whisper_loss=0.09028, over 3934074.75 frames. 
], batch size: 90, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:40:54,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4138080.0, ans=0.125 2024-08-18 22:41:03,219 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.642e+01 2.392e+01 2.657e+01 3.204e+01 3.516e+02, threshold=5.313e+01, percent-clipped=1.0 2024-08-18 22:41:16,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4138280.0, ans=0.0 2024-08-18 22:41:31,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4138380.0, ans=0.125 2024-08-18 22:41:35,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=4138380.0, ans=0.05 2024-08-18 22:41:45,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4138480.0, ans=0.125 2024-08-18 22:41:54,374 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 29 from LS+wenet, 10 from Vox, 21 fro AS 2024-08-18 22:41:54,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4138580.0, ans=0.2 2024-08-18 22:41:55,492 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 12750, loss[loss=0.1361, beats_loss=0.008055, ecapa_loss=0.0001376, whisper_loss=0.1267, over 16210.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01058, ecapa_loss=0.0001431, whisper_loss=0.09089, over 3944873.94 frames. ], batch size: 60, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:42:00,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.59 vs. limit=15.0 2024-08-18 22:42:38,854 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
31 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-18 22:42:48,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4138980.0, ans=0.125 2024-08-18 22:43:04,012 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 12800, loss[loss=0.08409, beats_loss=0.01065, ecapa_loss=0.0001582, whisper_loss=0.07186, over 20834.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01059, ecapa_loss=0.000143, whisper_loss=0.09101, over 3934114.94 frames. ], batch size: 90, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:43:04,129 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-18 22:43:19,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4139180.0, ans=0.125 2024-08-18 22:43:20,484 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.353e+01 2.654e+01 3.011e+01 1.383e+02, threshold=5.309e+01, percent-clipped=3.0 2024-08-18 22:43:22,094 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.302e+00 2024-08-18 22:43:50,789 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-18 22:43:50,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4139380.0, ans=0.07 2024-08-18 22:44:10,623 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 12850, loss[loss=0.09663, beats_loss=0.01054, ecapa_loss=0.0001264, whisper_loss=0.08483, over 21859.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01062, ecapa_loss=0.0001436, whisper_loss=0.08965, over 3887452.73 frames. 
], batch size: 85, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:44:13,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4139580.0, ans=0.125 2024-08-18 22:44:22,454 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-18 22:44:27,696 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-18 22:44:29,578 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.81 vs. limit=12.0 2024-08-18 22:44:42,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.15 vs. limit=15.0 2024-08-18 22:44:47,212 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-18 22:44:53,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4139880.0, ans=0.5 2024-08-18 22:44:57,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4139880.0, ans=0.05 2024-08-18 22:44:59,756 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-18 22:45:18,909 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 12900, loss[loss=0.1018, beats_loss=0.01034, ecapa_loss=0.0001555, whisper_loss=0.08988, over 21993.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01065, ecapa_loss=0.0001436, whisper_loss=0.0894, over 3867471.17 frames. ], batch size: 92, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:45:33,201 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
25 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 22:45:35,658 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.228e+01 2.406e+01 2.715e+01 2.764e+02, threshold=4.812e+01, percent-clipped=1.0 2024-08-18 22:45:36,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4140180.0, ans=0.0 2024-08-18 22:45:38,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4140180.0, ans=0.125 2024-08-18 22:45:39,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4140180.0, ans=0.125 2024-08-18 22:45:48,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4140280.0, ans=0.125 2024-08-18 22:45:50,927 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 34 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-18 22:45:51,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4140280.0, ans=0.125 2024-08-18 22:45:52,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4140280.0, ans=0.125 2024-08-18 22:46:27,726 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 12950, loss[loss=0.1048, beats_loss=0.009115, ecapa_loss=0.0001636, whisper_loss=0.09406, over 23833.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01056, ecapa_loss=0.0001445, whisper_loss=0.08922, over 3846040.07 frames. ], batch size: 94, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:46:58,071 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 22:47:15,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4140880.0, ans=0.125 2024-08-18 22:47:33,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4140980.0, ans=0.0 2024-08-18 22:47:35,811 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 13000, loss[loss=0.1056, beats_loss=0.01078, ecapa_loss=0.000134, whisper_loss=0.09349, over 20673.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01062, ecapa_loss=0.0001437, whisper_loss=0.08951, over 3896630.83 frames. ], batch size: 82, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:47:40,120 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-18 22:47:44,142 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-18 22:47:44,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4141080.0, ans=0.5 2024-08-18 22:47:51,970 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.390e+01 2.751e+01 3.227e+01 4.665e+01, threshold=5.502e+01, percent-clipped=0.0 2024-08-18 22:47:53,686 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 22:48:08,562 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 22 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-18 22:48:32,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4141480.0, ans=0.0 2024-08-18 22:48:36,252 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
24 from LS+wenet, 19 from Vox, 15 fro AS 2024-08-18 22:48:43,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4141580.0, ans=0.125 2024-08-18 22:48:44,336 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 13050, loss[loss=0.1033, beats_loss=0.01215, ecapa_loss=0.0001339, whisper_loss=0.08977, over 22924.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01061, ecapa_loss=0.0001435, whisper_loss=0.08922, over 3865654.26 frames. ], batch size: 92, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:48:48,875 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 14 from Vox, 48 fro AS 2024-08-18 22:49:21,797 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 28 from Vox, 25 fro AS 2024-08-18 22:49:22,982 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 22:49:41,472 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 22:49:47,254 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.009e-01 2024-08-18 22:49:49,612 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-18 22:49:51,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4141980.0, ans=10.0 2024-08-18 22:49:52,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4141980.0, ans=0.0 2024-08-18 22:49:54,074 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.88 vs. 
limit=22.5 2024-08-18 22:49:55,848 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 13100, loss[loss=0.115, beats_loss=0.01121, ecapa_loss=0.0001285, whisper_loss=0.1025, over 18852.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01054, ecapa_loss=0.0001426, whisper_loss=0.08971, over 3866492.05 frames. ], batch size: 72, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:50:03,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4142080.0, ans=0.0 2024-08-18 22:50:14,316 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.210e+01 2.445e+01 2.730e+01 3.721e+01, threshold=4.891e+01, percent-clipped=0.0 2024-08-18 22:50:23,076 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-18 22:50:35,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4142280.0, ans=0.125 2024-08-18 22:50:43,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4142380.0, ans=0.125 2024-08-18 22:50:45,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4142380.0, ans=0.07 2024-08-18 22:50:47,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4142380.0, ans=0.125 2024-08-18 22:51:10,357 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 13150, loss[loss=0.1099, beats_loss=0.00768, ecapa_loss=0.0001776, whisper_loss=0.1005, over 21728.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001423, whisper_loss=0.08993, over 3872167.75 frames. 
], batch size: 87, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:51:30,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4142680.0, ans=0.125 2024-08-18 22:51:32,415 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2024-08-18 22:51:41,382 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 22:51:50,435 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-18 22:51:52,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4142880.0, ans=0.125 2024-08-18 22:52:11,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4142980.0, ans=0.5 2024-08-18 22:52:23,477 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 13200, loss[loss=0.1096, beats_loss=0.009166, ecapa_loss=0.0001619, whisper_loss=0.09881, over 21854.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.0001422, whisper_loss=0.09009, over 3882920.49 frames. ], batch size: 90, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:52:28,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4143080.0, ans=0.1 2024-08-18 22:52:30,649 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-18 22:52:33,589 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
16 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-18 22:52:39,720 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.334e+01 2.573e+01 2.889e+01 3.948e+01, threshold=5.147e+01, percent-clipped=0.0 2024-08-18 22:52:54,053 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.42 vs. limit=22.5 2024-08-18 22:53:08,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4143380.0, ans=0.125 2024-08-18 22:53:32,943 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 13250, loss[loss=0.1043, beats_loss=0.01408, ecapa_loss=8.325e-05, whisper_loss=0.08943, over 16284.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.0001421, whisper_loss=0.09006, over 3891682.31 frames. ], batch size: 62, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:53:33,731 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0 2024-08-18 22:53:41,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4143580.0, ans=0.125 2024-08-18 22:53:54,661 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.96 vs. limit=15.0 2024-08-18 22:53:55,413 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-18 22:54:04,452 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
19 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-18 22:54:07,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4143780.0, ans=0.125 2024-08-18 22:54:21,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4143880.0, ans=0.125 2024-08-18 22:54:29,993 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.17 vs. limit=22.5 2024-08-18 22:54:38,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4143980.0, ans=0.125 2024-08-18 22:54:41,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4143980.0, ans=0.1 2024-08-18 22:54:46,129 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 13300, loss[loss=0.09494, beats_loss=0.01117, ecapa_loss=0.0001353, whisper_loss=0.08242, over 15474.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01039, ecapa_loss=0.0001421, whisper_loss=0.09105, over 3892384.40 frames. ], batch size: 63, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:54:51,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4144080.0, ans=0.125 2024-08-18 22:55:02,712 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.713e+01 2.463e+01 2.740e+01 3.023e+01 1.147e+02, threshold=5.480e+01, percent-clipped=2.0 2024-08-18 22:55:05,329 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.85 vs. limit=15.0 2024-08-18 22:55:06,928 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-18 22:55:17,495 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
36 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 22:55:27,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4144380.0, ans=0.0 2024-08-18 22:55:51,217 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 22:55:51,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4144480.0, ans=0.1 2024-08-18 22:55:59,127 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 13350, loss[loss=0.09608, beats_loss=0.01074, ecapa_loss=0.0001622, whisper_loss=0.08373, over 16629.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.0001415, whisper_loss=0.09012, over 3926809.90 frames. ], batch size: 69, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:56:04,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4144580.0, ans=0.0 2024-08-18 22:56:12,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4144680.0, ans=0.0 2024-08-18 22:56:23,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4144680.0, ans=0.0 2024-08-18 22:56:31,288 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-18 22:56:42,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4144880.0, ans=0.09899494936611666 2024-08-18 22:56:55,025 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 22:57:00,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4144980.0, ans=0.0 2024-08-18 22:57:04,809 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
23 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-18 22:57:04,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4144980.0, ans=0.0 2024-08-18 22:57:09,993 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 13400, loss[loss=0.09526, beats_loss=0.01309, ecapa_loss=0.0001055, whisper_loss=0.08112, over 24195.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01052, ecapa_loss=0.0001413, whisper_loss=0.09027, over 3929780.95 frames. ], batch size: 95, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:57:12,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4145080.0, ans=0.125 2024-08-18 22:57:14,309 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.35 vs. limit=15.0 2024-08-18 22:57:20,915 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 22:57:22,348 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 22:57:25,918 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.309e+01 2.613e+01 2.893e+01 4.161e+01, threshold=5.227e+01, percent-clipped=0.0 2024-08-18 22:57:54,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4145380.0, ans=0.125 2024-08-18 22:57:55,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4145380.0, ans=0.1 2024-08-18 22:58:09,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4145480.0, ans=0.125 2024-08-18 22:58:17,575 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 13450, loss[loss=0.1085, beats_loss=0.008905, ecapa_loss=0.0001624, whisper_loss=0.09799, over 16833.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01048, ecapa_loss=0.0001414, whisper_loss=0.09127, over 3949055.90 frames. ], batch size: 66, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:58:18,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4145580.0, ans=0.0 2024-08-18 22:58:26,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4145580.0, ans=0.125 2024-08-18 22:58:40,895 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0 2024-08-18 22:58:44,081 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
31 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 22:58:45,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=4145780.0, ans=0.05 2024-08-18 22:58:54,211 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.21 vs. limit=22.5 2024-08-18 22:59:05,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4145880.0, ans=0.125 2024-08-18 22:59:06,478 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-18 22:59:18,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4145980.0, ans=0.125 2024-08-18 22:59:22,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4145980.0, ans=0.0 2024-08-18 22:59:25,122 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 13500, loss[loss=0.1153, beats_loss=0.01152, ecapa_loss=0.0001116, whisper_loss=0.1026, over 24055.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01052, ecapa_loss=0.0001424, whisper_loss=0.09082, over 3948165.43 frames. ], batch size: 93, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:59:26,477 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-18 22:59:27,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4146080.0, ans=0.125 2024-08-18 22:59:33,483 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.67 vs. limit=22.5 2024-08-18 22:59:38,124 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
21 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-18 22:59:42,038 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.292e+01 2.450e+01 2.682e+01 3.330e+01, threshold=4.901e+01, percent-clipped=0.0 2024-08-18 22:59:44,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4146180.0, ans=0.125 2024-08-18 22:59:50,646 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-18 23:00:08,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=4146380.0, ans=0.1 2024-08-18 23:00:15,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4146380.0, ans=0.125 2024-08-18 23:00:17,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4146380.0, ans=0.5 2024-08-18 23:00:18,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4146480.0, ans=0.125 2024-08-18 23:00:23,661 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 23:00:25,326 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.55 vs. limit=12.0 2024-08-18 23:00:32,318 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 13550, loss[loss=0.1005, beats_loss=0.01057, ecapa_loss=0.0001405, whisper_loss=0.08853, over 19642.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01056, ecapa_loss=0.0001425, whisper_loss=0.09013, over 3922357.21 frames. ], batch size: 75, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 23:00:32,490 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
22 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 23:00:34,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4146580.0, ans=0.125 2024-08-18 23:00:47,930 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 31 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-18 23:00:48,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4146680.0, ans=0.2 2024-08-18 23:00:52,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4146680.0, ans=0.2 2024-08-18 23:00:55,328 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-18 23:01:02,223 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-18 23:01:23,807 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.05 vs. limit=12.0 2024-08-18 23:01:30,932 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 23:01:33,877 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 23:01:35,833 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 23:01:35,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4146980.0, ans=0.125 2024-08-18 23:01:41,364 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 13600, loss[loss=0.1231, beats_loss=0.01018, ecapa_loss=0.0001771, whisper_loss=0.1111, over 22647.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01054, ecapa_loss=0.0001423, whisper_loss=0.09068, over 3935143.09 frames. 
], batch size: 90, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 23:01:57,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4147180.0, ans=0.0 2024-08-18 23:01:57,879 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.313e+01 2.521e+01 2.832e+01 3.898e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-18 23:02:06,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4147180.0, ans=0.04949747468305833 2024-08-18 23:02:15,025 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-18 23:02:24,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4147380.0, ans=0.0 2024-08-18 23:02:35,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4147480.0, ans=0.0 2024-08-18 23:02:39,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4147480.0, ans=0.1 2024-08-18 23:02:39,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4147480.0, ans=0.125 2024-08-18 23:02:49,601 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 13650, loss[loss=0.1001, beats_loss=0.01085, ecapa_loss=0.0001433, whisper_loss=0.08777, over 22987.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01067, ecapa_loss=0.0001428, whisper_loss=0.09008, over 3945591.81 frames. ], batch size: 92, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 23:02:50,593 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.87 vs. 
limit=15.0 2024-08-18 23:02:52,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4147580.0, ans=0.95 2024-08-18 23:03:08,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=12.0 2024-08-18 23:03:15,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.03 vs. limit=15.0 2024-08-18 23:03:17,306 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-18 23:03:20,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4147780.0, ans=0.125 2024-08-18 23:03:21,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4147780.0, ans=0.125 2024-08-18 23:03:22,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4147780.0, ans=0.2 2024-08-18 23:03:46,002 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-18 23:03:56,210 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 13700, loss[loss=0.1105, beats_loss=0.01131, ecapa_loss=0.0001005, whisper_loss=0.09815, over 21860.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01067, ecapa_loss=0.0001417, whisper_loss=0.09046, over 3953916.70 frames. 
], batch size: 80, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 23:04:11,952 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.317e+01 2.575e+01 2.845e+01 4.715e+01, threshold=5.149e+01, percent-clipped=0.0 2024-08-18 23:04:15,121 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.59 vs. limit=22.5 2024-08-18 23:04:15,249 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.35 vs. limit=12.0 2024-08-18 23:04:15,987 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 21 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-18 23:04:22,992 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-18 23:04:23,547 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.13 vs. limit=22.5 2024-08-18 23:04:24,269 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 24 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-18 23:04:24,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4148280.0, ans=0.0 2024-08-18 23:04:27,006 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-18 23:04:43,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4148380.0, ans=0.1 2024-08-18 23:04:47,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4148480.0, ans=0.125 2024-08-18 23:05:01,942 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 13750, loss[loss=0.08147, beats_loss=0.01091, ecapa_loss=0.0001399, whisper_loss=0.06916, over 20625.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01057, ecapa_loss=0.0001429, whisper_loss=0.09135, over 3926222.18 frames. ], batch size: 84, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:05:02,113 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 12 from Vox, 45 fro AS 2024-08-18 23:05:03,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4148580.0, ans=0.09899494936611666 2024-08-18 23:05:11,671 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-18 23:05:13,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4148580.0, ans=0.0 2024-08-18 23:05:33,802 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2024-08-18 23:05:51,418 WARNING [optim.py:496] (0/4) Scaling gradients by 0.06066558510065079, model_norm_threshold=51.49205017089844 2024-08-18 23:05:51,586 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.619e+05, grad_sumsq=1.558e+07, orig_rms_sq=1.039e-02 2024-08-18 23:06:00,947 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 23:06:03,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4148980.0, ans=0.0 2024-08-18 23:06:05,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4148980.0, ans=0.125 2024-08-18 23:06:07,877 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.13 vs. 
limit=22.5 2024-08-18 23:06:08,290 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 13800, loss[loss=0.1021, beats_loss=0.01094, ecapa_loss=0.0001362, whisper_loss=0.08975, over 21573.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01049, ecapa_loss=0.0001423, whisper_loss=0.09149, over 3928938.75 frames. ], batch size: 87, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:06:22,791 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 23:06:25,233 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.320e+01 2.516e+01 2.780e+01 8.488e+02, threshold=5.033e+01, percent-clipped=1.0 2024-08-18 23:06:44,755 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 30 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-18 23:06:54,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4149380.0, ans=0.125 2024-08-18 23:06:56,608 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-18 23:07:00,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4149480.0, ans=0.0 2024-08-18 23:07:06,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4149480.0, ans=0.09899494936611666 2024-08-18 23:07:14,674 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 13850, loss[loss=0.0887, beats_loss=0.01175, ecapa_loss=0.0001425, whisper_loss=0.07552, over 16224.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01048, ecapa_loss=0.0001427, whisper_loss=0.09202, over 3921092.54 frames. ], batch size: 65, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:07:17,581 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
14 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-18 23:07:41,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4149780.0, ans=0.0 2024-08-18 23:07:45,828 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-18 23:07:47,175 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 12 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-18 23:07:54,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4149880.0, ans=0.125 2024-08-18 23:07:56,499 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.99 vs. limit=10.0 2024-08-18 23:07:57,672 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.93 vs. limit=15.0 2024-08-18 23:07:58,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4149880.0, ans=0.125 2024-08-18 23:08:12,919 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 23:08:23,820 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 13900, loss[loss=0.07692, beats_loss=0.01238, ecapa_loss=0.0001191, whisper_loss=0.06334, over 16475.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01045, ecapa_loss=0.0001434, whisper_loss=0.09175, over 3907124.38 frames. ], batch size: 66, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:08:24,010 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 32 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-18 23:08:25,220 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-18 23:08:37,718 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
17 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-18 23:08:40,370 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.243e+01 2.562e+01 2.808e+01 4.472e+01, threshold=5.123e+01, percent-clipped=0.0 2024-08-18 23:08:41,835 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 19 from LS+wenet, 21 from Vox, 49 fro AS 2024-08-18 23:08:42,114 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.733e+01 2024-08-18 23:08:48,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4150180.0, ans=0.125 2024-08-18 23:08:56,162 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.89 vs. limit=5.0 2024-08-18 23:09:02,298 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-18 23:09:03,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4150380.0, ans=0.0 2024-08-18 23:09:05,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4150380.0, ans=0.1 2024-08-18 23:09:26,926 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 23:09:31,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4150580.0, ans=0.125 2024-08-18 23:09:31,914 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 13950, loss[loss=0.09583, beats_loss=0.01037, ecapa_loss=0.0001894, whisper_loss=0.08356, over 21234.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0105, ecapa_loss=0.0001434, whisper_loss=0.091, over 3920898.44 frames. 
], batch size: 92, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:09:38,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4150580.0, ans=0.0 2024-08-18 23:09:56,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4150680.0, ans=0.125 2024-08-18 23:10:05,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4150780.0, ans=0.1 2024-08-18 23:10:21,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4150880.0, ans=0.05 2024-08-18 23:10:24,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4150980.0, ans=0.125 2024-08-18 23:10:38,879 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 14000, loss[loss=0.1243, beats_loss=0.009294, ecapa_loss=0.0001234, whisper_loss=0.1138, over 19058.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01049, ecapa_loss=0.0001428, whisper_loss=0.09109, over 3913920.06 frames. 
], batch size: 71, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:10:49,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4151080.0, ans=0.125 2024-08-18 23:10:51,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4151080.0, ans=10.0 2024-08-18 23:10:51,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4151080.0, ans=0.125 2024-08-18 23:10:56,368 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.301e+01 2.559e+01 2.803e+01 3.762e+01, threshold=5.117e+01, percent-clipped=0.0 2024-08-18 23:10:59,643 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.11 vs. limit=15.0 2024-08-18 23:11:08,413 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-18 23:11:21,418 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=12.0 2024-08-18 23:11:32,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4151380.0, ans=0.0 2024-08-18 23:11:38,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4151480.0, ans=0.125 2024-08-18 23:11:43,513 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
28 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-18 23:11:46,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4151480.0, ans=0.0 2024-08-18 23:11:48,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4151580.0, ans=10.0 2024-08-18 23:11:49,162 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 14050, loss[loss=0.1086, beats_loss=0.01059, ecapa_loss=0.0001386, whisper_loss=0.09659, over 20025.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01055, ecapa_loss=0.0001425, whisper_loss=0.09047, over 3895002.01 frames. ], batch size: 82, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:11:53,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4151580.0, ans=0.125 2024-08-18 23:12:00,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4151580.0, ans=0.1 2024-08-18 23:12:11,564 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.87 vs. limit=10.0 2024-08-18 23:12:21,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4151780.0, ans=0.1 2024-08-18 23:12:22,728 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 23:12:27,107 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 30 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 23:12:28,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4151780.0, ans=0.125 2024-08-18 23:12:33,675 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
29 from LS+wenet, 27 from Vox, 34 from AS 2024-08-18 23:12:39,352 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 from AS 2024-08-18 23:12:45,602 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.25 vs. limit=22.5 2024-08-18 23:12:48,548 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.71 vs. limit=15.0 2024-08-18 23:12:59,929 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 14100, loss[loss=0.1213, beats_loss=0.01054, ecapa_loss=0.0001299, whisper_loss=0.1095, over 22796.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01064, ecapa_loss=0.0001415, whisper_loss=0.09042, over 3931379.68 frames. ], batch size: 90, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:13:08,019 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 from AS 2024-08-18 23:13:16,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=4152180.0, ans=0.2 2024-08-18 23:13:17,467 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.301e+01 2.548e+01 2.823e+01 5.332e+01, threshold=5.096e+01, percent-clipped=1.0 2024-08-18 23:13:45,609 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.68 vs.
limit=6.0 2024-08-18 23:13:49,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4152380.0, ans=0.0 2024-08-18 23:13:51,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=4152380.0, ans=6.0 2024-08-18 23:13:52,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4152380.0, ans=0.125 2024-08-18 23:14:09,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4152480.0, ans=0.0 2024-08-18 23:14:12,142 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 14150, loss[loss=0.09816, beats_loss=0.01063, ecapa_loss=0.0001373, whisper_loss=0.08616, over 20281.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01066, ecapa_loss=0.000142, whisper_loss=0.08993, over 3901693.71 frames. ], batch size: 81, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:14:18,572 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0 2024-08-18 23:14:22,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4152580.0, ans=0.1 2024-08-18 23:14:24,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4152580.0, ans=0.125 2024-08-18 23:14:39,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4152680.0, ans=0.1 2024-08-18 23:14:39,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4152680.0, ans=0.2 2024-08-18 23:14:50,379 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
22 from LS+wenet, 13 from Vox, 29 from AS 2024-08-18 23:15:20,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4152980.0, ans=0.0 2024-08-18 23:15:23,419 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 17 from Vox, 26 from AS 2024-08-18 23:15:26,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4153080.0, ans=0.1 2024-08-18 23:15:27,358 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 14200, loss[loss=0.1367, beats_loss=0.007677, ecapa_loss=0.0001362, whisper_loss=0.1277, over 22397.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01059, ecapa_loss=0.0001426, whisper_loss=0.08981, over 3891369.56 frames. ], batch size: 84, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:15:30,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4153080.0, ans=0.125 2024-08-18 23:15:35,855 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 30 from Vox, 32 from AS 2024-08-18 23:15:45,263 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.295e+01 2.597e+01 2.935e+01 2.411e+02, threshold=5.195e+01, percent-clipped=3.0 2024-08-18 23:15:55,083 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 29 from LS+wenet, 21 from Vox, 23 from AS 2024-08-18 23:15:56,395 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 23 from Vox, 29 from AS 2024-08-18 23:16:10,303 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 14 from LS+wenet, 30 from Vox, 28 from AS 2024-08-18 23:16:20,654 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts.
19 from LS+wenet, 24 from Vox, 30 from AS 2024-08-18 23:16:38,100 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.08 vs. limit=15.0 2024-08-18 23:16:40,952 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 14250, loss[loss=0.1149, beats_loss=0.01002, ecapa_loss=0.000156, whisper_loss=0.1034, over 21813.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01058, ecapa_loss=0.0001427, whisper_loss=0.08935, over 3874491.16 frames. ], batch size: 89, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:16:44,397 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 19 from LS+wenet, 21 from Vox, 35 from AS 2024-08-18 23:17:03,930 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 27 from LS+wenet, 13 from Vox, 31 from AS 2024-08-18 23:17:04,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=4153680.0, ans=15.0 2024-08-18 23:17:07,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4153680.0, ans=0.1 2024-08-18 23:17:12,751 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 24 from LS+wenet, 14 from Vox, 24 from AS 2024-08-18 23:17:39,397 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 23:17:40,895 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.24 vs. limit=22.5 2024-08-18 23:17:43,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4153980.0, ans=0.125 2024-08-18 23:17:52,489 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.36 vs.
limit=22.5 2024-08-18 23:17:54,773 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 14300, loss[loss=0.08568, beats_loss=0.01078, ecapa_loss=0.000163, whisper_loss=0.07327, over 19015.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01064, ecapa_loss=0.0001421, whisper_loss=0.08952, over 3889598.10 frames. ], batch size: 81, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:18:01,807 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 15 from LS+wenet, 18 from Vox, 35 from AS 2024-08-18 23:18:11,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4154180.0, ans=0.1 2024-08-18 23:18:12,373 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.299e+01 2.504e+01 2.790e+01 4.018e+02, threshold=5.008e+01, percent-clipped=2.0 2024-08-18 23:18:20,817 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 26 from Vox, 32 from AS 2024-08-18 23:18:31,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4154280.0, ans=0.1 2024-08-18 23:18:34,965 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 20 from LS+wenet, 28 from Vox, 44 from AS 2024-08-18 23:18:43,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.47 vs. limit=15.0 2024-08-18 23:18:50,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4154480.0, ans=0.125 2024-08-18 23:19:00,970 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts.
29 from LS+wenet, 20 from Vox, 34 from AS 2024-08-18 23:19:03,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4154580.0, ans=0.125 2024-08-18 23:19:04,721 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 14350, loss[loss=0.1007, beats_loss=0.008918, ecapa_loss=0.0001331, whisper_loss=0.0905, over 13960.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01055, ecapa_loss=0.0001418, whisper_loss=0.08972, over 3877415.44 frames. ], batch size: 53, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:19:11,741 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 23:19:14,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4154580.0, ans=0.125 2024-08-18 23:19:30,026 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.10 vs. limit=15.0 2024-08-18 23:19:41,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4154780.0, ans=0.125 2024-08-18 23:19:44,500 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 26 from LS+wenet, 13 from Vox, 20 from AS 2024-08-18 23:19:49,491 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=12.0 2024-08-18 23:20:01,449 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.94 vs.
limit=12.0 2024-08-18 23:20:06,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4154980.0, ans=0.125 2024-08-18 23:20:17,347 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 14400, loss[loss=0.09224, beats_loss=0.01156, ecapa_loss=0.000121, whisper_loss=0.07947, over 19587.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01059, ecapa_loss=0.0001413, whisper_loss=0.08952, over 3885599.86 frames. ], batch size: 76, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:20:34,733 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.269e+01 2.603e+01 3.055e+01 4.750e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-18 23:20:52,071 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.50 vs. limit=22.5 2024-08-18 23:21:22,697 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 14 from LS+wenet, 18 from Vox, 25 from AS 2024-08-18 23:21:32,436 INFO [train_multi_KD3.py:1116] (0/4) Epoch 28, batch 14450, loss[loss=0.1055, beats_loss=0.01154, ecapa_loss=0.0001115, whisper_loss=0.0928, over 22854.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01056, ecapa_loss=0.0001417, whisper_loss=0.08976, over 3862239.10 frames. ], batch size: 88, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:21:34,308 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 25 from LS+wenet, 17 from Vox, 22 from AS 2024-08-18 23:21:40,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4155580.0, ans=0.2 2024-08-18 23:21:44,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4155580.0, ans=0.125 2024-08-18 23:22:28,939 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts.
24 from LS+wenet, 18 from Vox, 31 from AS 2024-08-18 23:22:34,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4155980.0, ans=0.2 2024-08-18 23:22:44,132 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 27 from LS+wenet, 16 from Vox, 29 from AS 2024-08-18 23:22:48,964 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-28.pt 2024-08-18 23:23:31,033 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 0, loss[loss=0.09442, beats_loss=0.01029, ecapa_loss=0.0001438, whisper_loss=0.08269, over 19772.00 frames. ], tot_loss[loss=0.09442, beats_loss=0.01029, ecapa_loss=0.0001438, whisper_loss=0.08269, over 19772.00 frames. ], batch size: 81, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:23:31,034 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-18 23:23:49,701 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.5915, 2.6217, 3.0098, 3.2191], device='cuda:0') 2024-08-18 23:24:08,478 INFO [train_multi_KD3.py:1149] (0/4) Epoch 29, validation on ASR_libri: loss=0.2527, beats_loss=0, ecapa_loss=0.0005265, whisper_loss=0.2475, over 922467.00 frames. 2024-08-18 23:24:23,839 INFO [train_multi_KD3.py:1149] (0/4) Epoch 29, validation on SV_voxceleb1: loss=0.004049, beats_loss=0, ecapa_loss=0.0004049, whisper_loss=0, over 939242.00 frames.
2024-08-18 23:24:32,213 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.8930, 2.1458, 2.0216, 1.9679, 2.4574, 1.9756, 2.0698, 1.9300], device='cuda:0') 2024-08-18 23:25:27,557 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.8757, 2.2483, 2.0170, 1.4527, 1.8196, 1.7044, 2.0574, 1.9738], device='cuda:0') 2024-08-18 23:26:08,876 INFO [train_multi_KD3.py:1149] (0/4) Epoch 29, validation on AT_audioset: loss=0.02325, beats_loss=0.02325, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 23:26:08,880 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-18 23:26:29,188 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 23:26:37,362 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.09 vs. limit=6.0 2024-08-18 23:26:41,485 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.401e+01 2.612e+01 3.065e+01 1.665e+02, threshold=5.224e+01, percent-clipped=1.0 2024-08-18 23:26:45,887 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.47 vs. limit=15.0 2024-08-18 23:27:00,913 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 26 from LS+wenet, 29 from Vox, 27 from AS 2024-08-18 23:27:29,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4156370.0, ans=0.0 2024-08-18 23:27:33,955 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts.
33 from LS+wenet, 20 from Vox, 36 from AS 2024-08-18 23:27:39,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4156370.0, ans=0.2 2024-08-18 23:28:02,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4156470.0, ans=0.125 2024-08-18 23:28:10,073 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 50, loss[loss=0.07971, beats_loss=0.01054, ecapa_loss=0.0001327, whisper_loss=0.06784, over 13698.00 frames. ], tot_loss[loss=0.09941, beats_loss=0.009566, ecapa_loss=0.000152, whisper_loss=0.08833, over 889806.59 frames. ], batch size: 54, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:28:13,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4156570.0, ans=0.5 2024-08-18 23:28:15,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4156570.0, ans=0.1 2024-08-18 23:28:31,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4156670.0, ans=0.125 2024-08-18 23:29:00,545 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 27 from LS+wenet, 12 from Vox, 23 from AS 2024-08-18 23:29:18,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4156870.0, ans=0.125 2024-08-18 23:29:19,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4156870.0, ans=0.1 2024-08-18 23:29:34,848 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts.
26 from LS+wenet, 22 from Vox, 43 from AS 2024-08-18 23:29:51,272 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.81 vs. limit=6.0 2024-08-18 23:29:53,586 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.63 vs. limit=15.0 2024-08-18 23:29:55,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4156970.0, ans=0.125 2024-08-18 23:30:01,590 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 100, loss[loss=0.09914, beats_loss=0.01086, ecapa_loss=0.0001536, whisper_loss=0.08674, over 18662.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.00917, ecapa_loss=0.0001504, whisper_loss=0.09198, over 1528641.62 frames. ], batch size: 77, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:30:08,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4157070.0, ans=0.1 2024-08-18 23:30:30,219 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.602e+01 2.871e+01 3.209e+01 4.618e+01, threshold=5.741e+01, percent-clipped=0.0 2024-08-18 23:30:30,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4157170.0, ans=0.0 2024-08-18 23:30:45,142 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 18 from Vox, 32 from AS 2024-08-18 23:30:54,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4157270.0, ans=0.1 2024-08-18 23:31:04,854 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 18 from Vox, 20 from AS 2024-08-18 23:31:40,634 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts.
20 from LS+wenet, 16 from Vox, 25 from AS 2024-08-18 23:31:42,673 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 150, loss[loss=0.1008, beats_loss=0.009957, ecapa_loss=0.0001453, whisper_loss=0.08941, over 15340.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.009251, ecapa_loss=0.0001488, whisper_loss=0.09072, over 2023265.10 frames. ], batch size: 61, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:31:46,588 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.40 vs. limit=12.0 2024-08-18 23:32:03,083 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 22 from Vox, 30 from AS 2024-08-18 23:32:07,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4157670.0, ans=0.125 2024-08-18 23:32:15,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4157770.0, ans=0.125 2024-08-18 23:32:23,166 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts.
23 from LS+wenet, 20 from Vox, 37 from AS 2024-08-18 23:32:23,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4157770.0, ans=0.125 2024-08-18 23:32:35,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4157870.0, ans=0.125 2024-08-18 23:32:42,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4157870.0, ans=0.2 2024-08-18 23:32:48,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4157970.0, ans=0.0 2024-08-18 23:32:55,878 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.92 vs. limit=12.0 2024-08-18 23:32:58,233 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 18 from Vox, 43 from AS 2024-08-18 23:33:01,263 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 200, loss[loss=0.114, beats_loss=0.01218, ecapa_loss=0.0001168, whisper_loss=0.1007, over 21807.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.009467, ecapa_loss=0.0001475, whisper_loss=0.09126, over 2412499.35 frames.
], batch size: 86, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:33:01,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4158070.0, ans=0.0 2024-08-18 23:33:02,516 WARNING [optim.py:496] (0/4) Scaling gradients by 0.05759067460894585, model_norm_threshold=57.41115951538086 2024-08-18 23:33:02,687 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.48, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.794e+05, grad_sumsq=4.619e+07, orig_rms_sq=1.038e-02 2024-08-18 23:33:19,911 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.401e+01 2.608e+01 2.915e+01 9.969e+02, threshold=5.216e+01, percent-clipped=2.0 2024-08-18 23:33:27,114 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 from AS 2024-08-18 23:33:29,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4158270.0, ans=0.0 2024-08-18 23:33:31,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4158270.0, ans=0.125 2024-08-18 23:33:48,470 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 20 from Vox, 32 from AS 2024-08-18 23:33:49,955 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 17 from Vox, 18 from AS 2024-08-18 23:33:51,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4158370.0, ans=0.125 2024-08-18 23:33:57,137 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs.
limit=6.0 2024-08-18 23:33:59,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4158470.0, ans=0.125 2024-08-18 23:34:11,762 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 250, loss[loss=0.09151, beats_loss=0.009873, ecapa_loss=0.0001642, whisper_loss=0.08, over 19285.00 frames. ], tot_loss[loss=0.103, beats_loss=0.009656, ecapa_loss=0.0001461, whisper_loss=0.09193, over 2738267.04 frames. ], batch size: 77, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:34:21,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4158570.0, ans=0.125 2024-08-18 23:34:32,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4158670.0, ans=0.1 2024-08-18 23:34:41,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4158770.0, ans=0.025 2024-08-18 23:34:46,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=4158770.0, ans=0.05 2024-08-18 23:34:52,321 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 20 from Vox, 21 from AS 2024-08-18 23:34:59,201 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 20 from Vox, 39 from AS 2024-08-18 23:35:07,340 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 20 from Vox, 33 from AS 2024-08-18 23:35:10,313 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.56 vs.
limit=10.0 2024-08-18 23:35:16,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4158970.0, ans=0.125 2024-08-18 23:35:17,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4159070.0, ans=0.07 2024-08-18 23:35:17,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4159070.0, ans=0.0 2024-08-18 23:35:18,467 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 300, loss[loss=0.1033, beats_loss=0.009439, ecapa_loss=0.0001446, whisper_loss=0.09244, over 19795.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.009808, ecapa_loss=0.0001458, whisper_loss=0.09104, over 2945779.82 frames. ], batch size: 79, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:35:35,806 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.277e+01 2.615e+01 3.023e+01 6.948e+01, threshold=5.231e+01, percent-clipped=1.0 2024-08-18 23:35:56,535 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 15 from LS+wenet, 18 from Vox, 29 from AS 2024-08-18 23:36:18,442 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 26 from LS+wenet, 13 from Vox, 23 from AS 2024-08-18 23:36:26,461 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 350, loss[loss=0.07683, beats_loss=0.0106, ecapa_loss=0.0001197, whisper_loss=0.06503, over 15309.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.009947, ecapa_loss=0.000146, whisper_loss=0.091, over 3157031.05 frames. ], batch size: 58, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:36:29,308 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts.
23 from LS+wenet, 12 from Vox, 22 from AS 2024-08-18 23:36:38,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4159570.0, ans=0.0 2024-08-18 23:36:40,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4159670.0, ans=0.125 2024-08-18 23:36:40,882 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 23:37:04,345 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 22 from Vox, 23 from AS 2024-08-18 23:37:05,153 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.57 vs. limit=15.0 2024-08-18 23:37:05,676 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 15 from Vox, 41 from AS 2024-08-18 23:37:20,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4159870.0, ans=0.1 2024-08-18 23:37:23,889 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 15 from Vox, 34 from AS 2024-08-18 23:37:25,154 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-416000.pt 2024-08-18 23:37:38,402 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 400, loss[loss=0.1064, beats_loss=0.009175, ecapa_loss=0.00011, whisper_loss=0.09611, over 16725.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01, ecapa_loss=0.0001437, whisper_loss=0.09091, over 3297642.96 frames.
], batch size: 62, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:37:45,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4160070.0, ans=0.2 2024-08-18 23:37:55,253 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.335e+01 2.628e+01 2.920e+01 4.432e+01, threshold=5.257e+01, percent-clipped=0.0 2024-08-18 23:37:57,946 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 18 from Vox, 21 from AS 2024-08-18 23:38:00,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4160170.0, ans=0.0 2024-08-18 23:38:21,032 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 from AS 2024-08-18 23:38:25,354 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.33 vs. limit=15.0 2024-08-18 23:38:28,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=4160370.0, ans=15.0 2024-08-18 23:38:35,555 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 19 from Vox, 22 from AS 2024-08-18 23:38:36,836 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 22 from Vox, 44 from AS 2024-08-18 23:38:46,445 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 450, loss[loss=0.1187, beats_loss=0.009324, ecapa_loss=0.000128, whisper_loss=0.1081, over 24100.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01008, ecapa_loss=0.0001446, whisper_loss=0.09075, over 3420287.61 frames. ], batch size: 90, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:38:48,329 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts.
26 from LS+wenet, 25 from Vox, 25 from AS 2024-08-18 23:38:50,691 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 16 from Vox, 25 from AS 2024-08-18 23:38:54,186 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 14 from Vox, 29 from AS 2024-08-18 23:38:56,712 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 9 from Vox, 30 from AS 2024-08-18 23:39:02,025 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 27 from LS+wenet, 16 from Vox, 22 from AS 2024-08-18 23:39:07,498 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 22 from Vox, 31 from AS 2024-08-18 23:39:44,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4160970.0, ans=0.1 2024-08-18 23:39:51,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4160970.0, ans=0.0 2024-08-18 23:39:55,447 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 500, loss[loss=0.07686, beats_loss=0.01054, ecapa_loss=0.0001373, whisper_loss=0.06494, over 17378.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01005, ecapa_loss=0.0001443, whisper_loss=0.09073, over 3513138.99 frames. ], batch size: 68, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:40:08,054 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 30 from LS+wenet, 15 from Vox, 32 from AS 2024-08-18 23:40:10,794 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 26 from LS+wenet, 23 from Vox, 46 from AS 2024-08-18 23:40:13,471 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.408e+01 2.730e+01 3.089e+01 1.165e+02, threshold=5.459e+01, percent-clipped=2.0 2024-08-18 23:40:22,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4161270.0, ans=0.035 2024-08-18 23:40:23,420 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts.
19 from LS+wenet, 20 from Vox, 36 from AS 2024-08-18 23:40:30,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4161270.0, ans=0.0 2024-08-18 23:41:05,127 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 550, loss[loss=0.1098, beats_loss=0.009259, ecapa_loss=0.0001519, whisper_loss=0.09905, over 21130.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01017, ecapa_loss=0.0001438, whisper_loss=0.09078, over 3610842.76 frames. ], batch size: 83, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:41:06,571 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 17 from Vox, 36 from AS 2024-08-18 23:41:17,694 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 23 from Vox, 32 from AS 2024-08-18 23:41:59,710 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.57 vs. limit=15.0 2024-08-18 23:42:12,627 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 600, loss[loss=0.09376, beats_loss=0.009282, ecapa_loss=0.0001212, whisper_loss=0.08326, over 14880.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01017, ecapa_loss=0.0001432, whisper_loss=0.09083, over 3642830.28 frames. ], batch size: 57, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:42:16,951 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 from AS 2024-08-18 23:42:23,783 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts.
15 from LS+wenet, 25 from Vox, 30 from AS 2024-08-18 23:42:30,323 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.259e+01 2.406e+01 2.695e+01 1.044e+02, threshold=4.812e+01, percent-clipped=1.0 2024-08-18 23:42:33,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4162170.0, ans=0.0 2024-08-18 23:42:37,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4162170.0, ans=0.0 2024-08-18 23:42:39,114 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 12 from Vox, 27 from AS 2024-08-18 23:43:05,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4162470.0, ans=0.125 2024-08-18 23:43:08,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4162470.0, ans=0.0 2024-08-18 23:43:19,962 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 650, loss[loss=0.08527, beats_loss=0.01119, ecapa_loss=0.000137, whisper_loss=0.0727, over 15185.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01026, ecapa_loss=0.0001429, whisper_loss=0.09, over 3659744.32 frames. ], batch size: 62, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:43:21,565 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 from AS 2024-08-18 23:43:21,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4162570.0, ans=0.125 2024-08-18 23:43:29,475 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts.
18 from LS+wenet, 19 from Vox, 24 from AS 2024-08-18 23:43:29,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4162570.0, ans=0.035 2024-08-18 23:43:36,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4162670.0, ans=0.125 2024-08-18 23:43:42,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4162670.0, ans=0.125 2024-08-18 23:43:47,044 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 21 from Vox, 28 from AS 2024-08-18 23:43:49,243 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.19 vs. limit=15.0 2024-08-18 23:44:05,238 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.27 vs. limit=10.0 2024-08-18 23:44:11,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4162870.0, ans=0.0 2024-08-18 23:44:13,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4162970.0, ans=0.0 2024-08-18 23:44:14,097 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.11 vs. limit=22.5 2024-08-18 23:44:27,115 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 700, loss[loss=0.1001, beats_loss=0.008965, ecapa_loss=0.0002072, whisper_loss=0.08905, over 14225.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0103, ecapa_loss=0.0001431, whisper_loss=0.09006, over 3685514.00 frames. ], batch size: 62, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:44:39,701 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts.
20 from LS+wenet, 20 from Vox, 33 from AS 2024-08-18 23:44:44,124 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0 2024-08-18 23:44:44,847 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.232e+01 2.495e+01 2.687e+01 4.242e+01, threshold=4.990e+01, percent-clipped=0.0 2024-08-18 23:45:18,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4163370.0, ans=0.2 2024-08-18 23:45:18,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4163370.0, ans=0.1 2024-08-18 23:45:34,230 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 750, loss[loss=0.1178, beats_loss=0.009417, ecapa_loss=0.0001728, whisper_loss=0.1066, over 18234.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01025, ecapa_loss=0.000143, whisper_loss=0.09028, over 3704911.35 frames. ], batch size: 71, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:45:56,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4163670.0, ans=0.125 2024-08-18 23:45:57,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4163670.0, ans=0.125 2024-08-18 23:46:26,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4163870.0, ans=0.125 2024-08-18 23:46:34,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4163970.0, ans=0.0 2024-08-18 23:46:35,523 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts.
30 from LS+wenet, 25 from Vox, 38 from AS 2024-08-18 23:46:39,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4163970.0, ans=0.125 2024-08-18 23:46:42,148 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 800, loss[loss=0.09441, beats_loss=0.01024, ecapa_loss=0.0001466, whisper_loss=0.08271, over 18264.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01029, ecapa_loss=0.000143, whisper_loss=0.08999, over 3761018.47 frames. ], batch size: 72, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:46:50,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4164070.0, ans=0.125 2024-08-18 23:46:59,125 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.180e+01 2.421e+01 2.736e+01 5.483e+01, threshold=4.843e+01, percent-clipped=1.0 2024-08-18 23:47:03,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4164170.0, ans=0.95 2024-08-18 23:47:08,746 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.99 vs. limit=6.0 2024-08-18 23:47:14,609 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 30 from LS+wenet, 20 from Vox, 35 from AS 2024-08-18 23:47:15,897 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 14 from Vox, 23 from AS 2024-08-18 23:47:33,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4164370.0, ans=0.125 2024-08-18 23:47:34,808 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts.
24 from LS+wenet, 23 from Vox, 28 from AS 2024-08-18 23:47:35,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4164470.0, ans=0.125 2024-08-18 23:47:43,295 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 13 from Vox, 33 from AS 2024-08-18 23:47:47,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4164470.0, ans=0.2 2024-08-18 23:47:47,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4164470.0, ans=0.2 2024-08-18 23:47:49,600 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 850, loss[loss=0.08969, beats_loss=0.009687, ecapa_loss=0.0001638, whisper_loss=0.07837, over 14075.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01022, ecapa_loss=0.0001421, whisper_loss=0.08965, over 3770022.62 frames. ], batch size: 58, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:47:52,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4164570.0, ans=0.1 2024-08-18 23:47:55,269 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 26 from Vox, 34 from AS 2024-08-18 23:48:04,781 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 21 from Vox, 40 from AS 2024-08-18 23:48:09,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4164670.0, ans=0.0 2024-08-18 23:48:20,347 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts.
31 from LS+wenet, 22 from Vox, 38 from AS 2024-08-18 23:48:48,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4164970.0, ans=0.0 2024-08-18 23:48:57,008 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.64 vs. limit=15.0 2024-08-18 23:48:57,714 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 900, loss[loss=0.1024, beats_loss=0.00977, ecapa_loss=0.000136, whisper_loss=0.09131, over 18245.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01029, ecapa_loss=0.0001415, whisper_loss=0.08884, over 3752346.13 frames. ], batch size: 71, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:48:57,881 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 33 from LS+wenet, 17 from Vox, 26 from AS 2024-08-18 23:49:05,511 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.47 vs. limit=15.0 2024-08-18 23:49:06,189 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 24 from Vox, 35 from AS 2024-08-18 23:49:14,961 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.209e+01 2.441e+01 2.717e+01 4.171e+01, threshold=4.882e+01, percent-clipped=0.0 2024-08-18 23:49:19,284 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 19 from Vox, 34 from AS 2024-08-18 23:49:26,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4165270.0, ans=0.0 2024-08-18 23:49:32,467 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts.
25 from LS+wenet, 33 from Vox, 35 from AS 2024-08-18 23:49:38,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4165370.0, ans=0.125 2024-08-18 23:49:39,594 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 28 from Vox, 37 from AS 2024-08-18 23:49:57,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4165470.0, ans=0.125 2024-08-18 23:50:02,842 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 22 from Vox, 21 from AS 2024-08-18 23:50:04,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4165570.0, ans=0.125 2024-08-18 23:50:05,279 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 950, loss[loss=0.1007, beats_loss=0.01106, ecapa_loss=0.0001316, whisper_loss=0.08831, over 22516.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01026, ecapa_loss=0.000142, whisper_loss=0.08924, over 3768521.50 frames. ], batch size: 89, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:50:08,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4165570.0, ans=0.0 2024-08-18 23:50:13,501 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 19 from Vox, 28 from AS 2024-08-18 23:50:31,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4165770.0, ans=0.015 2024-08-18 23:50:53,733 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.10 vs. limit=10.0 2024-08-18 23:50:55,065 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.01 vs.
limit=12.0 2024-08-18 23:50:57,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4165870.0, ans=0.125 2024-08-18 23:50:58,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4165970.0, ans=0.125 2024-08-18 23:51:00,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4165970.0, ans=0.125 2024-08-18 23:51:13,496 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 1000, loss[loss=0.1092, beats_loss=0.009056, ecapa_loss=0.0001438, whisper_loss=0.09872, over 14923.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01022, ecapa_loss=0.000141, whisper_loss=0.0893, over 3761921.65 frames. ], batch size: 58, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:51:20,866 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 18 from Vox, 44 from AS 2024-08-18 23:51:26,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4166170.0, ans=0.2 2024-08-18 23:51:30,083 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 22 from Vox, 31 from AS 2024-08-18 23:51:30,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4166170.0, ans=0.0 2024-08-18 23:51:31,119 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.375e+01 2.563e+01 2.832e+01 6.372e+01, threshold=5.125e+01, percent-clipped=2.0 2024-08-18 23:51:34,118 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts.
35 from LS+wenet, 19 from Vox, 39 from AS 2024-08-18 23:51:34,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4166170.0, ans=0.125 2024-08-18 23:51:49,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4166270.0, ans=0.125 2024-08-18 23:51:54,736 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 19 from Vox, 26 from AS 2024-08-18 23:51:56,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.11 vs. limit=22.5 2024-08-18 23:51:57,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4166370.0, ans=0.1 2024-08-18 23:52:01,512 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 from AS 2024-08-18 23:52:04,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4166370.0, ans=0.025 2024-08-18 23:52:21,789 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 1050, loss[loss=0.1036, beats_loss=0.009423, ecapa_loss=0.0001385, whisper_loss=0.0928, over 16641.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01027, ecapa_loss=0.0001406, whisper_loss=0.08923, over 3796082.25 frames. ], batch size: 63, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:52:23,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4166570.0, ans=0.125 2024-08-18 23:52:55,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.72 vs. limit=22.5 2024-08-18 23:53:03,380 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts.
33 from LS+wenet, 23 from Vox, 38 from AS 2024-08-18 23:53:21,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4166970.0, ans=0.125 2024-08-18 23:53:29,501 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 1100, loss[loss=0.1104, beats_loss=0.009417, ecapa_loss=0.0001649, whisper_loss=0.09934, over 21545.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.0103, ecapa_loss=0.0001409, whisper_loss=0.08913, over 3814145.57 frames. ], batch size: 89, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:53:35,051 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 from AS 2024-08-18 23:53:40,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4167070.0, ans=0.0 2024-08-18 23:53:46,317 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.348e+01 2.551e+01 2.998e+01 5.491e+01, threshold=5.102e+01, percent-clipped=1.0 2024-08-18 23:53:49,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4167170.0, ans=0.0 2024-08-18 23:53:52,354 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 19 from Vox, 37 from AS 2024-08-18 23:54:00,443 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.50 vs. limit=22.5 2024-08-18 23:54:02,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.71 vs. limit=6.0 2024-08-18 23:54:12,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4167370.0, ans=0.2 2024-08-18 23:54:22,637 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts.
17 from LS+wenet, 23 from Vox, 31 from AS 2024-08-18 23:54:34,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4167470.0, ans=0.0 2024-08-18 23:54:36,199 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 1150, loss[loss=0.09862, beats_loss=0.01059, ecapa_loss=0.0001345, whisper_loss=0.08669, over 20554.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01037, ecapa_loss=0.0001398, whisper_loss=0.08922, over 3815965.02 frames. ], batch size: 80, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:54:43,890 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.38 vs. limit=10.0 2024-08-18 23:54:45,890 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 19 from LS+wenet, 33 from Vox, 37 from AS 2024-08-18 23:54:50,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4167670.0, ans=0.125 2024-08-18 23:55:05,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4167770.0, ans=0.125 2024-08-18 23:55:06,393 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 23:55:34,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4167970.0, ans=0.125 2024-08-18 23:55:43,873 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 1200, loss[loss=0.1027, beats_loss=0.01159, ecapa_loss=0.0001493, whisper_loss=0.08964, over 23082.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01043, ecapa_loss=0.0001394, whisper_loss=0.08936, over 3790348.61 frames. ], batch size: 93, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:55:51,014 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts.
16 from LS+wenet, 14 from Vox, 31 from AS 2024-08-18 23:55:53,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4168070.0, ans=0.125 2024-08-18 23:56:00,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4168170.0, ans=0.2 2024-08-18 23:56:01,574 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.277e+01 2.488e+01 2.834e+01 2.594e+02, threshold=4.975e+01, percent-clipped=3.0 2024-08-18 23:56:12,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4168270.0, ans=0.125 2024-08-18 23:56:31,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4168370.0, ans=0.0 2024-08-18 23:56:44,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4168470.0, ans=0.2 2024-08-18 23:56:51,137 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 1250, loss[loss=0.1105, beats_loss=0.009122, ecapa_loss=0.0001376, whisper_loss=0.1, over 19339.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01049, ecapa_loss=0.0001386, whisper_loss=0.08916, over 3808146.29 frames.
], batch size: 73, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:56:59,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4168570.0, ans=0.0 2024-08-18 23:57:05,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4168670.0, ans=0.2 2024-08-18 23:57:07,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4168670.0, ans=0.125 2024-08-18 23:57:15,990 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 10 from Vox, 36 from AS 2024-08-18 23:57:26,034 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 39 from LS+wenet, 15 from Vox, 34 from AS 2024-08-18 23:57:28,463 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.38 vs. limit=15.0 2024-08-18 23:57:33,156 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 22 from Vox, 36 from AS 2024-08-18 23:57:45,571 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 23:57:54,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4168970.0, ans=0.1 2024-08-18 23:57:58,400 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 1300, loss[loss=0.08311, beats_loss=0.01327, ecapa_loss=0.0001153, whisper_loss=0.06869, over 21412.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01049, ecapa_loss=0.0001395, whisper_loss=0.08945, over 3825373.94 frames. ], batch size: 87, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:58:15,392 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.54 vs.
limit=10.0 2024-08-18 23:58:17,205 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.654e+01 2.196e+01 2.537e+01 2.815e+01 5.238e+01, threshold=5.075e+01, percent-clipped=1.0 2024-08-18 23:58:18,888 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 33 from Vox, 36 from AS 2024-08-18 23:58:27,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4169270.0, ans=0.0 2024-08-18 23:58:29,231 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.98 vs. limit=22.5 2024-08-18 23:58:36,182 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 19 from Vox, 37 from AS 2024-08-18 23:58:39,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4169370.0, ans=0.125 2024-08-18 23:58:55,396 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 12 from LS+wenet, 25 from Vox, 32 from AS 2024-08-18 23:58:58,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4169470.0, ans=0.0 2024-08-18 23:59:06,416 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 1350, loss[loss=0.09737, beats_loss=0.0107, ecapa_loss=0.0001279, whisper_loss=0.08539, over 22950.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01046, ecapa_loss=0.0001396, whisper_loss=0.08959, over 3841397.22 frames. ], batch size: 92, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:59:06,888 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0 2024-08-18 23:59:07,714 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 21 from Vox, 22 from AS 2024-08-18 23:59:10,801 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts.
27 from LS+wenet, 25 from Vox, 26 from AS 2024-08-18 23:59:11,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4169570.0, ans=0.0 2024-08-18 23:59:11,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4169570.0, ans=0.125 2024-08-18 23:59:23,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4169670.0, ans=0.0 2024-08-18 23:59:26,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4169670.0, ans=0.0 2024-08-18 23:59:28,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4169670.0, ans=0.035 2024-08-18 23:59:35,606 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 26 from LS+wenet, 16 from Vox, 28 from AS 2024-08-18 23:59:51,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4169870.0, ans=0.0 2024-08-18 23:59:53,837 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 from AS 2024-08-18 23:59:55,246 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 from AS 2024-08-18 23:59:56,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4169870.0, ans=0.1 2024-08-18 23:59:57,975 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts.
24 from LS+wenet, 25 from Vox, 30 from AS 2024-08-18 23:59:58,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4169870.0, ans=0.125 2024-08-19 00:00:14,463 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 1400, loss[loss=0.07828, beats_loss=0.01311, ecapa_loss=0.000148, whisper_loss=0.06369, over 21259.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01042, ecapa_loss=0.0001407, whisper_loss=0.0896, over 3842440.24 frames. ], batch size: 89, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:00:16,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4170070.0, ans=0.125 2024-08-19 00:00:33,078 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.158e+01 2.419e+01 2.652e+01 3.776e+01, threshold=4.839e+01, percent-clipped=0.0 2024-08-19 00:00:35,589 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.73 vs. limit=15.0 2024-08-19 00:00:40,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4170270.0, ans=0.0 2024-08-19 00:00:48,121 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 28 from LS+wenet, 14 from Vox, 24 from AS 2024-08-19 00:00:57,656 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 29 from LS+wenet, 18 from Vox, 25 from AS 2024-08-19 00:01:06,909 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.08 vs. limit=22.5 2024-08-19 00:01:07,972 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.05 vs.
limit=15.0 2024-08-19 00:01:14,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4170470.0, ans=0.125 2024-08-19 00:01:22,102 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 1450, loss[loss=0.09398, beats_loss=0.01044, ecapa_loss=0.0001943, whisper_loss=0.0816, over 14815.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01032, ecapa_loss=0.0001408, whisper_loss=0.08974, over 3825738.76 frames. ], batch size: 61, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:01:22,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4170570.0, ans=0.1 2024-08-19 00:01:57,164 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.70 vs. limit=6.0 2024-08-19 00:02:14,638 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 10 from LS+wenet, 16 from Vox, 40 from AS 2024-08-19 00:02:19,140 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 from AS 2024-08-19 00:02:36,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4170870.0, ans=0.125 2024-08-19 00:02:41,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4170870.0, ans=0.2 2024-08-19 00:02:57,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4170970.0, ans=0.0 2024-08-19 00:02:59,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4170970.0, ans=0.0 2024-08-19 00:03:05,593 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 1500, loss[loss=0.09937, beats_loss=0.0109, ecapa_loss=0.0001466, whisper_loss=0.08701, over 17090.00 frames.
], tot_loss[loss=0.1014, beats_loss=0.01041, ecapa_loss=0.0001399, whisper_loss=0.08955, over 3816511.79 frames. ], batch size: 69, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:03:24,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4171170.0, ans=0.0 2024-08-19 00:03:27,136 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.239e+01 2.511e+01 2.818e+01 6.129e+01, threshold=5.023e+01, percent-clipped=1.0 2024-08-19 00:03:46,608 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 00:03:51,473 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-19 00:04:11,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4171470.0, ans=0.2 2024-08-19 00:04:21,035 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 1550, loss[loss=0.09746, beats_loss=0.00836, ecapa_loss=0.0001565, whisper_loss=0.08754, over 14209.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01042, ecapa_loss=0.0001398, whisper_loss=0.08947, over 3798194.69 frames. 
], batch size: 55, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:04:25,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4171570.0, ans=0.2 2024-08-19 00:04:31,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4171570.0, ans=0.09899494936611666 2024-08-19 00:04:36,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4171670.0, ans=0.125 2024-08-19 00:04:46,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4171670.0, ans=0.2 2024-08-19 00:04:56,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4171770.0, ans=0.125 2024-08-19 00:04:56,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4171770.0, ans=0.125 2024-08-19 00:04:59,588 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 22 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-19 00:05:18,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4171970.0, ans=0.0 2024-08-19 00:05:34,416 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 1600, loss[loss=0.112, beats_loss=0.008403, ecapa_loss=0.0001155, whisper_loss=0.1024, over 14948.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01044, ecapa_loss=0.0001398, whisper_loss=0.08858, over 3817060.21 frames. ], batch size: 55, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:05:36,861 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.76 vs. limit=22.5 2024-08-19 00:05:43,493 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
29 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-19 00:05:53,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4172170.0, ans=0.1 2024-08-19 00:05:56,213 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.337e+01 2.619e+01 2.865e+01 4.288e+01, threshold=5.238e+01, percent-clipped=0.0 2024-08-19 00:05:56,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4172170.0, ans=10.0 2024-08-19 00:06:04,633 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-19 00:06:04,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4172270.0, ans=0.1 2024-08-19 00:06:14,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4172270.0, ans=0.125 2024-08-19 00:06:23,308 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.72 vs. limit=15.0 2024-08-19 00:06:28,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4172370.0, ans=0.125 2024-08-19 00:06:30,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4172370.0, ans=0.2 2024-08-19 00:06:33,702 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
23 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-19 00:06:39,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4172470.0, ans=0.125 2024-08-19 00:06:47,193 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 1650, loss[loss=0.09379, beats_loss=0.01111, ecapa_loss=0.0001464, whisper_loss=0.08122, over 21954.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.0105, ecapa_loss=0.0001391, whisper_loss=0.08858, over 3822972.85 frames. ], batch size: 86, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:06:49,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.70 vs. limit=10.0 2024-08-19 00:06:59,778 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-19 00:07:06,303 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-19 00:07:09,082 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-19 00:07:11,802 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 34 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-19 00:07:20,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4172770.0, ans=0.0 2024-08-19 00:07:24,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.78 vs. limit=22.5 2024-08-19 00:07:29,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4172770.0, ans=0.125 2024-08-19 00:07:35,705 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.77 vs. 
limit=15.0 2024-08-19 00:07:48,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4172970.0, ans=0.025 2024-08-19 00:07:49,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4172970.0, ans=0.125 2024-08-19 00:07:54,443 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-19 00:07:59,462 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 1700, loss[loss=0.0921, beats_loss=0.009932, ecapa_loss=0.0001327, whisper_loss=0.08084, over 22818.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01042, ecapa_loss=0.0001385, whisper_loss=0.0894, over 3826462.18 frames. ], batch size: 88, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:08:09,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4173070.0, ans=0.1 2024-08-19 00:08:22,440 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.287e+01 2.515e+01 2.871e+01 4.134e+01, threshold=5.030e+01, percent-clipped=0.0 2024-08-19 00:08:31,005 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-19 00:08:34,603 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
21 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 00:08:43,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4173270.0, ans=0.125 2024-08-19 00:09:09,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4173470.0, ans=0.1 2024-08-19 00:09:15,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4173470.0, ans=0.2 2024-08-19 00:09:27,563 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 1750, loss[loss=0.09621, beats_loss=0.01178, ecapa_loss=0.000157, whisper_loss=0.08286, over 22167.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01044, ecapa_loss=0.0001385, whisper_loss=0.0891, over 3817484.59 frames. ], batch size: 90, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:09:32,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4173570.0, ans=0.125 2024-08-19 00:09:45,824 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.64 vs. limit=22.5 2024-08-19 00:09:59,967 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=9.569e+00 2024-08-19 00:10:04,852 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.35 vs. limit=15.0 2024-08-19 00:10:05,542 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-19 00:10:09,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4173770.0, ans=0.0 2024-08-19 00:10:20,871 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
28 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-19 00:10:47,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4173970.0, ans=0.125 2024-08-19 00:10:52,446 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 1800, loss[loss=0.07974, beats_loss=0.01286, ecapa_loss=0.0001017, whisper_loss=0.06586, over 16318.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01049, ecapa_loss=0.0001381, whisper_loss=0.08846, over 3822269.46 frames. ], batch size: 63, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:10:55,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4174070.0, ans=0.2 2024-08-19 00:10:58,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4174070.0, ans=0.0 2024-08-19 00:11:12,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4174170.0, ans=0.0 2024-08-19 00:11:16,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4174170.0, ans=0.1 2024-08-19 00:11:18,292 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-19 00:11:19,702 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.235e+01 2.459e+01 2.784e+01 3.423e+01, threshold=4.917e+01, percent-clipped=0.0 2024-08-19 00:11:20,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4174170.0, ans=0.0 2024-08-19 00:11:28,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4174170.0, ans=0.125 2024-08-19 00:11:35,264 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
24 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 00:11:35,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4174270.0, ans=0.09899494936611666 2024-08-19 00:11:41,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4174270.0, ans=0.1 2024-08-19 00:11:57,481 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-19 00:11:57,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4174370.0, ans=0.0 2024-08-19 00:12:02,082 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-19 00:12:02,688 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.54 vs. limit=15.0 2024-08-19 00:12:30,298 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.66 vs. limit=22.5 2024-08-19 00:12:35,641 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 1850, loss[loss=0.0883, beats_loss=0.01062, ecapa_loss=0.0001773, whisper_loss=0.0759, over 14849.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01041, ecapa_loss=0.0001383, whisper_loss=0.08881, over 3807206.74 frames. ], batch size: 61, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:12:41,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4174570.0, ans=0.0 2024-08-19 00:12:53,944 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-19 00:13:01,094 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
27 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-19 00:13:18,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4174770.0, ans=0.2 2024-08-19 00:13:34,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4174770.0, ans=0.125 2024-08-19 00:13:57,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4174970.0, ans=0.5 2024-08-19 00:14:06,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4174970.0, ans=0.0 2024-08-19 00:14:19,592 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 1900, loss[loss=0.1151, beats_loss=0.009587, ecapa_loss=0.0001312, whisper_loss=0.1042, over 22641.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01046, ecapa_loss=0.0001387, whisper_loss=0.0886, over 3793570.79 frames. ], batch size: 88, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:14:42,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4175170.0, ans=0.125 2024-08-19 00:14:48,750 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.320e+01 2.567e+01 2.860e+01 4.992e+01, threshold=5.134e+01, percent-clipped=1.0 2024-08-19 00:15:19,915 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 00:15:41,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4175370.0, ans=0.025 2024-08-19 00:15:51,069 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
18 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-19 00:15:58,871 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 1950, loss[loss=0.107, beats_loss=0.006981, ecapa_loss=0.0001536, whisper_loss=0.09848, over 14273.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01038, ecapa_loss=0.0001391, whisper_loss=0.08914, over 3803755.26 frames. ], batch size: 54, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:15:59,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4175570.0, ans=0.2 2024-08-19 00:16:20,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4175670.0, ans=0.125 2024-08-19 00:16:25,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4175770.0, ans=0.1 2024-08-19 00:16:30,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4175770.0, ans=0.2 2024-08-19 00:16:40,964 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.53 vs. limit=12.0 2024-08-19 00:16:41,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.08 vs. 
limit=15.0 2024-08-19 00:16:47,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4175870.0, ans=0.125 2024-08-19 00:16:55,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4175970.0, ans=0.2 2024-08-19 00:17:08,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4175970.0, ans=0.125 2024-08-19 00:17:10,453 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 2000, loss[loss=0.1164, beats_loss=0.009887, ecapa_loss=0.0001352, whisper_loss=0.1052, over 23004.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01036, ecapa_loss=0.0001389, whisper_loss=0.089, over 3824882.00 frames. ], batch size: 92, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:17:17,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4176070.0, ans=0.0 2024-08-19 00:17:30,900 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.704e+01 2.262e+01 2.428e+01 2.730e+01 5.039e+01, threshold=4.856e+01, percent-clipped=0.0 2024-08-19 00:17:32,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=4176170.0, ans=0.02 2024-08-19 00:17:39,486 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
21 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-19 00:18:05,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=4176370.0, ans=0.025 2024-08-19 00:18:05,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4176370.0, ans=0.125 2024-08-19 00:18:18,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4176470.0, ans=0.1 2024-08-19 00:18:22,234 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 2050, loss[loss=0.0981, beats_loss=0.0107, ecapa_loss=0.0001139, whisper_loss=0.08626, over 19721.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01038, ecapa_loss=0.0001385, whisper_loss=0.08853, over 3803771.91 frames. ], batch size: 77, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:18:25,403 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-19 00:18:30,792 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 00:18:32,521 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. limit=6.0 2024-08-19 00:18:35,431 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 30 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-19 00:18:59,367 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-19 00:19:02,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4176770.0, ans=0.0 2024-08-19 00:19:04,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4176870.0, ans=0.0 2024-08-19 00:19:30,973 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-19 00:19:33,326 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 2100, loss[loss=0.08415, beats_loss=0.0122, ecapa_loss=0.0001428, whisper_loss=0.07052, over 21552.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.0104, ecapa_loss=0.0001379, whisper_loss=0.08885, over 3813962.72 frames. ], batch size: 90, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:19:34,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4177070.0, ans=0.125 2024-08-19 00:19:54,051 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.287e+01 2.483e+01 2.876e+01 4.561e+01, threshold=4.966e+01, percent-clipped=0.0 2024-08-19 00:20:01,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4177270.0, ans=0.125 2024-08-19 00:20:03,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4177270.0, ans=0.1 2024-08-19 00:20:05,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4177270.0, ans=0.2 2024-08-19 00:20:12,164 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-19 00:20:22,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4177370.0, ans=0.0 2024-08-19 00:20:31,572 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
16 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-19 00:20:42,758 WARNING [optim.py:496] (0/4) Scaling gradients by 0.019715236499905586, model_norm_threshold=49.664920806884766 2024-08-19 00:20:42,926 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.827e+05, grad_sumsq=8.827e+05, orig_rms_sq=1.000e+00 2024-08-19 00:20:44,497 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 2150, loss[loss=0.1102, beats_loss=0.009074, ecapa_loss=0.0001476, whisper_loss=0.09969, over 22394.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01047, ecapa_loss=0.000138, whisper_loss=0.08879, over 3816421.38 frames. ], batch size: 89, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:20:47,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4177570.0, ans=0.125 2024-08-19 00:20:55,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4177570.0, ans=0.0 2024-08-19 00:20:56,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4177570.0, ans=0.025 2024-08-19 00:21:00,543 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 25 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-19 00:21:09,947 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 17 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-19 00:21:11,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4177770.0, ans=0.0 2024-08-19 00:21:12,657 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
22 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 00:21:18,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4177770.0, ans=0.125 2024-08-19 00:21:42,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4177970.0, ans=0.0 2024-08-19 00:21:54,891 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 2200, loss[loss=0.1104, beats_loss=0.009808, ecapa_loss=0.0001162, whisper_loss=0.09943, over 24822.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01049, ecapa_loss=0.0001376, whisper_loss=0.08878, over 3796679.68 frames. ], batch size: 91, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:21:59,254 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 00:22:01,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4178070.0, ans=0.0 2024-08-19 00:22:01,294 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0 2024-08-19 00:22:14,870 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.401e+01 2.663e+01 2.914e+01 2.519e+03, threshold=5.327e+01, percent-clipped=2.0 2024-08-19 00:22:16,273 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-19 00:22:18,214 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.806e+00 2024-08-19 00:22:23,350 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-19 00:22:33,771 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-19 00:22:36,360 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-19 00:22:47,824 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-19 00:23:01,817 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 15 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-19 00:23:03,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4178470.0, ans=0.1 2024-08-19 00:23:06,175 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 2250, loss[loss=0.07342, beats_loss=0.01241, ecapa_loss=0.0001298, whisper_loss=0.05971, over 13652.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01055, ecapa_loss=0.0001385, whisper_loss=0.08886, over 3824934.96 frames. ], batch size: 55, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:23:13,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4178570.0, ans=0.1 2024-08-19 00:23:19,065 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-19 00:23:22,100 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-19 00:23:27,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4178670.0, ans=0.1 2024-08-19 00:23:43,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4178770.0, ans=0.125 2024-08-19 00:23:54,409 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
23 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 00:24:01,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4178870.0, ans=0.1 2024-08-19 00:24:10,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4178970.0, ans=0.125 2024-08-19 00:24:18,225 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 2300, loss[loss=0.1086, beats_loss=0.01054, ecapa_loss=0.0001348, whisper_loss=0.0967, over 16526.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0105, ecapa_loss=0.0001394, whisper_loss=0.08985, over 3850893.67 frames. ], batch size: 67, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:24:23,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4179070.0, ans=0.0 2024-08-19 00:24:24,290 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-19 00:24:32,944 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-19 00:24:38,206 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.287e+01 2.550e+01 2.826e+01 7.255e+01, threshold=5.099e+01, percent-clipped=2.0 2024-08-19 00:24:43,510 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 24 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-19 00:24:48,237 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-19 00:24:55,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4179270.0, ans=0.125 2024-08-19 00:25:04,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4179370.0, ans=10.0 2024-08-19 00:25:05,266 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
21 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-19 00:25:07,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4179370.0, ans=0.0 2024-08-19 00:25:20,636 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-19 00:25:22,144 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-19 00:25:28,987 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 2350, loss[loss=0.1027, beats_loss=0.01241, ecapa_loss=9.602e-05, whisper_loss=0.08936, over 18619.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.0001403, whisper_loss=0.09009, over 3836115.79 frames. ], batch size: 71, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:25:30,882 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-19 00:25:32,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4179570.0, ans=0.125 2024-08-19 00:25:40,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4179570.0, ans=0.125 2024-08-19 00:25:40,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4179570.0, ans=0.1 2024-08-19 00:26:11,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4179870.0, ans=0.0 2024-08-19 00:26:12,814 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
19 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-19 00:26:24,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4179970.0, ans=0.09899494936611666 2024-08-19 00:26:39,341 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 2400, loss[loss=0.1145, beats_loss=0.01045, ecapa_loss=0.0001193, whisper_loss=0.1029, over 20251.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01042, ecapa_loss=0.0001413, whisper_loss=0.09059, over 3851104.15 frames. ], batch size: 79, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:26:39,494 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-19 00:26:53,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0 2024-08-19 00:26:59,948 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.410e+01 2.624e+01 2.909e+01 9.467e+01, threshold=5.248e+01, percent-clipped=3.0 2024-08-19 00:27:04,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.10 vs. limit=15.0 2024-08-19 00:27:19,047 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.18 vs. limit=15.0 2024-08-19 00:27:25,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4180370.0, ans=0.0 2024-08-19 00:27:29,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4180370.0, ans=0.125 2024-08-19 00:27:38,716 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-19 00:27:44,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4180470.0, ans=0.0 2024-08-19 00:27:49,917 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 2450, loss[loss=0.1051, beats_loss=0.009479, ecapa_loss=0.0001452, whisper_loss=0.0942, over 21132.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01037, ecapa_loss=0.0001411, whisper_loss=0.09017, over 3856871.42 frames. ], batch size: 88, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:27:52,917 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-19 00:27:56,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4180570.0, ans=0.125 2024-08-19 00:28:15,762 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 17 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-19 00:28:17,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4180770.0, ans=0.125 2024-08-19 00:28:31,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4180870.0, ans=0.125 2024-08-19 00:28:59,542 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 2500, loss[loss=0.1137, beats_loss=0.009236, ecapa_loss=0.0001134, whisper_loss=0.1034, over 24012.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01034, ecapa_loss=0.000141, whisper_loss=0.09006, over 3860746.64 frames. 
], batch size: 90, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:29:08,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4181070.0, ans=0.0 2024-08-19 00:29:18,114 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.325e+01 2.522e+01 2.873e+01 3.781e+01, threshold=5.044e+01, percent-clipped=0.0 2024-08-19 00:29:31,169 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.33 vs. limit=22.5 2024-08-19 00:30:03,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4181470.0, ans=0.125 2024-08-19 00:30:07,473 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 2550, loss[loss=0.1106, beats_loss=0.008931, ecapa_loss=0.0001346, whisper_loss=0.1004, over 22720.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001413, whisper_loss=0.08997, over 3858316.46 frames. ], batch size: 86, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:30:08,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4181570.0, ans=0.2 2024-08-19 00:30:10,063 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 36 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-19 00:30:15,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4181570.0, ans=0.125 2024-08-19 00:30:16,893 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 14 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-19 00:30:40,083 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-08-19 00:30:41,263 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
29 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-19 00:30:45,165 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-19 00:30:51,966 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 19 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-19 00:30:53,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4181870.0, ans=0.1 2024-08-19 00:31:00,950 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-19 00:31:14,257 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 2600, loss[loss=0.08833, beats_loss=0.009921, ecapa_loss=0.0001741, whisper_loss=0.07667, over 13255.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01055, ecapa_loss=0.0001418, whisper_loss=0.08905, over 3855319.12 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:31:14,764 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.771e+00 2024-08-19 00:31:32,363 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.892e+01 2.363e+01 2.681e+01 3.007e+01 2.480e+02, threshold=5.362e+01, percent-clipped=3.0 2024-08-19 00:31:52,164 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.37 vs. limit=12.0 2024-08-19 00:31:53,709 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-19 00:31:55,558 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
18 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 00:32:00,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4182370.0, ans=0.1 2024-08-19 00:32:01,727 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 28 from Vox, 16 fro AS 2024-08-19 00:32:14,730 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 26 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-19 00:32:18,422 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 2650, loss[loss=0.1069, beats_loss=0.01082, ecapa_loss=0.0001775, whisper_loss=0.09435, over 16065.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01049, ecapa_loss=0.0001415, whisper_loss=0.08991, over 3856012.19 frames. ], batch size: 65, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:32:28,877 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.947e+01 2024-08-19 00:33:05,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4182870.0, ans=0.125 2024-08-19 00:33:12,107 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.58 vs. limit=15.0 2024-08-19 00:33:16,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4182970.0, ans=0.0 2024-08-19 00:33:16,940 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-19 00:33:21,986 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 2700, loss[loss=0.08578, beats_loss=0.01131, ecapa_loss=0.0001003, whisper_loss=0.07346, over 21176.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001402, whisper_loss=0.08995, over 3874529.50 frames. 
], batch size: 78, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:33:28,899 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.10 vs. limit=15.0 2024-08-19 00:33:32,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4183070.0, ans=0.1 2024-08-19 00:33:35,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4183170.0, ans=0.2 2024-08-19 00:33:39,785 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.289e+01 2.477e+01 2.694e+01 4.905e+01, threshold=4.954e+01, percent-clipped=0.0 2024-08-19 00:33:48,865 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-19 00:33:49,982 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-19 00:33:53,832 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 00:33:55,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4183270.0, ans=0.125 2024-08-19 00:34:12,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4183470.0, ans=0.125 2024-08-19 00:34:13,821 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.09 vs. limit=15.0 2024-08-19 00:34:21,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4183470.0, ans=0.125 2024-08-19 00:34:25,794 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 2750, loss[loss=0.08306, beats_loss=0.0129, ecapa_loss=0.000119, whisper_loss=0.06897, over 17872.00 frames. 
], tot_loss[loss=0.1013, beats_loss=0.0105, ecapa_loss=0.0001406, whisper_loss=0.08939, over 3845986.95 frames. ], batch size: 68, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:34:33,782 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.123e+01 2024-08-19 00:34:37,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4183670.0, ans=0.04949747468305833 2024-08-19 00:35:01,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4183770.0, ans=0.125 2024-08-19 00:35:02,677 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 34 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-19 00:35:21,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4183970.0, ans=0.05 2024-08-19 00:35:29,672 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 2800, loss[loss=0.1173, beats_loss=0.009294, ecapa_loss=0.0001566, whisper_loss=0.1064, over 18374.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01052, ecapa_loss=0.0001407, whisper_loss=0.08942, over 3866026.58 frames. ], batch size: 74, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:35:38,989 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.86 vs. 
limit=15.0 2024-08-19 00:35:47,528 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.309e+01 2.538e+01 2.850e+01 3.934e+01, threshold=5.076e+01, percent-clipped=0.0 2024-08-19 00:36:23,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4184470.0, ans=0.1 2024-08-19 00:36:33,824 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 2850, loss[loss=0.08968, beats_loss=0.01228, ecapa_loss=0.0001405, whisper_loss=0.07599, over 17426.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01057, ecapa_loss=0.0001407, whisper_loss=0.08886, over 3840787.73 frames. ], batch size: 71, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:36:37,747 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-19 00:36:38,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4184570.0, ans=0.125 2024-08-19 00:36:42,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4184570.0, ans=0.125 2024-08-19 00:36:44,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4184570.0, ans=0.125 2024-08-19 00:36:50,808 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
23 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-19 00:36:52,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4184670.0, ans=0.2 2024-08-19 00:37:16,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4184870.0, ans=0.0 2024-08-19 00:37:18,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4184870.0, ans=0.0 2024-08-19 00:37:22,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4184870.0, ans=0.2 2024-08-19 00:37:25,588 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-19 00:37:25,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4184970.0, ans=0.125 2024-08-19 00:37:37,745 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 2900, loss[loss=0.1229, beats_loss=0.009491, ecapa_loss=0.0001535, whisper_loss=0.1119, over 22240.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01053, ecapa_loss=0.0001416, whisper_loss=0.08919, over 3859626.62 frames. ], batch size: 90, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:37:39,885 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.31 vs. limit=15.0 2024-08-19 00:37:41,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4185070.0, ans=0.125 2024-08-19 00:37:41,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4185070.0, ans=0.125 2024-08-19 00:37:46,926 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 
29 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-19 00:37:56,755 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.390e+01 2.645e+01 3.019e+01 5.767e+01, threshold=5.291e+01, percent-clipped=1.0 2024-08-19 00:38:01,466 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-19 00:38:08,845 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.49 vs. limit=15.0 2024-08-19 00:38:09,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4185270.0, ans=0.125 2024-08-19 00:38:11,641 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=15.0 2024-08-19 00:38:17,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4185370.0, ans=0.125 2024-08-19 00:38:33,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4185470.0, ans=0.1 2024-08-19 00:38:41,206 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 2950, loss[loss=0.1055, beats_loss=0.01047, ecapa_loss=0.0001357, whisper_loss=0.09366, over 23373.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01056, ecapa_loss=0.0001421, whisper_loss=0.08903, over 3872153.37 frames. ], batch size: 93, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:38:46,668 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
23 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 00:39:00,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4185670.0, ans=0.125 2024-08-19 00:39:04,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4185670.0, ans=0.125 2024-08-19 00:39:08,069 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 00:39:09,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.33 vs. limit=15.0 2024-08-19 00:39:18,947 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 23 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-19 00:39:39,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4185970.0, ans=0.0 2024-08-19 00:39:44,457 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 3000, loss[loss=0.104, beats_loss=0.008607, ecapa_loss=0.0001522, whisper_loss=0.09387, over 17747.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01055, ecapa_loss=0.0001408, whisper_loss=0.08893, over 3894834.26 frames. ], batch size: 70, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:39:44,458 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-19 00:40:22,158 INFO [train_multi_KD3.py:1149] (0/4) Epoch 29, validation on ASR_libri: loss=0.2528, beats_loss=0, ecapa_loss=0.0005176, whisper_loss=0.2476, over 922467.00 frames. 2024-08-19 00:40:37,546 INFO [train_multi_KD3.py:1149] (0/4) Epoch 29, validation on SV_voxceleb1: loss=0.004065, beats_loss=0, ecapa_loss=0.0004065, whisper_loss=0, over 939242.00 frames. 2024-08-19 00:42:25,731 INFO [train_multi_KD3.py:1149] (0/4) Epoch 29, validation on AT_audioset: loss=0.02301, beats_loss=0.02301, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-19 00:42:25,735 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-19 00:42:29,447 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-19 00:42:42,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4186170.0, ans=0.0 2024-08-19 00:42:44,915 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.286e+01 2.543e+01 2.791e+01 3.821e+01, threshold=5.086e+01, percent-clipped=0.0 2024-08-19 00:42:45,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4186170.0, ans=0.2 2024-08-19 00:42:49,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4186170.0, ans=0.0 2024-08-19 00:42:49,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4186170.0, ans=0.2 2024-08-19 00:42:53,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4186270.0, ans=0.0 2024-08-19 00:43:10,582 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-19 00:43:13,388 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-19 00:43:13,733 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=12.0 2024-08-19 00:43:19,395 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 19 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-19 00:43:29,451 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 3050, loss[loss=0.09826, beats_loss=0.01093, ecapa_loss=0.0001703, whisper_loss=0.08563, over 22074.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01051, ecapa_loss=0.0001426, whisper_loss=0.09004, over 3913662.21 frames. ], batch size: 91, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:43:40,465 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.00 vs. limit=15.0 2024-08-19 00:44:00,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4186770.0, ans=0.0 2024-08-19 00:44:07,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4186870.0, ans=0.125 2024-08-19 00:44:09,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4186870.0, ans=0.2 2024-08-19 00:44:13,273 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0 2024-08-19 00:44:26,572 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-19 00:44:30,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4186970.0, ans=0.125 2024-08-19 00:44:32,910 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 3100, loss[loss=0.1011, beats_loss=0.008996, ecapa_loss=0.0001542, whisper_loss=0.09058, over 16944.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0105, ecapa_loss=0.0001443, whisper_loss=0.0899, over 3887919.65 frames. 
], batch size: 67, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:44:33,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4187070.0, ans=0.0 2024-08-19 00:44:33,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4187070.0, ans=0.0 2024-08-19 00:44:38,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4187070.0, ans=0.125 2024-08-19 00:44:47,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4187170.0, ans=0.1 2024-08-19 00:44:52,007 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.319e+01 2.529e+01 2.804e+01 4.634e+01, threshold=5.057e+01, percent-clipped=0.0 2024-08-19 00:44:53,412 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-19 00:45:02,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4187270.0, ans=0.0 2024-08-19 00:45:21,682 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-19 00:45:23,012 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 30 from Vox, 26 fro AS 2024-08-19 00:45:36,805 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 3150, loss[loss=0.1235, beats_loss=0.008806, ecapa_loss=0.000127, whisper_loss=0.1134, over 20200.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0105, ecapa_loss=0.0001454, whisper_loss=0.08966, over 3853346.00 frames. ], batch size: 76, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:45:38,446 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
27 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 00:45:38,646 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 00:45:47,356 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=15.0 2024-08-19 00:45:55,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4187670.0, ans=0.2 2024-08-19 00:45:58,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=4187670.0, ans=0.05 2024-08-19 00:46:00,795 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.14 vs. limit=15.0 2024-08-19 00:46:13,022 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 37 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-19 00:46:31,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4187970.0, ans=0.025 2024-08-19 00:46:37,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4187970.0, ans=0.125 2024-08-19 00:46:40,752 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 3200, loss[loss=0.1199, beats_loss=0.008637, ecapa_loss=0.0001429, whisper_loss=0.1098, over 20416.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01051, ecapa_loss=0.000144, whisper_loss=0.09021, over 3843659.27 frames. ], batch size: 78, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:46:40,898 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
28 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-19 00:46:42,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4188070.0, ans=0.1 2024-08-19 00:46:45,887 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 22 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-19 00:46:59,735 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.318e+01 2.522e+01 2.852e+01 4.136e+01, threshold=5.044e+01, percent-clipped=0.0 2024-08-19 00:47:04,165 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.66 vs. limit=15.0 2024-08-19 00:47:38,986 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-19 00:47:39,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4188470.0, ans=0.125 2024-08-19 00:47:39,246 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 00:47:39,648 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.45 vs. limit=15.0 2024-08-19 00:47:43,694 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 3250, loss[loss=0.1088, beats_loss=0.01006, ecapa_loss=0.0001282, whisper_loss=0.09742, over 18449.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01049, ecapa_loss=0.000145, whisper_loss=0.08996, over 3849590.41 frames. ], batch size: 73, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:47:45,783 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.28 vs. 
limit=15.0 2024-08-19 00:48:03,478 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.65 vs. limit=15.0 2024-08-19 00:48:04,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4188670.0, ans=0.0 2024-08-19 00:48:04,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4188670.0, ans=0.0 2024-08-19 00:48:09,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4188770.0, ans=0.125 2024-08-19 00:48:35,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4188970.0, ans=0.1 2024-08-19 00:48:46,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4189070.0, ans=0.0 2024-08-19 00:48:47,429 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 3300, loss[loss=0.1052, beats_loss=0.01024, ecapa_loss=0.0001507, whisper_loss=0.09342, over 23064.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001466, whisper_loss=0.09031, over 3856773.32 frames. ], batch size: 95, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:48:48,212 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.97 vs. 
limit=10.0 2024-08-19 00:49:02,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4189170.0, ans=0.125 2024-08-19 00:49:03,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4189170.0, ans=0.125 2024-08-19 00:49:05,170 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 17 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-19 00:49:06,247 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.329e+01 2.531e+01 2.845e+01 1.091e+02, threshold=5.061e+01, percent-clipped=2.0 2024-08-19 00:49:06,439 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 13 from Vox, 47 fro AS 2024-08-19 00:49:06,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4189170.0, ans=0.1 2024-08-19 00:49:24,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4189370.0, ans=0.125 2024-08-19 00:49:33,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4189370.0, ans=0.125 2024-08-19 00:49:34,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4189370.0, ans=0.125 2024-08-19 00:49:43,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4189470.0, ans=0.0 2024-08-19 00:49:50,411 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 3350, loss[loss=0.1087, beats_loss=0.009358, ecapa_loss=0.0001619, whisper_loss=0.09773, over 16147.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01054, ecapa_loss=0.0001447, whisper_loss=0.08978, over 3864477.91 frames. 
], batch size: 66, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:49:58,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4189570.0, ans=0.0 2024-08-19 00:50:01,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4189570.0, ans=0.125 2024-08-19 00:50:12,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4189670.0, ans=0.125 2024-08-19 00:50:13,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4189670.0, ans=0.07 2024-08-19 00:50:15,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4189770.0, ans=0.0 2024-08-19 00:50:15,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4189770.0, ans=0.2 2024-08-19 00:50:24,316 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
30 from LS+wenet, 19 from Vox, 39 from AS 2024-08-19 00:50:28,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4189870.0, ans=0.125 2024-08-19 00:50:28,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4189870.0, ans=0.125 2024-08-19 00:50:39,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4189870.0, ans=0.09899494936611666 2024-08-19 00:50:43,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=4189970.0, ans=10.0 2024-08-19 00:50:43,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4189970.0, ans=0.2 2024-08-19 00:50:44,875 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 from AS 2024-08-19 00:50:54,748 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 3400, loss[loss=0.1203, beats_loss=0.009546, ecapa_loss=0.0001456, whisper_loss=0.1093, over 17978.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01051, ecapa_loss=0.0001434, whisper_loss=0.09029, over 3835760.74 frames. ], batch size: 70, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:51:02,013 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.21 vs. limit=15.0 2024-08-19 00:51:03,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4190070.0, ans=0.125 2024-08-19 00:51:04,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4190070.0, ans=0.0 2024-08-19 00:51:05,198 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
22 from LS+wenet, 14 from Vox, 24 from AS 2024-08-19 00:51:13,513 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.286e+01 2.558e+01 2.999e+01 2.108e+02, threshold=5.116e+01, percent-clipped=4.0 2024-08-19 00:51:13,683 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 from AS 2024-08-19 00:51:21,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4190270.0, ans=0.125 2024-08-19 00:51:30,640 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 12 from LS+wenet, 17 from Vox, 31 from AS 2024-08-19 00:51:34,462 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 24 from Vox, 28 from AS 2024-08-19 00:51:37,813 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0 2024-08-19 00:51:40,315 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.10 vs. limit=10.0 2024-08-19 00:51:42,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4190370.0, ans=0.1 2024-08-19 00:51:50,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4190470.0, ans=0.2 2024-08-19 00:51:53,194 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=12.0 2024-08-19 00:51:55,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4190470.0, ans=0.0 2024-08-19 00:51:59,240 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 3450, loss[loss=0.1157, beats_loss=0.01063, ecapa_loss=0.0001233, whisper_loss=0.1038, over 23161.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.0001451, whisper_loss=0.09039, over 3869927.90 frames. ], batch size: 91, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:52:01,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4190570.0, ans=0.2 2024-08-19 00:52:19,494 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 19 from Vox, 28 from AS 2024-08-19 00:52:29,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4190770.0, ans=0.125 2024-08-19 00:52:31,534 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.44 vs. limit=22.5 2024-08-19 00:53:03,032 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 3500, loss[loss=0.1076, beats_loss=0.01022, ecapa_loss=0.0001632, whisper_loss=0.09578, over 19645.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01056, ecapa_loss=0.0001443, whisper_loss=0.08966, over 3858778.12 frames. ], batch size: 79, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:53:03,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4191070.0, ans=0.0 2024-08-19 00:53:06,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4191070.0, ans=0.125 2024-08-19 00:53:17,518 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
17 from LS+wenet, 16 from Vox, 35 from AS 2024-08-19 00:53:19,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4191170.0, ans=0.0 2024-08-19 00:53:22,265 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.660e+01 2.230e+01 2.489e+01 2.768e+01 5.626e+01, threshold=4.978e+01, percent-clipped=1.0 2024-08-19 00:53:27,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4191270.0, ans=0.125 2024-08-19 00:53:36,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=4191270.0, ans=0.95 2024-08-19 00:53:36,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=4191270.0, ans=0.2 2024-08-19 00:53:38,171 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.39 vs. limit=15.0 2024-08-19 00:53:41,187 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 18 from LS+wenet, 32 from Vox, 29 from AS 2024-08-19 00:53:49,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4191370.0, ans=0.125 2024-08-19 00:53:49,586 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.96 vs. limit=15.0 2024-08-19 00:53:52,882 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 23 from Vox, 42 from AS 2024-08-19 00:53:54,104 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
28 from LS+wenet, 18 from Vox, 27 from AS 2024-08-19 00:53:59,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4191470.0, ans=0.04949747468305833 2024-08-19 00:54:06,688 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 3550, loss[loss=0.1127, beats_loss=0.01024, ecapa_loss=0.0001429, whisper_loss=0.1011, over 22445.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01058, ecapa_loss=0.0001442, whisper_loss=0.08928, over 3869823.62 frames. ], batch size: 89, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:54:10,521 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 29 from Vox, 32 from AS 2024-08-19 00:54:23,668 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 21 from Vox, 31 from AS 2024-08-19 00:54:26,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4191670.0, ans=0.0 2024-08-19 00:54:27,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4191670.0, ans=0.1 2024-08-19 00:54:37,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4191770.0, ans=0.0 2024-08-19 00:54:54,190 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 18 from Vox, 30 from AS 2024-08-19 00:55:10,339 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 3600, loss[loss=0.1182, beats_loss=0.009197, ecapa_loss=0.0001586, whisper_loss=0.1074, over 17412.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01058, ecapa_loss=0.0001436, whisper_loss=0.08964, over 3836866.22 frames. ], batch size: 69, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:55:28,612 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
42 from LS+wenet, 14 from Vox, 37 from AS 2024-08-19 00:55:29,600 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.283e+01 2.473e+01 2.802e+01 1.020e+02, threshold=4.947e+01, percent-clipped=3.0 2024-08-19 00:55:36,348 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 18 from LS+wenet, 26 from Vox, 39 from AS 2024-08-19 00:55:48,892 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 31 from LS+wenet, 19 from Vox, 34 from AS 2024-08-19 00:56:04,565 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 20 from Vox, 32 from AS 2024-08-19 00:56:04,870 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=12.0 2024-08-19 00:56:07,845 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.56 vs. limit=6.0 2024-08-19 00:56:14,940 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 3650, loss[loss=0.1013, beats_loss=0.01155, ecapa_loss=0.0001461, whisper_loss=0.08826, over 20855.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01056, ecapa_loss=0.0001435, whisper_loss=0.08999, over 3830587.17 frames. ], batch size: 83, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:56:19,868 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 24 from LS+wenet, 17 from Vox, 20 from AS 2024-08-19 00:56:42,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4192770.0, ans=0.0 2024-08-19 00:56:43,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4192770.0, ans=0.2 2024-08-19 00:56:44,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4192770.0, ans=0.1 2024-08-19 00:56:53,483 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
34 from LS+wenet, 27 from Vox, 28 from AS 2024-08-19 00:57:02,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4192870.0, ans=0.5 2024-08-19 00:57:07,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4192970.0, ans=0.125 2024-08-19 00:57:14,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4192970.0, ans=0.125 2024-08-19 00:57:16,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4192970.0, ans=0.1 2024-08-19 00:57:18,974 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 3700, loss[loss=0.1197, beats_loss=0.008153, ecapa_loss=0.0001383, whisper_loss=0.1101, over 22336.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01048, ecapa_loss=0.0001432, whisper_loss=0.09006, over 3855334.21 frames. ], batch size: 86, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:57:22,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4193070.0, ans=0.125 2024-08-19 00:57:36,961 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 from AS 2024-08-19 00:57:38,288 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.353e+01 2.591e+01 2.874e+01 4.771e+02, threshold=5.181e+01, percent-clipped=2.0 2024-08-19 00:57:43,458 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 17 from Vox, 45 from AS 2024-08-19 00:57:49,513 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
27 from LS+wenet, 24 from Vox, 30 from AS 2024-08-19 00:57:49,843 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.653e+05 2024-08-19 00:57:59,211 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.81 vs. limit=22.5 2024-08-19 00:58:03,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4193370.0, ans=0.0 2024-08-19 00:58:03,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4193370.0, ans=0.95 2024-08-19 00:58:14,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4193470.0, ans=0.1 2024-08-19 00:58:15,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4193470.0, ans=0.125 2024-08-19 00:58:16,580 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 14 from Vox, 29 from AS 2024-08-19 00:58:22,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4193570.0, ans=0.0 2024-08-19 00:58:22,868 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 3750, loss[loss=0.1031, beats_loss=0.009982, ecapa_loss=0.0001409, whisper_loss=0.0917, over 22238.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.0001429, whisper_loss=0.09008, over 3877032.12 frames. ], batch size: 90, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:58:24,213 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
30 from LS+wenet, 23 from Vox, 36 from AS 2024-08-19 00:58:29,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4193570.0, ans=0.0 2024-08-19 00:58:45,297 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.49 vs. limit=15.0 2024-08-19 00:58:48,827 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.84 vs. limit=12.0 2024-08-19 00:59:05,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4193870.0, ans=0.2 2024-08-19 00:59:06,203 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 11 from Vox, 36 from AS 2024-08-19 00:59:07,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4193870.0, ans=0.125 2024-08-19 00:59:25,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.99 vs. limit=22.5 2024-08-19 00:59:28,407 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 3800, loss[loss=0.06807, beats_loss=0.0124, ecapa_loss=0.0001748, whisper_loss=0.05392, over 13858.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01045, ecapa_loss=0.0001436, whisper_loss=0.09045, over 3881717.03 frames. ], batch size: 58, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:59:34,151 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 22 from Vox, 32 from AS 2024-08-19 00:59:42,305 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
17 from LS+wenet, 19 from Vox, 22 from AS 2024-08-19 00:59:48,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4194170.0, ans=0.0 2024-08-19 00:59:48,903 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.244e+01 2.516e+01 2.813e+01 3.693e+01, threshold=5.032e+01, percent-clipped=0.0 2024-08-19 00:59:49,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4194170.0, ans=0.2 2024-08-19 00:59:50,290 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 20 from Vox, 35 from AS 2024-08-19 01:00:02,264 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 21 from LS+wenet, 24 from Vox, 47 from AS 2024-08-19 01:00:20,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4194370.0, ans=0.125 2024-08-19 01:00:21,692 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.54 vs. limit=15.0 2024-08-19 01:00:35,732 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 3850, loss[loss=0.09835, beats_loss=0.0108, ecapa_loss=0.0001416, whisper_loss=0.08614, over 15841.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01047, ecapa_loss=0.000144, whisper_loss=0.09082, over 3903964.98 frames. 
], batch size: 65, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:00:38,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4194570.0, ans=0.125 2024-08-19 01:00:45,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4194570.0, ans=0.125 2024-08-19 01:00:46,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4194570.0, ans=0.0 2024-08-19 01:00:56,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4194670.0, ans=0.125 2024-08-19 01:01:04,283 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.01 vs. limit=22.5 2024-08-19 01:01:14,188 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
18 from LS+wenet, 20 from Vox, 19 from AS 2024-08-19 01:01:14,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4194770.0, ans=0.125 2024-08-19 01:01:17,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4194870.0, ans=0.2 2024-08-19 01:01:18,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4194870.0, ans=0.125 2024-08-19 01:01:18,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=4194870.0, ans=10.0 2024-08-19 01:01:32,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4194970.0, ans=0.0 2024-08-19 01:01:33,354 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.290e+05 2024-08-19 01:01:35,349 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2024-08-19 01:01:39,920 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 19 from Vox, 26 from AS 2024-08-19 01:01:43,348 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.75 vs. limit=15.0 2024-08-19 01:01:43,817 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 3900, loss[loss=0.1166, beats_loss=0.00941, ecapa_loss=0.0001742, whisper_loss=0.1055, over 21270.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01049, ecapa_loss=0.0001451, whisper_loss=0.09022, over 3891910.11 frames. 
], batch size: 88, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:02:03,759 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.347e+01 2.563e+01 2.946e+01 1.381e+02, threshold=5.126e+01, percent-clipped=1.0 2024-08-19 01:02:04,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4195170.0, ans=0.125 2024-08-19 01:02:12,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4195270.0, ans=0.95 2024-08-19 01:02:16,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4195270.0, ans=0.125 2024-08-19 01:02:40,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4195470.0, ans=0.125 2024-08-19 01:02:50,758 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 3950, loss[loss=0.1241, beats_loss=0.009086, ecapa_loss=0.0001447, whisper_loss=0.1136, over 21224.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01034, ecapa_loss=0.0001466, whisper_loss=0.09121, over 3913089.14 frames. ], batch size: 85, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:03:01,960 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
22 from LS+wenet, 24 from Vox, 30 from AS 2024-08-19 01:03:06,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4195670.0, ans=0.0 2024-08-19 01:03:14,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4195670.0, ans=0.125 2024-08-19 01:03:34,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4195870.0, ans=0.0 2024-08-19 01:03:38,010 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 from AS 2024-08-19 01:03:39,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4195870.0, ans=0.2 2024-08-19 01:03:41,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4195870.0, ans=0.1 2024-08-19 01:03:55,331 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 from AS 2024-08-19 01:03:57,125 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.76 vs. limit=15.0 2024-08-19 01:03:59,389 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 4000, loss[loss=0.1007, beats_loss=0.01204, ecapa_loss=0.0001138, whisper_loss=0.08748, over 13674.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01036, ecapa_loss=0.0001457, whisper_loss=0.09075, over 3900452.79 frames. ], batch size: 53, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:03:59,529 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
21 from LS+wenet, 18 from Vox, 32 from AS 2024-08-19 01:04:02,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4196070.0, ans=0.125 2024-08-19 01:04:07,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4196070.0, ans=0.0 2024-08-19 01:04:18,739 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.259e+01 2.561e+01 2.838e+01 1.741e+02, threshold=5.122e+01, percent-clipped=1.0 2024-08-19 01:04:26,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4196270.0, ans=0.2 2024-08-19 01:04:40,240 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 26 from Vox, 43 from AS 2024-08-19 01:04:44,903 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.31 vs. limit=15.0 2024-08-19 01:04:51,578 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.37 vs. limit=22.5 2024-08-19 01:04:56,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4196470.0, ans=0.125 2024-08-19 01:05:06,393 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 4050, loss[loss=0.09899, beats_loss=0.01276, ecapa_loss=0.0001379, whisper_loss=0.08485, over 18086.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01037, ecapa_loss=0.000146, whisper_loss=0.09068, over 3930000.22 frames. ], batch size: 75, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:05:14,791 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
19 from LS+wenet, 14 from Vox, 28 from AS 2024-08-19 01:05:15,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4196570.0, ans=10.0 2024-08-19 01:05:25,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4196670.0, ans=0.0 2024-08-19 01:05:27,000 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 22 from Vox, 30 from AS 2024-08-19 01:05:33,535 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 25 from Vox, 31 from AS 2024-08-19 01:05:36,253 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 21 from Vox, 45 from AS 2024-08-19 01:05:53,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4196870.0, ans=0.0 2024-08-19 01:06:06,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4196970.0, ans=0.09899494936611666 2024-08-19 01:06:09,316 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2024-08-19 01:06:14,415 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 4100, loss[loss=0.1132, beats_loss=0.01273, ecapa_loss=0.0001075, whisper_loss=0.09937, over 23681.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01049, ecapa_loss=0.0001458, whisper_loss=0.08995, over 3917855.79 frames. ], batch size: 91, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:06:27,003 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 from AS 2024-08-19 01:06:34,694 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.750e+01 2.255e+01 2.514e+01 2.841e+01 8.921e+01, threshold=5.028e+01, percent-clipped=1.0 2024-08-19 01:06:34,847 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
29 from LS+wenet, 26 from Vox, 35 from AS 2024-08-19 01:06:37,049 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.03 vs. limit=15.0 2024-08-19 01:06:39,136 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 16 from Vox, 46 from AS 2024-08-19 01:06:59,571 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 17 from Vox, 30 from AS 2024-08-19 01:07:08,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4197470.0, ans=0.125 2024-08-19 01:07:11,387 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 20 from Vox, 24 from AS 2024-08-19 01:07:24,215 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 4150, loss[loss=0.1247, beats_loss=0.008887, ecapa_loss=0.0001437, whisper_loss=0.1144, over 20140.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01055, ecapa_loss=0.0001461, whisper_loss=0.08966, over 3875365.68 frames. ], batch size: 77, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:07:44,399 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
19 from LS+wenet, 18 from Vox, 28 from AS 2024-08-19 01:07:48,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4197670.0, ans=0.04949747468305833 2024-08-19 01:08:13,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4197870.0, ans=0.125 2024-08-19 01:08:17,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4197870.0, ans=0.125 2024-08-19 01:08:21,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4197970.0, ans=0.125 2024-08-19 01:08:32,661 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 4200, loss[loss=0.09087, beats_loss=0.01112, ecapa_loss=0.0001543, whisper_loss=0.07821, over 17257.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01049, ecapa_loss=0.0001452, whisper_loss=0.08992, over 3865307.96 frames. ], batch size: 69, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:08:36,287 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 24 from LS+wenet, 13 from Vox, 22 from AS 2024-08-19 01:08:41,857 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 31 from LS+wenet, 16 from Vox, 33 from AS 2024-08-19 01:08:46,989 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
18 from LS+wenet, 19 from Vox, 44 from AS 2024-08-19 01:08:47,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4198170.0, ans=0.0 2024-08-19 01:08:49,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4198170.0, ans=0.0 2024-08-19 01:08:52,133 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.638e+01 2.233e+01 2.469e+01 2.829e+01 1.799e+02, threshold=4.938e+01, percent-clipped=1.0 2024-08-19 01:08:55,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4198170.0, ans=0.1 2024-08-19 01:09:15,828 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 23 from Vox, 29 from AS 2024-08-19 01:09:19,717 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 from AS 2024-08-19 01:09:38,391 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 4250, loss[loss=0.09708, beats_loss=0.008723, ecapa_loss=0.000136, whisper_loss=0.08699, over 19041.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01051, ecapa_loss=0.0001444, whisper_loss=0.0898, over 3878738.36 frames. ], batch size: 76, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:09:44,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4198570.0, ans=0.0 2024-08-19 01:09:44,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4198570.0, ans=0.0 2024-08-19 01:09:55,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4198670.0, ans=0.125 2024-08-19 01:10:07,831 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.61 vs. 
limit=15.0 2024-08-19 01:10:10,180 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.26 vs. limit=15.0 2024-08-19 01:10:20,612 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 21 from Vox, 42 from AS 2024-08-19 01:10:25,848 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 33 from LS+wenet, 19 from Vox, 34 from AS 2024-08-19 01:10:38,934 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 18 from Vox, 39 from AS 2024-08-19 01:10:43,756 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 4300, loss[loss=0.09165, beats_loss=0.0116, ecapa_loss=0.0001446, whisper_loss=0.0786, over 22105.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0105, ecapa_loss=0.0001444, whisper_loss=0.08939, over 3899691.22 frames. ], batch size: 89, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:10:46,980 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 20 from Vox, 43 from AS 2024-08-19 01:10:48,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4199070.0, ans=0.125 2024-08-19 01:10:52,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4199070.0, ans=0.0 2024-08-19 01:10:53,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4199070.0, ans=0.0 2024-08-19 01:10:57,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4199170.0, ans=0.125 2024-08-19 01:10:57,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4199170.0, ans=0.125 2024-08-19 01:10:58,498 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
18 from LS+wenet, 16 from Vox, 21 from AS 2024-08-19 01:11:03,591 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.233e+01 2.491e+01 2.683e+01 4.196e+01, threshold=4.981e+01, percent-clipped=0.0 2024-08-19 01:11:06,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4199170.0, ans=0.0 2024-08-19 01:11:08,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4199270.0, ans=0.1 2024-08-19 01:11:09,370 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.42 vs. limit=15.0 2024-08-19 01:11:16,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4199270.0, ans=0.125 2024-08-19 01:11:18,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4199270.0, ans=0.125 2024-08-19 01:11:19,407 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 from AS 2024-08-19 01:11:21,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4199370.0, ans=0.2 2024-08-19 01:11:43,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4199470.0, ans=0.125 2024-08-19 01:11:48,953 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 4350, loss[loss=0.0917, beats_loss=0.01211, ecapa_loss=9.824e-05, whisper_loss=0.07861, over 21834.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01048, ecapa_loss=0.0001439, whisper_loss=0.08887, over 3889446.58 frames. 
], batch size: 84, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:11:50,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4199570.0, ans=0.0 2024-08-19 01:11:56,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4199570.0, ans=0.0 2024-08-19 01:11:59,732 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 15 from Vox, 46 from AS 2024-08-19 01:12:10,976 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 13 from LS+wenet, 23 from Vox, 25 from AS 2024-08-19 01:12:14,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4199770.0, ans=0.0 2024-08-19 01:12:23,970 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=12.0 2024-08-19 01:12:25,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4199770.0, ans=0.1 2024-08-19 01:12:26,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4199770.0, ans=0.125 2024-08-19 01:12:27,370 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
15 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-19 01:12:45,344 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-420000.pt 2024-08-19 01:12:49,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4199970.0, ans=0.0 2024-08-19 01:12:57,532 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 4400, loss[loss=0.09747, beats_loss=0.008704, ecapa_loss=0.0001405, whisper_loss=0.08736, over 21973.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01052, ecapa_loss=0.0001433, whisper_loss=0.08926, over 3899511.30 frames. ], batch size: 88, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:13:10,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4200170.0, ans=0.125 2024-08-19 01:13:18,297 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.359e+01 2.700e+01 2.927e+01 4.297e+01, threshold=5.400e+01, percent-clipped=0.0 2024-08-19 01:13:21,376 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 01:13:22,867 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 25 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-19 01:13:57,766 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-19 01:14:05,794 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 4450, loss[loss=0.1098, beats_loss=0.01097, ecapa_loss=0.0001313, whisper_loss=0.09747, over 22916.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01055, ecapa_loss=0.0001427, whisper_loss=0.08928, over 3907316.56 frames. 
], batch size: 87, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:14:16,049 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 20 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 01:14:17,235 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 01:14:25,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4200670.0, ans=0.0 2024-08-19 01:14:36,292 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 37 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-19 01:14:42,214 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0 2024-08-19 01:14:48,056 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-19 01:15:11,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4200970.0, ans=0.125 2024-08-19 01:15:12,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4200970.0, ans=0.125 2024-08-19 01:15:13,943 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 4500, loss[loss=0.1119, beats_loss=0.01104, ecapa_loss=0.0001187, whisper_loss=0.09969, over 23296.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01055, ecapa_loss=0.0001414, whisper_loss=0.08916, over 3914764.06 frames. ], batch size: 92, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:15:16,105 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.81 vs. limit=22.5 2024-08-19 01:15:32,184 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.76 vs. 
limit=22.5 2024-08-19 01:15:34,075 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.278e+01 2.479e+01 2.905e+01 4.149e+01, threshold=4.957e+01, percent-clipped=0.0 2024-08-19 01:15:38,210 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 01:15:49,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4201270.0, ans=0.0 2024-08-19 01:15:54,645 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-19 01:15:56,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4201370.0, ans=0.125 2024-08-19 01:15:58,191 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=15.0 2024-08-19 01:16:06,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4201370.0, ans=0.125 2024-08-19 01:16:12,257 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-19 01:16:22,398 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 4550, loss[loss=0.08387, beats_loss=0.01141, ecapa_loss=0.0001436, whisper_loss=0.07103, over 19617.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01054, ecapa_loss=0.0001409, whisper_loss=0.08918, over 3893637.10 frames. ], batch size: 80, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:16:23,746 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-19 01:16:36,632 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
38 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-19 01:16:47,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4201670.0, ans=0.125 2024-08-19 01:16:50,861 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-19 01:16:54,630 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 01:17:03,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4201870.0, ans=0.2 2024-08-19 01:17:12,851 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.44 vs. limit=15.0 2024-08-19 01:17:13,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4201870.0, ans=0.125 2024-08-19 01:17:16,577 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-19 01:17:18,305 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 01:17:20,761 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-19 01:17:31,565 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 4600, loss[loss=0.1113, beats_loss=0.009667, ecapa_loss=0.0001571, whisper_loss=0.1001, over 20412.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01056, ecapa_loss=0.0001412, whisper_loss=0.08897, over 3909828.20 frames. ], batch size: 81, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:17:34,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4202070.0, ans=0.0 2024-08-19 01:17:50,113 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
31 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-19 01:17:52,498 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.339e+01 2.630e+01 2.939e+01 5.094e+01, threshold=5.261e+01, percent-clipped=1.0 2024-08-19 01:18:12,167 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-19 01:18:12,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=4202370.0, ans=0.05 2024-08-19 01:18:38,976 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 01:18:41,881 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 4650, loss[loss=0.1032, beats_loss=0.008237, ecapa_loss=0.000145, whisper_loss=0.09348, over 14723.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01058, ecapa_loss=0.0001423, whisper_loss=0.08901, over 3897060.53 frames. ], batch size: 55, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:18:42,034 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 01:18:43,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4202570.0, ans=0.125 2024-08-19 01:18:44,451 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
30 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-19 01:18:49,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4202570.0, ans=0.1 2024-08-19 01:18:56,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4202670.0, ans=0.125 2024-08-19 01:19:02,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4202670.0, ans=0.1 2024-08-19 01:19:05,929 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.24 vs. limit=15.0 2024-08-19 01:19:10,686 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 29 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 01:19:10,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4202770.0, ans=0.1 2024-08-19 01:19:20,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4202770.0, ans=0.125 2024-08-19 01:19:33,563 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-19 01:19:34,167 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.86 vs. limit=15.0 2024-08-19 01:19:36,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4202870.0, ans=0.0 2024-08-19 01:19:43,294 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 01:19:53,406 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 4700, loss[loss=0.1044, beats_loss=0.009587, ecapa_loss=0.0001539, whisper_loss=0.09328, over 21440.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01052, ecapa_loss=0.0001428, whisper_loss=0.08953, over 3920491.25 frames. ], batch size: 86, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:20:13,704 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.364e+01 2.626e+01 2.952e+01 4.706e+01, threshold=5.252e+01, percent-clipped=0.0 2024-08-19 01:20:16,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4203170.0, ans=0.05 2024-08-19 01:20:26,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4203270.0, ans=0.125 2024-08-19 01:20:29,258 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-19 01:20:30,543 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 01:20:47,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4203470.0, ans=0.125 2024-08-19 01:21:00,343 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 33 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-19 01:21:01,429 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 4750, loss[loss=0.134, beats_loss=0.006885, ecapa_loss=0.000173, whisper_loss=0.1254, over 19691.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0105, ecapa_loss=0.0001424, whisper_loss=0.08991, over 3930758.46 frames. ], batch size: 76, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:21:08,612 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-19 01:21:09,851 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
25 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-19 01:21:14,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4203670.0, ans=0.2 2024-08-19 01:21:14,729 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.43 vs. limit=15.0 2024-08-19 01:21:16,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4203670.0, ans=0.125 2024-08-19 01:21:30,135 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 25 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-19 01:22:04,329 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-19 01:22:07,912 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.41 vs. limit=15.0 2024-08-19 01:22:08,373 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 4800, loss[loss=0.09065, beats_loss=0.009609, ecapa_loss=0.0001551, whisper_loss=0.07949, over 20463.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01051, ecapa_loss=0.0001431, whisper_loss=0.08945, over 3921508.48 frames. ], batch size: 84, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:22:08,542 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-19 01:22:27,886 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.297e+01 2.510e+01 2.780e+01 4.241e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-19 01:22:34,667 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.27 vs. limit=15.0 2024-08-19 01:22:40,908 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
20 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 01:22:55,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4204370.0, ans=0.1 2024-08-19 01:23:09,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4204470.0, ans=0.09899494936611666 2024-08-19 01:23:16,830 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 4850, loss[loss=0.111, beats_loss=0.009914, ecapa_loss=0.0001586, whisper_loss=0.09947, over 18246.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01058, ecapa_loss=0.0001413, whisper_loss=0.08963, over 3922495.99 frames. ], batch size: 76, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:23:57,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4204870.0, ans=0.07 2024-08-19 01:24:18,687 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.88 vs. limit=22.5 2024-08-19 01:24:19,508 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-19 01:24:20,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4204970.0, ans=0.125 2024-08-19 01:24:26,354 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 4900, loss[loss=0.103, beats_loss=0.0107, ecapa_loss=0.0001012, whisper_loss=0.09126, over 20003.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01061, ecapa_loss=0.000142, whisper_loss=0.08946, over 3900064.26 frames. 
], batch size: 75, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:24:27,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4205070.0, ans=0.0 2024-08-19 01:24:49,156 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.329e+01 2.525e+01 2.915e+01 4.394e+02, threshold=5.050e+01, percent-clipped=2.0 2024-08-19 01:24:51,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4205170.0, ans=0.2 2024-08-19 01:25:01,292 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-19 01:25:03,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4205270.0, ans=0.0 2024-08-19 01:25:18,462 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-19 01:25:36,831 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 4950, loss[loss=0.097, beats_loss=0.009365, ecapa_loss=0.0001751, whisper_loss=0.08589, over 17133.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01066, ecapa_loss=0.0001432, whisper_loss=0.08925, over 3905050.50 frames. ], batch size: 72, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:25:47,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4205570.0, ans=0.035 2024-08-19 01:26:12,716 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.400e-02 2024-08-19 01:26:16,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4205870.0, ans=0.125 2024-08-19 01:26:17,459 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.91 vs. 
limit=15.0 2024-08-19 01:26:35,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4205970.0, ans=0.0 2024-08-19 01:26:46,252 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 5000, loss[loss=0.09612, beats_loss=0.01039, ecapa_loss=0.0001687, whisper_loss=0.08405, over 22859.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0107, ecapa_loss=0.0001433, whisper_loss=0.08883, over 3880922.54 frames. ], batch size: 93, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:26:51,366 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.67 vs. limit=22.5 2024-08-19 01:26:58,633 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-19 01:26:58,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4206170.0, ans=0.125 2024-08-19 01:27:07,007 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.303e+01 2.548e+01 2.762e+01 6.852e+01, threshold=5.096e+01, percent-clipped=1.0 2024-08-19 01:27:32,666 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2024-08-19 01:27:48,185 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 26 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-19 01:27:57,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4206470.0, ans=0.0 2024-08-19 01:27:59,234 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 5050, loss[loss=0.0868, beats_loss=0.01009, ecapa_loss=0.0001542, whisper_loss=0.07517, over 20555.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01067, ecapa_loss=0.0001442, whisper_loss=0.08904, over 3889171.12 frames. 
], batch size: 87, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:27:59,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4206570.0, ans=0.0 2024-08-19 01:28:02,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4206570.0, ans=0.1 2024-08-19 01:28:17,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4206670.0, ans=0.2 2024-08-19 01:28:43,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4206870.0, ans=0.1 2024-08-19 01:29:01,584 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-19 01:29:02,427 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2024-08-19 01:29:09,900 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.04 vs. limit=15.0 2024-08-19 01:29:12,627 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 5100, loss[loss=0.1382, beats_loss=0.006898, ecapa_loss=0.0001282, whisper_loss=0.13, over 19418.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01073, ecapa_loss=0.0001434, whisper_loss=0.08955, over 3913894.28 frames. 
], batch size: 68, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:29:21,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4207070.0, ans=0.125 2024-08-19 01:29:33,758 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.355e+01 2.571e+01 2.795e+01 7.505e+01, threshold=5.142e+01, percent-clipped=1.0 2024-08-19 01:29:41,961 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 01:29:45,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4207270.0, ans=0.0 2024-08-19 01:29:54,474 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-19 01:29:57,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4207370.0, ans=0.125 2024-08-19 01:30:05,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4207370.0, ans=0.125 2024-08-19 01:30:25,290 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 5150, loss[loss=0.1047, beats_loss=0.01051, ecapa_loss=0.0001452, whisper_loss=0.09269, over 16961.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001434, whisper_loss=0.09032, over 3915761.02 frames. ], batch size: 68, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:31:12,724 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 23 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-19 01:31:39,196 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 9 from Vox, 28 fro AS 2024-08-19 01:31:40,245 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 5200, loss[loss=0.09752, beats_loss=0.01028, ecapa_loss=8.267e-05, whisper_loss=0.08641, over 16078.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01057, ecapa_loss=0.0001424, whisper_loss=0.09014, over 3889281.25 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:32:00,706 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.279e+01 2.485e+01 2.777e+01 3.905e+01, threshold=4.969e+01, percent-clipped=0.0 2024-08-19 01:32:17,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=4208270.0, ans=15.0 2024-08-19 01:32:22,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4208270.0, ans=0.125 2024-08-19 01:32:34,217 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.84 vs. limit=15.0 2024-08-19 01:32:53,280 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 5250, loss[loss=0.08782, beats_loss=0.009844, ecapa_loss=0.0001915, whisper_loss=0.07606, over 21144.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.0001424, whisper_loss=0.09016, over 3840687.77 frames. ], batch size: 90, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:33:18,737 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 19 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-19 01:33:22,557 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 01:33:46,245 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
30 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-19 01:33:58,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4208970.0, ans=0.1 2024-08-19 01:34:00,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4208970.0, ans=0.0 2024-08-19 01:34:10,175 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 5300, loss[loss=0.1161, beats_loss=0.008451, ecapa_loss=0.0001241, whisper_loss=0.1064, over 23200.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01045, ecapa_loss=0.0001417, whisper_loss=0.0906, over 3862777.94 frames. ], batch size: 91, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:34:15,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4209070.0, ans=0.125 2024-08-19 01:34:29,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4209170.0, ans=0.2 2024-08-19 01:34:29,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4209170.0, ans=0.125 2024-08-19 01:34:32,128 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.82 vs. limit=15.0 2024-08-19 01:34:32,569 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.347e+01 2.623e+01 3.004e+01 4.261e+01, threshold=5.246e+01, percent-clipped=0.0 2024-08-19 01:34:56,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4209370.0, ans=0.0 2024-08-19 01:34:58,214 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
16 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-19 01:34:59,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4209370.0, ans=0.125 2024-08-19 01:35:05,789 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-19 01:35:28,124 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 5350, loss[loss=0.08794, beats_loss=0.0103, ecapa_loss=0.0001281, whisper_loss=0.07635, over 14422.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.000142, whisper_loss=0.09024, over 3859261.39 frames. ], batch size: 55, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:35:30,555 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-19 01:35:41,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4209570.0, ans=0.0 2024-08-19 01:35:42,691 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-19 01:35:56,091 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 27 from LS+wenet, 16 from Vox, 52 fro AS 2024-08-19 01:36:06,741 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 01:36:11,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4209770.0, ans=0.1 2024-08-19 01:36:11,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4209770.0, ans=0.0 2024-08-19 01:36:14,062 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-19 01:36:20,112 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.50 vs. 
limit=15.0 2024-08-19 01:36:20,795 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 26 from Vox, 22 fro AS 2024-08-19 01:36:24,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4209870.0, ans=0.0 2024-08-19 01:36:33,993 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-19 01:36:48,243 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 5400, loss[loss=0.106, beats_loss=0.01093, ecapa_loss=0.0001329, whisper_loss=0.09373, over 23147.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01035, ecapa_loss=0.0001415, whisper_loss=0.09073, over 3850783.42 frames. ], batch size: 93, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:36:56,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=4210070.0, ans=0.2 2024-08-19 01:37:11,654 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.661e+01 2.258e+01 2.667e+01 2.927e+01 2.051e+02, threshold=5.334e+01, percent-clipped=3.0 2024-08-19 01:37:37,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4210370.0, ans=0.0 2024-08-19 01:38:02,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4210470.0, ans=0.125 2024-08-19 01:38:08,046 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 5450, loss[loss=0.08891, beats_loss=0.01221, ecapa_loss=0.0001394, whisper_loss=0.07531, over 18663.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.0001416, whisper_loss=0.09042, over 3863135.63 frames. 
], batch size: 75, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:38:14,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4210570.0, ans=0.0 2024-08-19 01:38:21,337 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.19 vs. limit=15.0 2024-08-19 01:38:24,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4210670.0, ans=0.125 2024-08-19 01:38:38,259 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-19 01:38:48,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4210770.0, ans=0.0 2024-08-19 01:38:53,006 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-19 01:38:55,582 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-19 01:39:08,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4210970.0, ans=0.125 2024-08-19 01:39:12,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4210970.0, ans=0.125 2024-08-19 01:39:21,320 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 5500, loss[loss=0.1143, beats_loss=0.00978, ecapa_loss=0.0001468, whisper_loss=0.103, over 21907.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.000141, whisper_loss=0.09008, over 3891070.72 frames. 
], batch size: 88, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:39:27,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4211070.0, ans=0.125 2024-08-19 01:39:29,258 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 19 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-19 01:39:34,835 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-19 01:39:38,760 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-19 01:39:40,282 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 21 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-19 01:39:42,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4211170.0, ans=0.2 2024-08-19 01:39:43,070 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.383e+01 2.517e+01 2.787e+01 3.399e+01, threshold=5.035e+01, percent-clipped=0.0 2024-08-19 01:39:53,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4211270.0, ans=0.0 2024-08-19 01:40:07,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4211370.0, ans=0.2 2024-08-19 01:40:24,543 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-19 01:40:25,674 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 37 from Vox, 30 fro AS 2024-08-19 01:40:28,041 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-19 01:40:30,226 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 5550, loss[loss=0.1107, beats_loss=0.009856, ecapa_loss=0.0001254, whisper_loss=0.0996, over 24510.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01048, ecapa_loss=0.0001419, whisper_loss=0.09063, over 3900865.28 frames. ], batch size: 94, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:40:42,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4211670.0, ans=0.1 2024-08-19 01:40:45,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4211670.0, ans=0.125 2024-08-19 01:40:49,373 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-19 01:41:01,255 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-19 01:41:09,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4211870.0, ans=0.0 2024-08-19 01:41:12,444 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-19 01:41:20,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4211870.0, ans=0.2 2024-08-19 01:41:21,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4211870.0, ans=0.0 2024-08-19 01:41:36,429 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 5600, loss[loss=0.1048, beats_loss=0.01033, ecapa_loss=0.0001561, whisper_loss=0.09289, over 21299.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01056, ecapa_loss=0.0001419, whisper_loss=0.09016, over 3908081.12 frames. ], batch size: 91, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:41:36,542 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
27 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-19 01:41:37,012 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.80 vs. limit=22.5 2024-08-19 01:41:37,957 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-19 01:41:41,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4212070.0, ans=0.125 2024-08-19 01:41:48,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4212170.0, ans=0.0 2024-08-19 01:41:56,850 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.373e+01 2.558e+01 2.737e+01 3.942e+01, threshold=5.116e+01, percent-clipped=0.0 2024-08-19 01:41:56,965 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 12 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-19 01:42:02,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=4212270.0, ans=10.0 2024-08-19 01:42:08,846 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-19 01:42:14,181 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
20 from LS+wenet, 24 from Vox, 51 fro AS 2024-08-19 01:42:15,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4212370.0, ans=0.015 2024-08-19 01:42:27,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4212370.0, ans=0.125 2024-08-19 01:42:30,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4212470.0, ans=0.125 2024-08-19 01:42:30,868 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.33 vs. limit=10.0 2024-08-19 01:42:37,053 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.43 vs. limit=15.0 2024-08-19 01:42:38,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4212470.0, ans=0.125 2024-08-19 01:42:40,284 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.710e-02 2024-08-19 01:42:43,353 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 5650, loss[loss=0.107, beats_loss=0.009955, ecapa_loss=0.0001297, whisper_loss=0.09571, over 21136.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01058, ecapa_loss=0.0001421, whisper_loss=0.08973, over 3896668.75 frames. ], batch size: 82, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:43:02,595 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
28 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-19 01:43:08,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4212670.0, ans=0.0 2024-08-19 01:43:18,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4212770.0, ans=0.125 2024-08-19 01:43:22,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4212770.0, ans=0.125 2024-08-19 01:43:23,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4212870.0, ans=0.125 2024-08-19 01:43:28,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4212870.0, ans=0.0 2024-08-19 01:43:32,056 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.740e-01 2024-08-19 01:43:33,206 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.300e+05 2024-08-19 01:43:38,509 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 36 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-19 01:43:48,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4212970.0, ans=0.0 2024-08-19 01:43:51,627 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 5700, loss[loss=0.0947, beats_loss=0.01237, ecapa_loss=0.0001336, whisper_loss=0.081, over 22307.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01052, ecapa_loss=0.0001433, whisper_loss=0.09005, over 3921073.71 frames. ], batch size: 87, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:43:53,933 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.03 vs. 
limit=15.0 2024-08-19 01:43:59,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4213070.0, ans=0.125 2024-08-19 01:44:00,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.73 vs. limit=15.0 2024-08-19 01:44:06,786 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.22 vs. limit=12.0 2024-08-19 01:44:07,826 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 01:44:12,711 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.394e+01 2.632e+01 3.086e+01 9.254e+01, threshold=5.264e+01, percent-clipped=1.0 2024-08-19 01:44:21,177 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.27 vs. limit=15.0 2024-08-19 01:44:21,918 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-19 01:44:25,162 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.01 vs. limit=15.0 2024-08-19 01:44:48,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4213470.0, ans=0.125 2024-08-19 01:44:58,025 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-19 01:45:01,721 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 5750, loss[loss=0.1164, beats_loss=0.01008, ecapa_loss=0.0001647, whisper_loss=0.1047, over 20810.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01048, ecapa_loss=0.0001431, whisper_loss=0.09053, over 3906984.63 frames. 
], batch size: 89, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:45:10,260 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-19 01:45:17,636 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.520e-01 2024-08-19 01:45:23,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4213670.0, ans=0.0 2024-08-19 01:45:27,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4213670.0, ans=0.95 2024-08-19 01:45:30,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4213770.0, ans=0.1 2024-08-19 01:45:46,809 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 01:46:05,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4213970.0, ans=0.1 2024-08-19 01:46:09,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4213970.0, ans=0.125 2024-08-19 01:46:13,287 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 5800, loss[loss=0.0806, beats_loss=0.01249, ecapa_loss=0.0001169, whisper_loss=0.06695, over 21995.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01043, ecapa_loss=0.0001428, whisper_loss=0.09051, over 3913047.72 frames. ], batch size: 90, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:46:15,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4214070.0, ans=0.0 2024-08-19 01:46:26,795 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
15 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-19 01:46:35,942 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.316e+01 2.608e+01 2.916e+01 4.627e+01, threshold=5.217e+01, percent-clipped=0.0 2024-08-19 01:47:02,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4214370.0, ans=0.2 2024-08-19 01:47:03,502 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-19 01:47:05,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.73 vs. limit=22.5 2024-08-19 01:47:07,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4214370.0, ans=0.125 2024-08-19 01:47:14,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4214470.0, ans=0.1 2024-08-19 01:47:15,399 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.35 vs. limit=22.5 2024-08-19 01:47:19,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4214470.0, ans=0.125 2024-08-19 01:47:22,173 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.207e+01 2024-08-19 01:47:24,599 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 5850, loss[loss=0.0964, beats_loss=0.008318, ecapa_loss=0.0001748, whisper_loss=0.08633, over 14724.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01037, ecapa_loss=0.0001444, whisper_loss=0.09057, over 3891148.54 frames. 
], batch size: 59, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:47:31,810 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.60 vs. limit=15.0 2024-08-19 01:47:36,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4214670.0, ans=0.0 2024-08-19 01:47:44,769 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-19 01:47:53,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4214770.0, ans=0.125 2024-08-19 01:47:57,740 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 38 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 01:48:06,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4214870.0, ans=0.125 2024-08-19 01:48:20,083 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-19 01:48:21,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=4214970.0, ans=15.0 2024-08-19 01:48:25,472 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 20 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-19 01:48:27,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4214970.0, ans=0.125 2024-08-19 01:48:34,851 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.77 vs. limit=15.0 2024-08-19 01:48:37,176 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 5900, loss[loss=0.09828, beats_loss=0.008588, ecapa_loss=0.000191, whisper_loss=0.08778, over 20702.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01042, ecapa_loss=0.0001445, whisper_loss=0.08986, over 3887166.70 frames. ], batch size: 86, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:48:57,703 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.293e+01 2.484e+01 2.776e+01 5.070e+01, threshold=4.968e+01, percent-clipped=0.0 2024-08-19 01:49:14,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4215270.0, ans=0.125 2024-08-19 01:49:33,872 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 17 from Vox, 50 fro AS 2024-08-19 01:49:34,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4215470.0, ans=0.2 2024-08-19 01:49:41,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4215470.0, ans=0.0 2024-08-19 01:49:51,556 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 5950, loss[loss=0.1016, beats_loss=0.01263, ecapa_loss=0.0001335, whisper_loss=0.08761, over 23348.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01052, ecapa_loss=0.0001438, whisper_loss=0.08922, over 3880752.94 frames. 
], batch size: 94, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:50:54,451 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 01:50:57,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4215870.0, ans=0.09899494936611666 2024-08-19 01:51:21,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=4215970.0, ans=10.0 2024-08-19 01:51:24,386 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 6000, loss[loss=0.1073, beats_loss=0.01064, ecapa_loss=0.0001239, whisper_loss=0.09545, over 16849.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.000143, whisper_loss=0.08964, over 3875898.59 frames. ], batch size: 65, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:51:24,388 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-19 01:52:18,322 INFO [train_multi_KD3.py:1149] (0/4) Epoch 29, validation on ASR_libri: loss=0.2515, beats_loss=0, ecapa_loss=0.0005229, whisper_loss=0.2463, over 922467.00 frames. 2024-08-19 01:52:36,639 INFO [train_multi_KD3.py:1149] (0/4) Epoch 29, validation on SV_voxceleb1: loss=0.003944, beats_loss=0, ecapa_loss=0.0003944, whisper_loss=0, over 939242.00 frames. 2024-08-19 01:53:38,812 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.3619, 3.2213, 2.8140, 3.1712], device='cuda:0') 2024-08-19 01:55:19,237 INFO [train_multi_KD3.py:1149] (0/4) Epoch 29, validation on AT_audioset: loss=0.02306, beats_loss=0.02306, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-19 01:55:19,241 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-19 01:55:22,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4216070.0, ans=0.125 2024-08-19 01:55:25,062 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-19 01:55:26,340 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 18 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-19 01:55:38,358 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 01:55:40,898 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.71 vs. limit=15.0 2024-08-19 01:55:48,133 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.345e+01 2.612e+01 2.923e+01 8.240e+01, threshold=5.224e+01, percent-clipped=1.0 2024-08-19 01:55:55,135 WARNING [optim.py:496] (0/4) Scaling gradients by 0.028258753940463066, model_norm_threshold=52.240760803222656 2024-08-19 01:55:55,300 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.066e+05, grad_sumsq=6.066e+05, orig_rms_sq=1.000e+00 2024-08-19 01:56:04,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4216270.0, ans=0.0 2024-08-19 01:56:10,145 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 01:56:11,861 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
27 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-19 01:56:17,623 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.956e-02 2024-08-19 01:56:38,509 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-19 01:56:40,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4216470.0, ans=0.125 2024-08-19 01:56:47,118 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 32 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-19 01:56:51,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4216470.0, ans=0.2 2024-08-19 01:56:56,737 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 6050, loss[loss=0.1234, beats_loss=0.007992, ecapa_loss=0.0001574, whisper_loss=0.1138, over 21684.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0105, ecapa_loss=0.0001426, whisper_loss=0.09043, over 3886218.41 frames. ], batch size: 84, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:57:17,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4216670.0, ans=0.0 2024-08-19 01:57:37,096 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.84 vs. 
limit=15.0 2024-08-19 01:57:44,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4216870.0, ans=0.0 2024-08-19 01:57:53,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=4216870.0, ans=15.0 2024-08-19 01:57:58,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4216970.0, ans=0.125 2024-08-19 01:58:11,245 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 6100, loss[loss=0.1042, beats_loss=0.01037, ecapa_loss=0.0001477, whisper_loss=0.09237, over 23234.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001431, whisper_loss=0.0896, over 3891829.72 frames. ], batch size: 95, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:58:31,661 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.328e+01 2.727e+01 2.996e+01 1.849e+03, threshold=5.454e+01, percent-clipped=1.0 2024-08-19 01:58:36,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4217270.0, ans=0.125 2024-08-19 01:58:55,123 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
20 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-19 01:58:56,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=4217370.0, ans=0.1 2024-08-19 01:58:57,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4217370.0, ans=0.125 2024-08-19 01:59:02,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4217370.0, ans=0.125 2024-08-19 01:59:05,323 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2024-08-19 01:59:13,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4217470.0, ans=0.0 2024-08-19 01:59:19,524 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 6150, loss[loss=0.1053, beats_loss=0.008306, ecapa_loss=0.0001338, whisper_loss=0.09569, over 18503.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01053, ecapa_loss=0.0001436, whisper_loss=0.08943, over 3882198.63 frames. ], batch size: 71, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:59:22,642 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-19 01:59:34,204 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.49 vs. limit=22.5 2024-08-19 01:59:40,622 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 24 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-19 01:59:53,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.89 vs. 
limit=12.0 2024-08-19 01:59:57,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4217770.0, ans=0.125 2024-08-19 02:00:05,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4217870.0, ans=0.2 2024-08-19 02:00:09,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4217870.0, ans=0.125 2024-08-19 02:00:25,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4217970.0, ans=0.1 2024-08-19 02:00:29,200 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.58 vs. limit=12.0 2024-08-19 02:00:29,878 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 6200, loss[loss=0.07755, beats_loss=0.01265, ecapa_loss=0.000136, whisper_loss=0.06354, over 16843.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0105, ecapa_loss=0.0001429, whisper_loss=0.08963, over 3852201.85 frames. ], batch size: 68, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:00:37,740 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-19 02:00:49,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.97 vs. limit=22.5 2024-08-19 02:00:50,570 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.664e+01 2.249e+01 2.459e+01 2.825e+01 3.741e+01, threshold=4.918e+01, percent-clipped=0.0 2024-08-19 02:01:10,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4218370.0, ans=0.125 2024-08-19 02:01:13,462 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
24 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-19 02:01:13,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4218370.0, ans=0.125 2024-08-19 02:01:24,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4218370.0, ans=0.125 2024-08-19 02:01:40,944 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 6250, loss[loss=0.1028, beats_loss=0.0119, ecapa_loss=0.0001133, whisper_loss=0.08975, over 13651.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.0001416, whisper_loss=0.09026, over 3857527.77 frames. ], batch size: 53, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:01:48,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4218570.0, ans=0.125 2024-08-19 02:01:51,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4218570.0, ans=0.0 2024-08-19 02:01:51,365 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2024-08-19 02:02:03,791 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.30 vs. limit=15.0 2024-08-19 02:02:18,454 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-19 02:02:18,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4218770.0, ans=0.125 2024-08-19 02:02:31,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4218870.0, ans=0.1 2024-08-19 02:02:35,656 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
39 from LS+wenet, 12 from Vox, 39 fro AS 2024-08-19 02:02:39,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4218970.0, ans=0.0 2024-08-19 02:02:39,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4218970.0, ans=0.125 2024-08-19 02:02:49,353 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-19 02:02:50,737 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 6300, loss[loss=0.1132, beats_loss=0.0108, ecapa_loss=0.0001548, whisper_loss=0.1009, over 21891.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.0001419, whisper_loss=0.09054, over 3860267.13 frames. ], batch size: 90, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:02:52,696 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.63 vs. limit=15.0 2024-08-19 02:02:53,629 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 10 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 02:02:57,567 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-19 02:03:11,743 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.391e+01 2.571e+01 3.038e+01 7.293e+01, threshold=5.142e+01, percent-clipped=2.0 2024-08-19 02:03:17,198 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-19 02:03:59,772 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 6350, loss[loss=0.1068, beats_loss=0.01157, ecapa_loss=0.0001798, whisper_loss=0.09344, over 21946.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01044, ecapa_loss=0.0001424, whisper_loss=0.09025, over 3848833.96 frames. 
], batch size: 92, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:03:59,872 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-19 02:04:06,255 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 22 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 02:04:14,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4219670.0, ans=0.2 2024-08-19 02:04:28,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4219770.0, ans=0.2 2024-08-19 02:04:31,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4219770.0, ans=0.0 2024-08-19 02:04:33,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4219770.0, ans=0.0 2024-08-19 02:04:37,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4219770.0, ans=0.2 2024-08-19 02:04:45,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4219870.0, ans=0.0 2024-08-19 02:04:50,085 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 27 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-19 02:04:53,147 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-19 02:04:54,773 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=15.0 2024-08-19 02:04:56,302 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.41 vs. 
limit=15.0 2024-08-19 02:05:07,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4219970.0, ans=0.07 2024-08-19 02:05:09,547 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 6400, loss[loss=0.1084, beats_loss=0.01086, ecapa_loss=0.0001147, whisper_loss=0.09644, over 17997.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01043, ecapa_loss=0.0001418, whisper_loss=0.08993, over 3871497.90 frames. ], batch size: 69, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:05:10,011 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.57 vs. limit=22.5 2024-08-19 02:05:15,461 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=15.0 2024-08-19 02:05:31,204 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.582e+01 2.283e+01 2.522e+01 2.730e+01 4.061e+01, threshold=5.044e+01, percent-clipped=0.0 2024-08-19 02:05:50,390 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.38 vs. limit=15.0 2024-08-19 02:05:58,348 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 16 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-19 02:06:11,169 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.84 vs. limit=22.5 2024-08-19 02:06:19,666 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 6450, loss[loss=0.08745, beats_loss=0.01145, ecapa_loss=0.0001668, whisper_loss=0.07432, over 21164.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01051, ecapa_loss=0.000142, whisper_loss=0.08992, over 3893930.24 frames. 
], batch size: 90, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:06:20,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4220570.0, ans=0.05 2024-08-19 02:06:25,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4220570.0, ans=0.125 2024-08-19 02:06:34,053 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.81 vs. limit=12.0 2024-08-19 02:06:37,428 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-19 02:07:04,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4220870.0, ans=0.0 2024-08-19 02:07:07,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4220870.0, ans=0.0 2024-08-19 02:07:08,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4220870.0, ans=0.1 2024-08-19 02:07:20,761 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 23 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-19 02:07:26,510 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 16 from Vox, 49 fro AS 2024-08-19 02:07:27,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4220970.0, ans=0.125 2024-08-19 02:07:29,954 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 6500, loss[loss=0.0974, beats_loss=0.01062, ecapa_loss=0.0001269, whisper_loss=0.08551, over 21214.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01045, ecapa_loss=0.0001437, whisper_loss=0.0906, over 3910351.50 frames. 
], batch size: 81, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:07:42,046 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.20 vs. limit=15.0 2024-08-19 02:07:50,819 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.411e+01 2.588e+01 2.957e+01 3.943e+01, threshold=5.175e+01, percent-clipped=0.0 2024-08-19 02:08:28,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4221470.0, ans=0.125 2024-08-19 02:08:29,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4221470.0, ans=0.125 2024-08-19 02:08:38,486 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 6550, loss[loss=0.09417, beats_loss=0.01299, ecapa_loss=0.0001215, whisper_loss=0.07996, over 21687.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01045, ecapa_loss=0.0001419, whisper_loss=0.09112, over 3931985.80 frames. ], batch size: 89, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:08:43,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4221570.0, ans=0.125 2024-08-19 02:08:57,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4221670.0, ans=0.125 2024-08-19 02:09:10,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4221770.0, ans=0.125 2024-08-19 02:09:29,178 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.10 vs. limit=12.0 2024-08-19 02:09:31,673 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
12 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 02:09:33,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4221870.0, ans=0.0 2024-08-19 02:09:42,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4221970.0, ans=0.125 2024-08-19 02:09:49,327 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 6600, loss[loss=0.08775, beats_loss=0.01003, ecapa_loss=0.0001332, whisper_loss=0.07638, over 22066.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01041, ecapa_loss=0.0001432, whisper_loss=0.09101, over 3963502.80 frames. ], batch size: 88, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:09:57,879 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.00 vs. limit=15.0 2024-08-19 02:10:09,806 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.434e+01 2.631e+01 2.972e+01 4.626e+01, threshold=5.261e+01, percent-clipped=0.0 2024-08-19 02:10:13,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4222170.0, ans=0.2 2024-08-19 02:10:14,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4222170.0, ans=0.0 2024-08-19 02:10:17,929 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 37 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-19 02:10:32,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4222370.0, ans=0.125 2024-08-19 02:10:46,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4222470.0, ans=0.0 2024-08-19 02:10:47,316 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
30 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 02:10:58,861 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 6650, loss[loss=0.09419, beats_loss=0.01238, ecapa_loss=0.0001205, whisper_loss=0.0806, over 19397.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01044, ecapa_loss=0.0001431, whisper_loss=0.09117, over 3951842.15 frames. ], batch size: 74, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:11:09,118 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 33 from Vox, 33 fro AS 2024-08-19 02:11:10,281 WARNING [optim.py:496] (0/4) Scaling gradients by 0.04424963891506195, model_norm_threshold=52.611846923828125 2024-08-19 02:11:10,443 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.224e+05, grad_sumsq=2.136e+07, orig_rms_sq=1.041e-02 2024-08-19 02:11:13,047 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 38 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-19 02:11:46,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4222870.0, ans=0.125 2024-08-19 02:11:46,400 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.96 vs. limit=15.0 2024-08-19 02:11:47,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4222870.0, ans=0.125 2024-08-19 02:11:50,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4222870.0, ans=0.0 2024-08-19 02:12:09,155 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 6700, loss[loss=0.08477, beats_loss=0.0114, ecapa_loss=0.0001398, whisper_loss=0.07197, over 15528.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.0104, ecapa_loss=0.0001444, whisper_loss=0.091, over 3922864.96 frames. ], batch size: 62, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:12:14,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4223070.0, ans=0.07 2024-08-19 02:12:31,458 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.376e+01 2.691e+01 2.989e+01 1.189e+03, threshold=5.381e+01, percent-clipped=5.0 2024-08-19 02:12:33,579 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 02:12:39,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4223270.0, ans=0.0 2024-08-19 02:12:50,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4223270.0, ans=0.125 2024-08-19 02:13:01,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4223370.0, ans=0.04949747468305833 2024-08-19 02:13:09,318 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.81 vs. limit=22.5 2024-08-19 02:13:20,614 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 6750, loss[loss=0.1005, beats_loss=0.01058, ecapa_loss=0.0001723, whisper_loss=0.08821, over 16424.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01035, ecapa_loss=0.0001432, whisper_loss=0.09124, over 3894725.39 frames. ], batch size: 69, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:13:31,721 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
21 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-19 02:13:41,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4223670.0, ans=0.0 2024-08-19 02:13:44,291 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=15.0 2024-08-19 02:14:02,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4223870.0, ans=0.0 2024-08-19 02:14:02,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4223870.0, ans=0.125 2024-08-19 02:14:10,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4223870.0, ans=0.125 2024-08-19 02:14:12,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4223870.0, ans=0.125 2024-08-19 02:14:12,893 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.27 vs. limit=15.0 2024-08-19 02:14:28,903 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 6800, loss[loss=0.09748, beats_loss=0.01097, ecapa_loss=0.0001596, whisper_loss=0.08491, over 13909.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01032, ecapa_loss=0.0001441, whisper_loss=0.09104, over 3866596.34 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:14:32,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4224070.0, ans=0.1 2024-08-19 02:14:46,238 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
27 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-19 02:14:49,983 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.425e+01 2.571e+01 2.858e+01 3.712e+02, threshold=5.143e+01, percent-clipped=3.0 2024-08-19 02:14:50,140 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-19 02:14:51,773 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-19 02:14:52,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4224170.0, ans=0.0 2024-08-19 02:14:55,808 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-19 02:15:03,551 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-19 02:15:15,841 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 27 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-19 02:15:24,366 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-19 02:15:34,813 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 02:15:37,300 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 6850, loss[loss=0.09166, beats_loss=0.01268, ecapa_loss=0.0001054, whisper_loss=0.07792, over 15768.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01044, ecapa_loss=0.0001434, whisper_loss=0.09019, over 3852790.82 frames. ], batch size: 60, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:15:38,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4224570.0, ans=0.07 2024-08-19 02:15:44,095 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
38 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-19 02:16:01,976 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.41 vs. limit=15.0 2024-08-19 02:16:02,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4224770.0, ans=0.04949747468305833 2024-08-19 02:16:24,476 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-19 02:16:42,040 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 02:16:46,198 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 6900, loss[loss=0.1207, beats_loss=0.008942, ecapa_loss=0.0001373, whisper_loss=0.1104, over 21238.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01048, ecapa_loss=0.0001429, whisper_loss=0.09028, over 3862461.70 frames. ], batch size: 81, lr: 2.11e-03, grad_scale: 1.152921504606847e+18 2024-08-19 02:17:06,150 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.295e+01 2.515e+01 2.694e+01 1.143e+02, threshold=5.030e+01, percent-clipped=2.0 2024-08-19 02:17:17,520 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-19 02:17:24,136 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.46 vs. limit=10.0 2024-08-19 02:17:27,779 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
27 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-19 02:17:41,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4225470.0, ans=0.125 2024-08-19 02:17:51,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4225570.0, ans=0.07 2024-08-19 02:17:52,721 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 6950, loss[loss=0.08108, beats_loss=0.01218, ecapa_loss=0.0001315, whisper_loss=0.06759, over 13556.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01056, ecapa_loss=0.0001421, whisper_loss=0.08987, over 3838690.04 frames. ], batch size: 53, lr: 2.11e-03, grad_scale: 1.152921504606847e+18 2024-08-19 02:18:00,953 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 02:18:48,787 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 26 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-19 02:18:57,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4226070.0, ans=0.125 2024-08-19 02:18:58,847 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 7000, loss[loss=0.1013, beats_loss=0.01104, ecapa_loss=0.0001557, whisper_loss=0.08869, over 22266.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0106, ecapa_loss=0.0001435, whisper_loss=0.08952, over 3839734.64 frames. ], batch size: 92, lr: 2.11e-03, grad_scale: 1.152921504606847e+18 2024-08-19 02:19:08,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4226070.0, ans=0.125 2024-08-19 02:19:18,041 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.286e+01 2.533e+01 2.808e+01 4.798e+01, threshold=5.066e+01, percent-clipped=0.0 2024-08-19 02:19:19,412 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
36 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 02:19:26,721 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 30 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-19 02:19:36,236 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.94 vs. limit=10.0 2024-08-19 02:19:40,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4226370.0, ans=0.1 2024-08-19 02:19:41,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4226370.0, ans=0.0 2024-08-19 02:19:43,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4226370.0, ans=0.125 2024-08-19 02:19:45,172 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.11 vs. limit=15.0 2024-08-19 02:19:56,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=4226470.0, ans=10.0 2024-08-19 02:19:56,859 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.81 vs. limit=12.0 2024-08-19 02:20:02,709 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 7050, loss[loss=0.1075, beats_loss=0.009003, ecapa_loss=0.0001588, whisper_loss=0.09696, over 22818.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001448, whisper_loss=0.08998, over 3861011.51 frames. ], batch size: 92, lr: 2.11e-03, grad_scale: 1.152921504606847e+18 2024-08-19 02:20:06,170 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.51 vs. 
limit=15.0 2024-08-19 02:20:10,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4226570.0, ans=0.1 2024-08-19 02:20:11,754 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-19 02:20:23,220 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-19 02:20:25,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4226670.0, ans=0.125 2024-08-19 02:20:32,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4226770.0, ans=0.125 2024-08-19 02:20:32,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4226770.0, ans=0.0 2024-08-19 02:20:57,151 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-19 02:21:03,293 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 28 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 02:21:04,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4227070.0, ans=0.125 2024-08-19 02:21:05,475 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 7100, loss[loss=0.1016, beats_loss=0.009327, ecapa_loss=0.0001547, whisper_loss=0.09071, over 17445.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01049, ecapa_loss=0.0001433, whisper_loss=0.08998, over 3854230.06 frames. 
], batch size: 69, lr: 2.11e-03, grad_scale: 1.152921504606847e+18 2024-08-19 02:21:08,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4227070.0, ans=0.2 2024-08-19 02:21:14,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4227070.0, ans=0.04949747468305833 2024-08-19 02:21:16,869 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 18 from LS+wenet, 26 from Vox, 46 fro AS 2024-08-19 02:21:18,531 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2024-08-19 02:21:21,592 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 31 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-19 02:21:23,809 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.314e+01 2.618e+01 2.953e+01 5.824e+01, threshold=5.237e+01, percent-clipped=1.0 2024-08-19 02:21:28,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4227170.0, ans=0.125 2024-08-19 02:21:38,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4227270.0, ans=0.125 2024-08-19 02:21:51,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4227370.0, ans=0.1 2024-08-19 02:21:55,004 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 02:22:08,096 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 7150, loss[loss=0.09987, beats_loss=0.008722, ecapa_loss=0.0001691, whisper_loss=0.08946, over 17264.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01041, ecapa_loss=0.0001434, whisper_loss=0.09043, over 3872339.75 frames. 
], batch size: 71, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:22:21,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4227670.0, ans=0.2 2024-08-19 02:22:28,106 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 22 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-19 02:22:29,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=4227670.0, ans=0.025 2024-08-19 02:22:43,190 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-19 02:22:46,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4227870.0, ans=0.0 2024-08-19 02:22:46,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4227870.0, ans=0.125 2024-08-19 02:22:51,281 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-19 02:22:51,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4227870.0, ans=0.125 2024-08-19 02:23:08,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=4227970.0, ans=15.0 2024-08-19 02:23:11,467 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 7200, loss[loss=0.09591, beats_loss=0.0124, ecapa_loss=0.0001153, whisper_loss=0.08236, over 22176.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01044, ecapa_loss=0.0001426, whisper_loss=0.09011, over 3858108.13 frames. ], batch size: 88, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:23:12,937 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
32 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-19 02:23:13,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4228070.0, ans=0.125 2024-08-19 02:23:31,506 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.350e+01 2.585e+01 2.924e+01 4.669e+01, threshold=5.169e+01, percent-clipped=0.0 2024-08-19 02:23:36,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4228270.0, ans=0.125 2024-08-19 02:23:55,815 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.56 vs. limit=15.0 2024-08-19 02:23:56,583 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-19 02:24:00,928 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.55 vs. limit=22.5 2024-08-19 02:24:04,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4228470.0, ans=0.125 2024-08-19 02:24:06,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4228470.0, ans=0.2 2024-08-19 02:24:13,877 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 7250, loss[loss=0.104, beats_loss=0.009616, ecapa_loss=0.000141, whisper_loss=0.09298, over 22759.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01041, ecapa_loss=0.0001425, whisper_loss=0.09064, over 3895834.57 frames. ], batch size: 91, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:24:15,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4228570.0, ans=0.2 2024-08-19 02:24:20,475 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
20 from LS+wenet, 8 from Vox, 30 fro AS 2024-08-19 02:24:43,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4228770.0, ans=0.0 2024-08-19 02:24:43,786 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.77 vs. limit=15.0 2024-08-19 02:24:48,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4228770.0, ans=0.0 2024-08-19 02:24:55,407 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.28 vs. limit=6.0 2024-08-19 02:24:57,777 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.84 vs. limit=22.5 2024-08-19 02:25:02,157 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0587669275701046, model_norm_threshold=51.69389343261719 2024-08-19 02:25:02,321 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.3.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.897e+04, grad_sumsq=8.897e+04, orig_rms_sq=1.000e+00 2024-08-19 02:25:16,807 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-19 02:25:18,051 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 7300, loss[loss=0.1033, beats_loss=0.009983, ecapa_loss=0.0001712, whisper_loss=0.09162, over 20704.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001423, whisper_loss=0.09038, over 3882327.31 frames. 
], batch size: 88, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:25:33,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4229170.0, ans=0.125 2024-08-19 02:25:33,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4229170.0, ans=0.1 2024-08-19 02:25:35,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4229170.0, ans=0.0 2024-08-19 02:25:38,107 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.307e+01 2.519e+01 2.780e+01 8.796e+02, threshold=5.038e+01, percent-clipped=1.0 2024-08-19 02:25:40,253 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.65 vs. limit=22.5 2024-08-19 02:25:41,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4229170.0, ans=0.0 2024-08-19 02:25:46,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4229270.0, ans=0.09899494936611666 2024-08-19 02:25:59,570 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 02:25:59,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4229370.0, ans=0.0 2024-08-19 02:26:08,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4229470.0, ans=0.0 2024-08-19 02:26:20,631 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 7350, loss[loss=0.1104, beats_loss=0.009781, ecapa_loss=0.0001369, whisper_loss=0.0993, over 22091.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.0001434, whisper_loss=0.09051, over 3869574.01 frames. ], batch size: 88, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:26:23,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4229570.0, ans=0.125 2024-08-19 02:26:25,976 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-19 02:26:34,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4229670.0, ans=0.0 2024-08-19 02:26:52,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4229770.0, ans=0.0 2024-08-19 02:26:53,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4229770.0, ans=0.1 2024-08-19 02:27:01,883 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.17 vs. limit=15.0 2024-08-19 02:27:03,592 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 35 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-19 02:27:07,965 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.47 vs. limit=10.0 2024-08-19 02:27:09,751 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 02:27:10,989 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-19 02:27:25,385 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 7400, loss[loss=0.1223, beats_loss=0.009559, ecapa_loss=0.0001359, whisper_loss=0.1114, over 15373.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01048, ecapa_loss=0.0001432, whisper_loss=0.09065, over 3880007.92 frames. ], batch size: 61, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:27:25,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4230070.0, ans=0.125 2024-08-19 02:27:29,522 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-19 02:27:43,921 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-19 02:27:46,175 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.315e+01 2.515e+01 2.740e+01 4.360e+01, threshold=5.029e+01, percent-clipped=0.0 2024-08-19 02:27:46,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4230170.0, ans=0.125 2024-08-19 02:27:46,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4230170.0, ans=0.125 2024-08-19 02:27:57,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4230270.0, ans=0.125 2024-08-19 02:28:11,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4230370.0, ans=0.125 2024-08-19 02:28:11,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4230370.0, ans=0.0 2024-08-19 02:28:14,333 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 38 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 02:28:16,893 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
25 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 02:28:17,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4230470.0, ans=0.025 2024-08-19 02:28:29,069 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 7450, loss[loss=0.1098, beats_loss=0.007985, ecapa_loss=0.0001218, whisper_loss=0.1006, over 20528.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01037, ecapa_loss=0.0001442, whisper_loss=0.09117, over 3866333.98 frames. ], batch size: 76, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:28:36,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4230570.0, ans=0.125 2024-08-19 02:28:39,135 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.77 vs. limit=15.0 2024-08-19 02:28:39,813 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-19 02:28:40,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4230570.0, ans=0.2 2024-08-19 02:28:40,561 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=15.0 2024-08-19 02:29:17,035 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-19 02:29:33,723 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 7500, loss[loss=0.1069, beats_loss=0.008798, ecapa_loss=0.0001712, whisper_loss=0.09644, over 16359.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01031, ecapa_loss=0.0001444, whisper_loss=0.09097, over 3868189.31 frames. 
], batch size: 65, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:29:36,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4231070.0, ans=0.1 2024-08-19 02:29:53,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4231170.0, ans=0.1 2024-08-19 02:29:54,011 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.202e+01 2.431e+01 2.767e+01 3.373e+01, threshold=4.863e+01, percent-clipped=0.0 2024-08-19 02:29:55,744 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.460e+01 2024-08-19 02:29:56,974 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-19 02:30:24,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4231470.0, ans=0.0 2024-08-19 02:30:29,941 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 34 from Vox, 33 fro AS 2024-08-19 02:30:37,878 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 7550, loss[loss=0.07138, beats_loss=0.01284, ecapa_loss=0.0001908, whisper_loss=0.05663, over 16697.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01033, ecapa_loss=0.0001446, whisper_loss=0.08996, over 3836275.54 frames. 
], batch size: 72, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:30:40,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4231570.0, ans=0.125 2024-08-19 02:30:59,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4231670.0, ans=0.2 2024-08-19 02:31:02,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4231770.0, ans=0.0 2024-08-19 02:31:27,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4231970.0, ans=0.1 2024-08-19 02:31:35,747 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0 2024-08-19 02:31:41,782 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 7600, loss[loss=0.09087, beats_loss=0.01161, ecapa_loss=0.0001289, whisper_loss=0.07797, over 22346.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.0001447, whisper_loss=0.09039, over 3856067.63 frames. ], batch size: 90, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:31:57,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4232170.0, ans=0.2 2024-08-19 02:31:59,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4232170.0, ans=0.125 2024-08-19 02:32:01,694 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.345e+01 2.567e+01 2.795e+01 4.774e+01, threshold=5.135e+01, percent-clipped=0.0 2024-08-19 02:32:01,824 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
24 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-19 02:32:02,634 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.36 vs. limit=15.0 2024-08-19 02:32:10,781 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 02:32:13,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4232270.0, ans=0.0 2024-08-19 02:32:16,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4232270.0, ans=0.125 2024-08-19 02:32:17,815 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-19 02:32:38,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4232470.0, ans=0.0 2024-08-19 02:32:38,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4232470.0, ans=0.1 2024-08-19 02:32:42,737 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 20 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 02:32:45,031 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 7650, loss[loss=0.1164, beats_loss=0.009551, ecapa_loss=0.0001109, whisper_loss=0.1057, over 23693.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01031, ecapa_loss=0.0001442, whisper_loss=0.09033, over 3860628.15 frames. ], batch size: 90, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:32:58,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4232670.0, ans=0.0 2024-08-19 02:32:59,300 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-19 02:33:04,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=4232670.0, ans=6.0 2024-08-19 02:33:07,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4232670.0, ans=0.125 2024-08-19 02:33:09,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4232770.0, ans=0.1 2024-08-19 02:33:12,169 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-19 02:33:13,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4232770.0, ans=0.2 2024-08-19 02:33:19,256 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.09 vs. limit=10.0 2024-08-19 02:33:20,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4232770.0, ans=0.1 2024-08-19 02:33:48,018 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 7700, loss[loss=0.1071, beats_loss=0.009702, ecapa_loss=0.0001163, whisper_loss=0.09622, over 19884.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01028, ecapa_loss=0.0001448, whisper_loss=0.09, over 3866988.75 frames. ], batch size: 78, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:33:49,493 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-19 02:33:52,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4233070.0, ans=0.125 2024-08-19 02:33:58,525 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.192e+05 2024-08-19 02:34:07,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4233170.0, ans=0.1 2024-08-19 02:34:07,805 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.265e+01 2.506e+01 2.925e+01 4.294e+01, threshold=5.013e+01, percent-clipped=0.0 2024-08-19 02:34:09,450 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-19 02:34:27,996 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.89 vs. limit=15.0 2024-08-19 02:34:40,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4233470.0, ans=0.125 2024-08-19 02:34:49,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4233470.0, ans=0.125 2024-08-19 02:34:50,238 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-19 02:34:51,658 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 7750, loss[loss=0.09966, beats_loss=0.01032, ecapa_loss=0.0001543, whisper_loss=0.0878, over 16602.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01038, ecapa_loss=0.0001441, whisper_loss=0.08961, over 3877847.11 frames. 
], batch size: 67, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:34:52,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4233570.0, ans=0.125 2024-08-19 02:35:09,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4233670.0, ans=0.0 2024-08-19 02:35:10,813 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-19 02:35:14,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4233670.0, ans=0.125 2024-08-19 02:35:25,330 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-19 02:35:30,759 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-19 02:35:34,422 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-19 02:35:36,861 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-19 02:35:39,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4233870.0, ans=0.125 2024-08-19 02:35:51,385 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 02:35:54,848 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 7800, loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.0001144, whisper_loss=0.09149, over 22302.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01039, ecapa_loss=0.0001437, whisper_loss=0.08974, over 3882378.01 frames. ], batch size: 89, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:35:58,568 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
24 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-19 02:36:12,698 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-19 02:36:15,040 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.304e+01 2.560e+01 2.905e+01 1.988e+02, threshold=5.119e+01, percent-clipped=2.0 2024-08-19 02:36:28,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4234270.0, ans=0.125 2024-08-19 02:36:57,824 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 7850, loss[loss=0.09683, beats_loss=0.009132, ecapa_loss=0.000171, whisper_loss=0.08599, over 13279.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01041, ecapa_loss=0.0001436, whisper_loss=0.08953, over 3867040.80 frames. ], batch size: 54, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:37:07,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4234570.0, ans=0.1 2024-08-19 02:37:19,270 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 02:37:50,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4234970.0, ans=0.2 2024-08-19 02:37:55,123 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-19 02:37:58,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4234970.0, ans=0.125 2024-08-19 02:38:00,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4235070.0, ans=0.125 2024-08-19 02:38:01,008 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 7900, loss[loss=0.1077, beats_loss=0.01151, ecapa_loss=0.0001339, whisper_loss=0.09486, over 23338.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01056, ecapa_loss=0.0001427, whisper_loss=0.08951, over 3882815.68 frames. ], batch size: 94, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:38:02,247 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-19 02:38:07,116 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.61 vs. limit=15.0 2024-08-19 02:38:10,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4235070.0, ans=0.1 2024-08-19 02:38:14,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4235170.0, ans=0.1 2024-08-19 02:38:20,846 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.393e+01 2.663e+01 2.999e+01 6.865e+01, threshold=5.327e+01, percent-clipped=3.0 2024-08-19 02:38:25,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4235270.0, ans=0.1 2024-08-19 02:38:34,649 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.54 vs. limit=15.0 2024-08-19 02:38:37,421 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-19 02:38:41,138 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 24 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-19 02:38:41,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4235370.0, ans=0.2 2024-08-19 02:38:44,852 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
23 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-19 02:38:52,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4235470.0, ans=0.0 2024-08-19 02:39:01,431 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.59 vs. limit=15.0 2024-08-19 02:39:03,381 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 7950, loss[loss=0.08306, beats_loss=0.01361, ecapa_loss=0.0001296, whisper_loss=0.06815, over 22087.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01062, ecapa_loss=0.0001418, whisper_loss=0.08928, over 3872287.28 frames. ], batch size: 93, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:39:13,352 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 29 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-19 02:39:19,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4235670.0, ans=0.0 2024-08-19 02:39:34,125 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-19 02:39:44,997 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 29 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-19 02:39:50,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4235870.0, ans=0.125 2024-08-19 02:39:51,983 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.03 vs. limit=10.0 2024-08-19 02:39:52,373 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-19 02:39:55,140 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
30 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-19 02:39:57,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4235970.0, ans=0.125 2024-08-19 02:40:04,515 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 8000, loss[loss=0.1079, beats_loss=0.01063, ecapa_loss=0.0001599, whisper_loss=0.09566, over 18574.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0105, ecapa_loss=0.000142, whisper_loss=0.09053, over 3894331.36 frames. ], batch size: 75, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:40:09,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4236070.0, ans=0.1 2024-08-19 02:40:14,450 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 25 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-19 02:40:15,635 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 31 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-19 02:40:24,235 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.236e+01 2.508e+01 2.783e+01 4.268e+01, threshold=5.017e+01, percent-clipped=0.0 2024-08-19 02:40:41,919 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 35 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 02:41:05,888 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 8050, loss[loss=0.08566, beats_loss=0.011, ecapa_loss=0.0001345, whisper_loss=0.07332, over 22835.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01044, ecapa_loss=0.0001417, whisper_loss=0.0905, over 3910098.84 frames. ], batch size: 92, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:41:10,038 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.07 vs. limit=10.0 2024-08-19 02:41:17,170 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-19 02:41:33,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4236770.0, ans=0.0 2024-08-19 02:41:41,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4236770.0, ans=0.0 2024-08-19 02:41:41,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4236770.0, ans=0.1 2024-08-19 02:41:43,787 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.99 vs. limit=22.5 2024-08-19 02:41:46,410 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.95 vs. limit=15.0 2024-08-19 02:41:49,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4236870.0, ans=0.125 2024-08-19 02:42:07,602 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 8100, loss[loss=0.0807, beats_loss=0.01175, ecapa_loss=0.0001643, whisper_loss=0.06731, over 14260.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01043, ecapa_loss=0.0001425, whisper_loss=0.09045, over 3919968.13 frames. ], batch size: 60, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:42:16,823 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0 2024-08-19 02:42:27,406 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.244e+01 2.525e+01 2.786e+01 3.995e+01, threshold=5.049e+01, percent-clipped=0.0 2024-08-19 02:42:33,461 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
31 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-19 02:42:39,858 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.05 vs. limit=15.0 2024-08-19 02:42:46,592 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-19 02:42:46,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4237370.0, ans=0.125 2024-08-19 02:42:48,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4237370.0, ans=0.125 2024-08-19 02:42:53,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4237370.0, ans=0.125 2024-08-19 02:43:06,330 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 20 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-19 02:43:08,500 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 8150, loss[loss=0.09819, beats_loss=0.01098, ecapa_loss=0.0001325, whisper_loss=0.08588, over 20518.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001414, whisper_loss=0.09029, over 3945315.85 frames. ], batch size: 82, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:43:13,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4237570.0, ans=10.0 2024-08-19 02:43:16,074 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-19 02:43:45,023 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-19 02:43:52,405 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
14 from LS+wenet, 22 from Vox, 18 fro AS 2024-08-19 02:44:02,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4237970.0, ans=0.125 2024-08-19 02:44:09,551 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 8200, loss[loss=0.1058, beats_loss=0.01088, ecapa_loss=0.0001203, whisper_loss=0.09372, over 16291.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001417, whisper_loss=0.08989, over 3932230.30 frames. ], batch size: 62, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:44:20,645 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-19 02:44:26,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4238170.0, ans=0.2 2024-08-19 02:44:28,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4238170.0, ans=0.07 2024-08-19 02:44:28,891 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.281e+01 2.470e+01 2.775e+01 3.796e+01, threshold=4.940e+01, percent-clipped=0.0 2024-08-19 02:44:33,337 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0 2024-08-19 02:44:39,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4238270.0, ans=0.125 2024-08-19 02:44:56,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4238370.0, ans=0.0 2024-08-19 02:44:59,695 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
19 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-19 02:45:01,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4238470.0, ans=0.0 2024-08-19 02:45:01,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4238470.0, ans=0.0 2024-08-19 02:45:10,505 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 8250, loss[loss=0.1213, beats_loss=0.008436, ecapa_loss=0.0001395, whisper_loss=0.1115, over 20550.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.0001417, whisper_loss=0.08958, over 3897348.23 frames. ], batch size: 78, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:45:19,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4238570.0, ans=0.0 2024-08-19 02:45:41,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4238770.0, ans=0.125 2024-08-19 02:45:43,143 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-19 02:45:50,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4238870.0, ans=0.125 2024-08-19 02:46:08,062 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-19 02:46:12,818 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 8300, loss[loss=0.08751, beats_loss=0.01136, ecapa_loss=0.0001705, whisper_loss=0.07445, over 16044.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01052, ecapa_loss=0.0001406, whisper_loss=0.08955, over 3913358.14 frames. ], batch size: 68, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:46:30,848 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.38 vs. 
limit=15.0 2024-08-19 02:46:32,636 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.439e+01 2.594e+01 2.906e+01 5.754e+01, threshold=5.188e+01, percent-clipped=1.0 2024-08-19 02:46:36,452 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 from AS 2024-08-19 02:46:38,784 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 16 from LS+wenet, 21 from Vox, 32 from AS 2024-08-19 02:46:39,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4239270.0, ans=0.125 2024-08-19 02:46:50,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4239370.0, ans=0.0 2024-08-19 02:46:58,451 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 from AS 2024-08-19 02:47:04,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4239470.0, ans=0.125 2024-08-19 02:47:04,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4239470.0, ans=0.125 2024-08-19 02:47:11,807 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 30 from LS+wenet, 16 from Vox, 26 from AS 2024-08-19 02:47:13,430 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=15.0 2024-08-19 02:47:13,555 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.89 vs. limit=15.0 2024-08-19 02:47:13,979 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 8350, loss[loss=0.1152, beats_loss=0.009293, ecapa_loss=0.0001559, whisper_loss=0.1044, over 16162.00 frames. 
], tot_loss[loss=0.1013, beats_loss=0.01054, ecapa_loss=0.0001409, whisper_loss=0.08931, over 3900010.68 frames. ], batch size: 66, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:47:16,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4239570.0, ans=0.125 2024-08-19 02:47:32,408 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 from AS 2024-08-19 02:47:37,221 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 from AS 2024-08-19 02:47:37,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4239770.0, ans=0.0 2024-08-19 02:47:41,672 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 23 from Vox, 26 from AS 2024-08-19 02:47:43,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4239770.0, ans=0.2 2024-08-19 02:47:44,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4239770.0, ans=0.0 2024-08-19 02:47:46,157 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.62 vs. 
limit=15.0 2024-08-19 02:47:49,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4239870.0, ans=0.1 2024-08-19 02:47:50,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4239870.0, ans=0.0 2024-08-19 02:47:56,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4239870.0, ans=0.125 2024-08-19 02:48:04,974 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-424000.pt 2024-08-19 02:48:11,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4239970.0, ans=0.1 2024-08-19 02:48:17,199 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 8400, loss[loss=0.118, beats_loss=0.00697, ecapa_loss=0.0001687, whisper_loss=0.1094, over 15470.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01044, ecapa_loss=0.0001421, whisper_loss=0.0899, over 3927304.43 frames. ], batch size: 60, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:48:31,827 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
33 from LS+wenet, 15 from Vox, 39 from AS 2024-08-19 02:48:36,821 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.697e+01 2.296e+01 2.515e+01 2.691e+01 3.893e+01, threshold=5.031e+01, percent-clipped=0.0 2024-08-19 02:48:46,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4240270.0, ans=0.125 2024-08-19 02:48:48,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.15 vs. limit=10.0 2024-08-19 02:48:53,215 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2024-08-19 02:48:56,115 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.14 vs. limit=15.0 2024-08-19 02:49:09,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4240470.0, ans=0.0 2024-08-19 02:49:12,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4240470.0, ans=0.2 2024-08-19 02:49:16,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4240470.0, ans=0.1 2024-08-19 02:49:18,308 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 8450, loss[loss=0.1029, beats_loss=0.009498, ecapa_loss=0.0001428, whisper_loss=0.09193, over 20686.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01035, ecapa_loss=0.0001433, whisper_loss=0.0907, over 3933350.54 frames. ], batch size: 82, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:49:28,085 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.03 vs. 
limit=22.5 2024-08-19 02:49:39,017 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 16 from Vox, 27 from AS 2024-08-19 02:49:40,284 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 25 from Vox, 26 from AS 2024-08-19 02:49:51,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4240770.0, ans=0.125 2024-08-19 02:49:53,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4240870.0, ans=0.2 2024-08-19 02:49:55,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4240870.0, ans=0.125 2024-08-19 02:49:56,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=4240870.0, ans=0.025 2024-08-19 02:49:59,809 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 20 from Vox, 26 from AS 2024-08-19 02:50:01,017 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 30 from LS+wenet, 14 from Vox, 22 from AS 2024-08-19 02:50:08,924 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.97 vs. limit=15.0 2024-08-19 02:50:10,792 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 from AS 2024-08-19 02:50:16,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4240970.0, ans=0.1 2024-08-19 02:50:18,814 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 8500, loss[loss=0.1106, beats_loss=0.009129, ecapa_loss=0.0001045, whisper_loss=0.1004, over 14951.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01032, ecapa_loss=0.0001426, whisper_loss=0.09097, over 3926073.92 frames. 
], batch size: 53, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:50:18,963 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 from AS 2024-08-19 02:50:20,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4241070.0, ans=0.2 2024-08-19 02:50:31,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4241170.0, ans=0.1 2024-08-19 02:50:33,416 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 from AS 2024-08-19 02:50:37,936 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.287e+01 2.464e+01 2.780e+01 3.780e+01, threshold=4.928e+01, percent-clipped=0.0 2024-08-19 02:50:38,583 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2024-08-19 02:50:39,248 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 22 from LS+wenet, 24 from Vox, 47 from AS 2024-08-19 02:50:42,951 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 11 from Vox, 38 from AS 2024-08-19 02:50:46,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4241270.0, ans=0.125 2024-08-19 02:50:52,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4241270.0, ans=0.0 2024-08-19 02:50:58,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4241370.0, ans=0.0 2024-08-19 02:51:19,386 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 8550, loss[loss=0.1306, beats_loss=0.009591, ecapa_loss=0.0001237, whisper_loss=0.1198, over 21373.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01031, ecapa_loss=0.0001416, whisper_loss=0.09156, over 3941530.71 frames. ], batch size: 79, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:51:35,035 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.20 vs. limit=15.0 2024-08-19 02:51:35,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4241670.0, ans=0.125 2024-08-19 02:51:35,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4241670.0, ans=0.0 2024-08-19 02:51:44,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4241770.0, ans=0.2 2024-08-19 02:51:49,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4241770.0, ans=0.125 2024-08-19 02:51:58,673 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.51 vs. limit=10.0 2024-08-19 02:52:03,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4241870.0, ans=0.0 2024-08-19 02:52:05,658 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 21 from LS+wenet, 18 from Vox, 18 from AS 2024-08-19 02:52:09,606 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.41 vs. limit=12.0 2024-08-19 02:52:21,322 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 8600, loss[loss=0.09509, beats_loss=0.009845, ecapa_loss=0.000123, whisper_loss=0.08401, over 15383.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01025, ecapa_loss=0.0001431, whisper_loss=0.09195, over 3944893.31 frames. 
], batch size: 59, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:52:24,780 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 17 from Vox, 34 from AS 2024-08-19 02:52:30,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4242070.0, ans=0.125 2024-08-19 02:52:32,420 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 27 from Vox, 31 from AS 2024-08-19 02:52:42,034 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.302e+01 2.568e+01 2.849e+01 4.123e+01, threshold=5.135e+01, percent-clipped=0.0 2024-08-19 02:52:51,432 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 from AS 2024-08-19 02:52:57,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4242270.0, ans=0.0 2024-08-19 02:52:57,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.26 vs. limit=15.0 2024-08-19 02:52:59,914 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 22 from Vox, 35 from AS 2024-08-19 02:53:22,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4242470.0, ans=0.05 2024-08-19 02:53:24,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4242470.0, ans=0.125 2024-08-19 02:53:30,358 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 8650, loss[loss=0.1061, beats_loss=0.00971, ecapa_loss=0.0001532, whisper_loss=0.09487, over 22084.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01023, ecapa_loss=0.0001431, whisper_loss=0.092, over 3948660.50 frames. 
], batch size: 92, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:53:41,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4242570.0, ans=0.125 2024-08-19 02:53:46,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4242670.0, ans=0.125 2024-08-19 02:53:47,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4242670.0, ans=0.0 2024-08-19 02:53:53,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4242670.0, ans=0.125 2024-08-19 02:54:23,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4242870.0, ans=0.0 2024-08-19 02:54:26,198 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 27 from Vox, 36 from AS 2024-08-19 02:54:33,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4242970.0, ans=0.0 2024-08-19 02:54:44,485 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 8700, loss[loss=0.1011, beats_loss=0.01285, ecapa_loss=0.0001391, whisper_loss=0.08685, over 14039.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0103, ecapa_loss=0.0001434, whisper_loss=0.09154, over 3923661.79 frames. ], batch size: 57, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:54:47,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4243070.0, ans=0.0 2024-08-19 02:54:49,011 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=15.0 2024-08-19 02:54:52,057 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
21 from LS+wenet, 22 from Vox, 45 from AS 2024-08-19 02:54:53,310 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 20 from Vox, 44 from AS 2024-08-19 02:55:04,108 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.365e+01 2.539e+01 2.840e+01 3.770e+01, threshold=5.079e+01, percent-clipped=0.0 2024-08-19 02:55:16,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4243270.0, ans=0.1 2024-08-19 02:55:24,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4243370.0, ans=0.125 2024-08-19 02:55:36,858 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 02:55:43,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4243470.0, ans=0.125 2024-08-19 02:55:45,118 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 8750, loss[loss=0.1061, beats_loss=0.01016, ecapa_loss=0.0001561, whisper_loss=0.09433, over 20223.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01043, ecapa_loss=0.0001427, whisper_loss=0.09083, over 3870542.51 frames. ], batch size: 81, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:55:47,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4243570.0, ans=0.2 2024-08-19 02:55:55,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4243570.0, ans=0.125 2024-08-19 02:56:18,147 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
24 from LS+wenet, 19 from Vox, 31 from AS 2024-08-19 02:56:35,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4243970.0, ans=0.125 2024-08-19 02:56:37,712 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 19 from LS+wenet, 25 from Vox, 32 from AS 2024-08-19 02:56:44,013 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 29 from LS+wenet, 28 from Vox, 22 from AS 2024-08-19 02:56:46,333 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 8800, loss[loss=0.1073, beats_loss=0.01008, ecapa_loss=0.0001075, whisper_loss=0.09612, over 22770.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001418, whisper_loss=0.09067, over 3864794.71 frames. ], batch size: 84, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:56:56,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4244070.0, ans=0.1 2024-08-19 02:56:58,681 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 from AS 2024-08-19 02:57:05,907 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.300e+01 2.475e+01 2.796e+01 3.899e+01, threshold=4.950e+01, percent-clipped=0.0 2024-08-19 02:57:06,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4244170.0, ans=0.125 2024-08-19 02:57:07,200 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 from AS 2024-08-19 02:57:11,391 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.23 vs. limit=15.0 2024-08-19 02:57:23,089 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 19 from Vox, 31 from AS 2024-08-19 02:57:32,775 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
15 from LS+wenet, 18 from Vox, 29 from AS 2024-08-19 02:57:36,900 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.897e-02 2024-08-19 02:57:39,241 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 from AS 2024-08-19 02:57:47,820 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 8850, loss[loss=0.1148, beats_loss=0.008598, ecapa_loss=0.0001397, whisper_loss=0.1048, over 14186.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01063, ecapa_loss=0.000141, whisper_loss=0.08943, over 3858649.96 frames. ], batch size: 54, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:57:48,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4244570.0, ans=0.1 2024-08-19 02:57:54,063 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 22 from Vox, 28 from AS 2024-08-19 02:58:00,388 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 30 from LS+wenet, 18 from Vox, 37 from AS 2024-08-19 02:58:00,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4244670.0, ans=0.2 2024-08-19 02:58:01,638 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 21 from Vox, 32 from AS 2024-08-19 02:58:04,454 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.20 vs. limit=15.0 2024-08-19 02:58:17,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4244770.0, ans=0.1 2024-08-19 02:58:37,945 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0 2024-08-19 02:58:39,766 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
24 from LS+wenet, 21 from Vox, 40 from AS 2024-08-19 02:58:49,452 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 8900, loss[loss=0.0843, beats_loss=0.01194, ecapa_loss=0.0001262, whisper_loss=0.0711, over 17735.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01059, ecapa_loss=0.0001409, whisper_loss=0.08985, over 3878438.11 frames. ], batch size: 71, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:58:49,607 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 19 from Vox, 26 from AS 2024-08-19 02:58:52,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4245070.0, ans=0.0 2024-08-19 02:58:57,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4245070.0, ans=0.0 2024-08-19 02:58:58,320 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 from AS 2024-08-19 02:59:08,989 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.326e+01 2.543e+01 2.750e+01 4.033e+01, threshold=5.087e+01, percent-clipped=0.0 2024-08-19 02:59:10,434 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 20 from Vox, 30 from AS 2024-08-19 02:59:11,805 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 25 from LS+wenet, 13 from Vox, 31 from AS 2024-08-19 02:59:23,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4245270.0, ans=0.0 2024-08-19 02:59:25,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4245370.0, ans=0.125 2024-08-19 02:59:35,496 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 16 from Vox, 30 from AS 2024-08-19 02:59:37,950 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
22 from LS+wenet, 12 from Vox, 23 from AS 2024-08-19 02:59:44,193 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 18 from Vox, 36 from AS 2024-08-19 02:59:48,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4245470.0, ans=0.0 2024-08-19 02:59:51,576 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 8950, loss[loss=0.1132, beats_loss=0.007671, ecapa_loss=0.0001594, whisper_loss=0.104, over 22523.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01063, ecapa_loss=0.0001411, whisper_loss=0.08933, over 3859154.28 frames. ], batch size: 89, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:59:52,924 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 15 from Vox, 38 from AS 2024-08-19 02:59:55,577 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 12 from Vox, 29 from AS 2024-08-19 03:00:16,642 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 21 from Vox, 48 from AS 2024-08-19 03:00:22,753 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 21 from LS+wenet, 20 from Vox, 17 from AS 2024-08-19 03:00:28,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.21 vs. limit=15.0 2024-08-19 03:00:30,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4245870.0, ans=0.125 2024-08-19 03:00:32,826 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 31 from Vox, 30 from AS 2024-08-19 03:00:33,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4245870.0, ans=0.125 2024-08-19 03:00:41,500 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
18 from LS+wenet, 19 from Vox, 26 from AS 2024-08-19 03:00:53,890 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 9000, loss[loss=0.1129, beats_loss=0.008806, ecapa_loss=0.0001502, whisper_loss=0.1026, over 22452.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01065, ecapa_loss=0.0001417, whisper_loss=0.08879, over 3838007.30 frames. ], batch size: 90, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 03:00:53,891 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-19 03:01:30,313 INFO [train_multi_KD3.py:1149] (0/4) Epoch 29, validation on ASR_libri: loss=0.2527, beats_loss=0, ecapa_loss=0.0005203, whisper_loss=0.2475, over 922467.00 frames. 2024-08-19 03:01:46,074 INFO [train_multi_KD3.py:1149] (0/4) Epoch 29, validation on SV_voxceleb1: loss=0.004041, beats_loss=0, ecapa_loss=0.0004041, whisper_loss=0, over 939242.00 frames. 2024-08-19 03:03:34,106 INFO [train_multi_KD3.py:1149] (0/4) Epoch 29, validation on AT_audioset: loss=0.02316, beats_loss=0.02316, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 03:03:34,117 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-19 03:03:35,435 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 15 from Vox, 29 from AS 2024-08-19 03:03:47,575 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
24 from LS+wenet, 23 from Vox, 34 from AS 2024-08-19 03:03:47,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4246170.0, ans=0.125 2024-08-19 03:03:53,422 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.336e+01 2.586e+01 2.879e+01 3.784e+01, threshold=5.173e+01, percent-clipped=0.0 2024-08-19 03:04:03,824 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 03:04:06,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4246270.0, ans=0.125 2024-08-19 03:04:13,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4246370.0, ans=0.125 2024-08-19 03:04:19,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4246370.0, ans=0.1 2024-08-19 03:04:28,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4246470.0, ans=0.2 2024-08-19 03:04:29,257 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 26 from Vox, 34 from AS 2024-08-19 03:04:35,618 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 9050, loss[loss=0.1151, beats_loss=0.01121, ecapa_loss=0.0001352, whisper_loss=0.1025, over 23261.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01052, ecapa_loss=0.0001427, whisper_loss=0.0898, over 3840993.64 frames. ], batch size: 94, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 03:04:38,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4246570.0, ans=0.1 2024-08-19 03:05:06,500 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
26 from LS+wenet, 20 from Vox, 18 from AS 2024-08-19 03:05:13,784 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 19 from Vox, 45 from AS 2024-08-19 03:05:23,965 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 19 from Vox, 20 from AS 2024-08-19 03:05:37,339 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 9100, loss[loss=0.1006, beats_loss=0.01242, ecapa_loss=0.0001386, whisper_loss=0.08683, over 22564.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01051, ecapa_loss=0.0001437, whisper_loss=0.0898, over 3875686.65 frames. ], batch size: 89, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:05:48,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4247170.0, ans=0.125 2024-08-19 03:05:49,732 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 40 from LS+wenet, 24 from Vox, 29 from AS 2024-08-19 03:05:51,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4247170.0, ans=0.125 2024-08-19 03:05:58,188 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.412e+01 2.655e+01 2.932e+01 4.507e+01, threshold=5.309e+01, percent-clipped=0.0 2024-08-19 03:06:04,525 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 26 from Vox, 34 from AS 2024-08-19 03:06:22,715 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 15 from LS+wenet, 17 from Vox, 32 from AS 2024-08-19 03:06:27,403 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 30 from LS+wenet, 26 from Vox, 39 from AS 2024-08-19 03:06:38,269 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 9150, loss[loss=0.09036, beats_loss=0.01395, ecapa_loss=0.0001265, whisper_loss=0.07514, over 21689.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01056, ecapa_loss=0.0001436, whisper_loss=0.08966, over 3888920.81 frames. 
], batch size: 89, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:07:12,466 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.42 vs. limit=15.0 2024-08-19 03:07:15,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4247870.0, ans=0.125 2024-08-19 03:07:19,552 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 17 from LS+wenet, 23 from Vox, 38 from AS 2024-08-19 03:07:20,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4247870.0, ans=0.0 2024-08-19 03:07:27,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4247870.0, ans=0.125 2024-08-19 03:07:27,388 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.87 vs. limit=15.0 2024-08-19 03:07:31,386 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 from AS 2024-08-19 03:07:38,908 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 20 from LS+wenet, 25 from Vox, 38 from AS 2024-08-19 03:07:39,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4247970.0, ans=0.2 2024-08-19 03:07:42,269 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 9200, loss[loss=0.1099, beats_loss=0.01099, ecapa_loss=0.0001401, whisper_loss=0.09756, over 22874.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01054, ecapa_loss=0.0001442, whisper_loss=0.08974, over 3937413.04 frames. 
], batch size: 91, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:07:42,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4248070.0, ans=0.1 2024-08-19 03:07:50,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=4248070.0, ans=15.0 2024-08-19 03:08:04,112 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.284e+01 2.554e+01 2.841e+01 4.515e+02, threshold=5.108e+01, percent-clipped=1.0 2024-08-19 03:08:04,273 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 26 from LS+wenet, 14 from Vox, 17 fro AS 2024-08-19 03:08:19,941 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 24 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-19 03:08:23,801 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 03:08:47,107 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 9250, loss[loss=0.09293, beats_loss=0.01193, ecapa_loss=0.0001822, whisper_loss=0.07918, over 20282.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01059, ecapa_loss=0.000144, whisper_loss=0.08932, over 3940137.14 frames. ], batch size: 86, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:09:05,990 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.43 vs. 
limit=22.5 2024-08-19 03:09:08,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4248670.0, ans=0.0 2024-08-19 03:09:31,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4248870.0, ans=0.125 2024-08-19 03:09:48,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4248970.0, ans=0.125 2024-08-19 03:09:54,405 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 9300, loss[loss=0.1043, beats_loss=0.007942, ecapa_loss=0.0001607, whisper_loss=0.09477, over 14691.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01055, ecapa_loss=0.0001431, whisper_loss=0.08931, over 3929771.70 frames. ], batch size: 57, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:10:00,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4249070.0, ans=0.2 2024-08-19 03:10:01,219 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 03:10:17,512 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.468e+01 2.687e+01 3.069e+01 9.721e+01, threshold=5.373e+01, percent-clipped=1.0 2024-08-19 03:10:28,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4249270.0, ans=0.1 2024-08-19 03:10:30,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4249270.0, ans=0.0 2024-08-19 03:10:47,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4249470.0, ans=0.0 2024-08-19 03:10:52,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4249470.0, 
ans=0.125 2024-08-19 03:11:01,529 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 9350, loss[loss=0.09453, beats_loss=0.01053, ecapa_loss=0.0001425, whisper_loss=0.08257, over 22515.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001431, whisper_loss=0.08932, over 3902293.75 frames. ], batch size: 93, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:11:13,079 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.51 vs. limit=15.0 2024-08-19 03:11:13,684 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-19 03:11:18,012 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=12.0 2024-08-19 03:11:21,587 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-19 03:11:29,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4249770.0, ans=0.0 2024-08-19 03:11:42,543 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-19 03:11:49,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4249870.0, ans=0.125 2024-08-19 03:12:08,112 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 9400, loss[loss=0.08595, beats_loss=0.009962, ecapa_loss=0.0001343, whisper_loss=0.07464, over 16976.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01048, ecapa_loss=0.0001428, whisper_loss=0.08889, over 3893343.25 frames. ], batch size: 67, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:12:22,759 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
28 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-19 03:12:24,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4250170.0, ans=0.125 2024-08-19 03:12:32,928 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.288e+01 2.511e+01 2.753e+01 4.434e+01, threshold=5.022e+01, percent-clipped=0.0 2024-08-19 03:12:53,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4250370.0, ans=0.125 2024-08-19 03:13:06,481 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.06 vs. limit=15.0 2024-08-19 03:13:14,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4250470.0, ans=0.0 2024-08-19 03:13:16,649 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 12 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-19 03:13:16,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4250570.0, ans=0.125 2024-08-19 03:13:17,887 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 9450, loss[loss=0.07834, beats_loss=0.009202, ecapa_loss=0.0001917, whisper_loss=0.06723, over 13055.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01051, ecapa_loss=0.000143, whisper_loss=0.08829, over 3869744.99 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:13:25,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.95 vs. 
limit=15.0 2024-08-19 03:13:54,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4250770.0, ans=0.0 2024-08-19 03:14:06,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4250870.0, ans=0.125 2024-08-19 03:14:07,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4250870.0, ans=0.2 2024-08-19 03:14:26,765 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 9500, loss[loss=0.1095, beats_loss=0.0108, ecapa_loss=0.000131, whisper_loss=0.09742, over 22691.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01048, ecapa_loss=0.0001429, whisper_loss=0.08853, over 3850837.84 frames. ], batch size: 89, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:14:28,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4251070.0, ans=0.125 2024-08-19 03:14:50,643 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.262e+01 2.501e+01 2.810e+01 4.302e+01, threshold=5.003e+01, percent-clipped=0.0 2024-08-19 03:14:59,451 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 38 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-19 03:15:02,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4251270.0, ans=0.0 2024-08-19 03:15:08,645 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 31 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-19 03:15:13,123 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0 2024-08-19 03:15:18,441 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.70 vs. 
limit=15.0 2024-08-19 03:15:19,148 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 03:15:22,381 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-19 03:15:36,375 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 9550, loss[loss=0.0913, beats_loss=0.009651, ecapa_loss=0.0001906, whisper_loss=0.07974, over 19419.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01041, ecapa_loss=0.0001432, whisper_loss=0.0887, over 3850053.22 frames. ], batch size: 83, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:15:53,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4251670.0, ans=0.125 2024-08-19 03:16:01,476 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-19 03:16:05,120 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2024-08-19 03:16:13,900 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-19 03:16:19,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4251870.0, ans=0.125 2024-08-19 03:16:29,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.13 vs. limit=15.0 2024-08-19 03:16:31,223 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-19 03:16:44,660 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 9600, loss[loss=0.1219, beats_loss=0.009922, ecapa_loss=0.0001246, whisper_loss=0.1107, over 23905.00 frames. 
], tot_loss[loss=0.1011, beats_loss=0.01042, ecapa_loss=0.0001428, whisper_loss=0.08929, over 3865678.47 frames. ], batch size: 92, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:17:08,140 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.320e+01 2.551e+01 2.874e+01 5.589e+01, threshold=5.101e+01, percent-clipped=1.0 2024-08-19 03:17:16,876 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-19 03:17:26,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4252370.0, ans=0.125 2024-08-19 03:17:39,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4252470.0, ans=0.0 2024-08-19 03:17:52,968 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 9650, loss[loss=0.1062, beats_loss=0.01038, ecapa_loss=0.0001194, whisper_loss=0.09462, over 14262.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01037, ecapa_loss=0.000143, whisper_loss=0.08878, over 3815403.89 frames. ], batch size: 53, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:17:53,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4252570.0, ans=0.2 2024-08-19 03:18:07,170 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. 
limit=15.0 2024-08-19 03:18:26,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4252770.0, ans=0.125 2024-08-19 03:18:55,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4252970.0, ans=0.125 2024-08-19 03:19:02,191 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 9700, loss[loss=0.08765, beats_loss=0.007826, ecapa_loss=0.0001488, whisper_loss=0.07834, over 15518.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01026, ecapa_loss=0.0001452, whisper_loss=0.08933, over 3818103.42 frames. ], batch size: 57, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:19:17,358 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.44 vs. limit=12.0 2024-08-19 03:19:18,745 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2024-08-19 03:19:24,447 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.350e+01 2.550e+01 2.854e+01 4.797e+01, threshold=5.099e+01, percent-clipped=0.0 2024-08-19 03:19:30,561 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 33 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-19 03:19:32,018 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
15 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-19 03:19:41,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4253370.0, ans=0.0 2024-08-19 03:19:49,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4253370.0, ans=0.125 2024-08-19 03:20:09,344 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 9750, loss[loss=0.1067, beats_loss=0.0107, ecapa_loss=0.0001364, whisper_loss=0.09459, over 22349.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01035, ecapa_loss=0.0001438, whisper_loss=0.0897, over 3825485.95 frames. ], batch size: 87, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:20:17,492 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-19 03:20:21,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4253670.0, ans=0.5 2024-08-19 03:20:23,674 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.53 vs. limit=10.0 2024-08-19 03:20:26,823 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-19 03:20:27,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4253670.0, ans=0.025 2024-08-19 03:20:31,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4253670.0, ans=0.1 2024-08-19 03:20:31,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.28 vs. 
limit=5.0 2024-08-19 03:20:39,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4253770.0, ans=0.5 2024-08-19 03:20:45,452 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 03:20:50,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4253870.0, ans=0.0 2024-08-19 03:20:50,429 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.24 vs. limit=15.0 2024-08-19 03:21:04,143 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.19 vs. limit=22.5 2024-08-19 03:21:16,992 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 9800, loss[loss=0.1246, beats_loss=0.008681, ecapa_loss=0.0001409, whisper_loss=0.1145, over 24034.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01041, ecapa_loss=0.0001441, whisper_loss=0.08936, over 3814915.10 frames. ], batch size: 91, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:21:19,157 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.38 vs. limit=15.0 2024-08-19 03:21:30,218 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.72 vs. limit=10.0 2024-08-19 03:21:31,391 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
31 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-19 03:21:36,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4254170.0, ans=0.0 2024-08-19 03:21:40,856 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.265e+01 2.575e+01 2.940e+01 5.043e+01, threshold=5.150e+01, percent-clipped=0.0 2024-08-19 03:21:47,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4254270.0, ans=0.025 2024-08-19 03:21:53,343 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0 2024-08-19 03:22:04,092 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 20 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-19 03:22:07,279 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-19 03:22:08,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4254370.0, ans=0.125 2024-08-19 03:22:18,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.71 vs. limit=10.0 2024-08-19 03:22:21,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4254470.0, ans=0.2 2024-08-19 03:22:22,133 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-19 03:22:27,193 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 9850, loss[loss=0.1032, beats_loss=0.01207, ecapa_loss=0.0001303, whisper_loss=0.08982, over 21548.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01047, ecapa_loss=0.0001442, whisper_loss=0.08968, over 3822393.61 frames. 
], batch size: 87, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:22:31,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.90 vs. limit=6.0 2024-08-19 03:22:36,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4254570.0, ans=0.0 2024-08-19 03:22:46,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4254670.0, ans=0.125 2024-08-19 03:22:50,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4254670.0, ans=0.1 2024-08-19 03:22:51,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4254670.0, ans=0.0 2024-08-19 03:23:10,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4254870.0, ans=0.0 2024-08-19 03:23:26,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4254970.0, ans=0.125 2024-08-19 03:23:36,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4255070.0, ans=0.0 2024-08-19 03:23:38,061 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 9900, loss[loss=0.1158, beats_loss=0.009577, ecapa_loss=0.0001525, whisper_loss=0.1047, over 14994.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01051, ecapa_loss=0.0001432, whisper_loss=0.08976, over 3860894.49 frames. ], batch size: 61, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:23:57,330 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.64 vs. 
limit=15.0 2024-08-19 03:24:01,888 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.258e+01 2.526e+01 2.831e+01 1.628e+02, threshold=5.053e+01, percent-clipped=0.0 2024-08-19 03:24:05,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4255270.0, ans=0.0 2024-08-19 03:24:21,662 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.82 vs. limit=15.0 2024-08-19 03:24:43,799 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-19 03:24:49,066 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 9950, loss[loss=0.1099, beats_loss=0.01141, ecapa_loss=0.0001417, whisper_loss=0.09703, over 19862.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01062, ecapa_loss=0.0001432, whisper_loss=0.0899, over 3877733.02 frames. ], batch size: 76, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:24:51,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4255570.0, ans=0.125 2024-08-19 03:25:23,134 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 21 from LS+wenet, 10 from Vox, 23 fro AS 2024-08-19 03:25:51,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4255970.0, ans=0.125 2024-08-19 03:26:02,733 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 10000, loss[loss=0.1068, beats_loss=0.01134, ecapa_loss=0.000131, whisper_loss=0.0941, over 21211.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01061, ecapa_loss=0.0001437, whisper_loss=0.09039, over 3829171.91 frames. 
], batch size: 86, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:26:06,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4256070.0, ans=0.125 2024-08-19 03:26:14,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4256070.0, ans=0.125 2024-08-19 03:26:17,133 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 03:26:30,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=15.0 2024-08-19 03:26:30,544 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.221e+01 2.512e+01 2.759e+01 2.738e+02, threshold=5.023e+01, percent-clipped=3.0 2024-08-19 03:26:48,668 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-19 03:27:03,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4256370.0, ans=0.125 2024-08-19 03:27:19,981 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 10050, loss[loss=0.08977, beats_loss=0.01237, ecapa_loss=0.0001526, whisper_loss=0.07587, over 21224.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01049, ecapa_loss=0.0001433, whisper_loss=0.09019, over 3834675.63 frames. ], batch size: 92, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:27:32,506 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-19 03:27:38,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=4256670.0, ans=6.0 2024-08-19 03:27:50,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4256770.0, ans=0.09899494936611666 2024-08-19 03:27:52,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4256770.0, ans=0.125 2024-08-19 03:27:55,120 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-19 03:27:56,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4256770.0, ans=0.2 2024-08-19 03:28:01,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4256770.0, ans=0.125 2024-08-19 03:28:08,369 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-19 03:28:11,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4256870.0, ans=0.0 2024-08-19 03:28:13,907 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 26 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-19 03:28:36,296 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 10100, loss[loss=0.07637, beats_loss=0.01159, ecapa_loss=0.0001583, whisper_loss=0.0632, over 15129.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.0001422, whisper_loss=0.09051, over 3844867.83 frames. ], batch size: 64, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:28:56,950 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-19 03:29:01,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4257170.0, ans=0.1 2024-08-19 03:29:02,274 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.419e+01 2.709e+01 3.030e+01 4.080e+01, threshold=5.418e+01, percent-clipped=0.0 2024-08-19 03:29:07,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4257270.0, ans=0.2 2024-08-19 03:29:24,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4257370.0, ans=0.125 2024-08-19 03:29:32,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4257370.0, ans=0.125 2024-08-19 03:29:37,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4257370.0, ans=0.125 2024-08-19 03:29:56,138 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 10150, loss[loss=0.08633, beats_loss=0.01231, ecapa_loss=0.0001131, whisper_loss=0.07289, over 22021.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01051, ecapa_loss=0.0001432, whisper_loss=0.09103, over 3884765.71 frames. ], batch size: 90, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:30:02,926 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
26 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 03:30:08,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4257570.0, ans=0.1 2024-08-19 03:30:13,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4257670.0, ans=0.0 2024-08-19 03:30:25,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4257770.0, ans=0.125 2024-08-19 03:30:34,575 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-19 03:30:41,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4257870.0, ans=0.125 2024-08-19 03:30:55,131 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-19 03:31:08,673 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 10200, loss[loss=0.1139, beats_loss=0.008619, ecapa_loss=0.0001566, whisper_loss=0.1037, over 18257.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01039, ecapa_loss=0.000144, whisper_loss=0.09153, over 3892967.90 frames. ], batch size: 74, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:31:11,222 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=22.5 2024-08-19 03:31:14,766 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-19 03:31:22,350 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.978e+01 2024-08-19 03:31:24,373 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-19 03:31:32,156 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.333e+01 2.551e+01 2.847e+01 4.838e+01, threshold=5.102e+01, percent-clipped=0.0 2024-08-19 03:31:36,569 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-19 03:31:56,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4258370.0, ans=0.125 2024-08-19 03:32:15,683 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 03:32:17,784 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 10250, loss[loss=0.105, beats_loss=0.0112, ecapa_loss=0.0001398, whisper_loss=0.09236, over 19786.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01044, ecapa_loss=0.0001432, whisper_loss=0.0911, over 3909173.51 frames. ], batch size: 81, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:32:21,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4258570.0, ans=0.2 2024-08-19 03:32:28,494 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2024-08-19 03:32:30,801 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=15.0 2024-08-19 03:32:32,065 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-19 03:32:37,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4258670.0, ans=0.125 2024-08-19 03:33:11,018 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
28 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-19 03:33:27,112 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 10300, loss[loss=0.09802, beats_loss=0.01054, ecapa_loss=0.0001641, whisper_loss=0.08584, over 18992.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01039, ecapa_loss=0.0001424, whisper_loss=0.09135, over 3891541.40 frames. ], batch size: 79, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:33:45,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4259170.0, ans=0.04949747468305833 2024-08-19 03:33:49,511 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.334e+01 2.546e+01 2.816e+01 4.072e+01, threshold=5.092e+01, percent-clipped=0.0 2024-08-19 03:34:22,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4259470.0, ans=0.2 2024-08-19 03:34:23,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4259470.0, ans=0.125 2024-08-19 03:34:31,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4259470.0, ans=0.0 2024-08-19 03:34:35,653 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 10350, loss[loss=0.1145, beats_loss=0.009099, ecapa_loss=0.0001938, whisper_loss=0.1034, over 21932.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01042, ecapa_loss=0.0001419, whisper_loss=0.09134, over 3925909.33 frames. ], batch size: 94, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:34:45,889 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 28 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-19 03:34:47,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4259570.0, ans=0.0 2024-08-19 03:34:52,891 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
31 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 03:35:48,027 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 10400, loss[loss=0.06445, beats_loss=0.01462, ecapa_loss=0.0001167, whisper_loss=0.04866, over 14292.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01039, ecapa_loss=0.0001424, whisper_loss=0.09125, over 3885140.49 frames. ], batch size: 62, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:35:56,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4260070.0, ans=0.1 2024-08-19 03:36:01,186 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.70 vs. limit=15.0 2024-08-19 03:36:11,406 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.301e+01 2.515e+01 2.779e+01 4.056e+01, threshold=5.030e+01, percent-clipped=0.0 2024-08-19 03:36:18,421 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.21 vs. limit=22.5 2024-08-19 03:36:31,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4260370.0, ans=0.5 2024-08-19 03:36:36,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4260370.0, ans=10.0 2024-08-19 03:36:46,290 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2024-08-19 03:36:54,294 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-19 03:36:54,782 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.53 vs. 
limit=22.5 2024-08-19 03:36:55,418 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 10450, loss[loss=0.1084, beats_loss=0.01143, ecapa_loss=0.0001045, whisper_loss=0.09596, over 16702.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0105, ecapa_loss=0.0001416, whisper_loss=0.09103, over 3898435.51 frames. ], batch size: 64, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:37:10,444 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-19 03:37:14,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4260670.0, ans=0.0 2024-08-19 03:37:19,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4260670.0, ans=0.1 2024-08-19 03:37:27,042 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.08 vs. limit=15.0 2024-08-19 03:37:28,633 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.410e+01 2024-08-19 03:38:07,841 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 10500, loss[loss=0.1043, beats_loss=0.009699, ecapa_loss=0.0001321, whisper_loss=0.09332, over 17827.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01048, ecapa_loss=0.0001424, whisper_loss=0.0899, over 3886082.50 frames. ], batch size: 68, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:38:08,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4261070.0, ans=0.125 2024-08-19 03:38:20,968 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
22 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-19 03:38:21,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4261170.0, ans=0.0 2024-08-19 03:38:23,472 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-19 03:38:25,369 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.10 vs. limit=15.0 2024-08-19 03:38:31,474 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.267e+01 2.482e+01 2.765e+01 1.931e+02, threshold=4.963e+01, percent-clipped=1.0 2024-08-19 03:38:34,536 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 29 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-19 03:38:55,855 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 03:38:58,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4261370.0, ans=0.125 2024-08-19 03:39:18,193 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 10550, loss[loss=0.09045, beats_loss=0.01235, ecapa_loss=0.0001163, whisper_loss=0.07694, over 17757.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0105, ecapa_loss=0.0001432, whisper_loss=0.08924, over 3874583.86 frames. ], batch size: 69, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:39:19,694 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 27 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-19 03:39:26,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4261570.0, ans=0.125 2024-08-19 03:39:27,861 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
33 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-19 03:39:31,305 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.76 vs. limit=10.0 2024-08-19 03:39:32,281 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 27 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-19 03:39:38,105 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.98 vs. limit=15.0 2024-08-19 03:40:08,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4261870.0, ans=0.1 2024-08-19 03:40:17,015 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.70 vs. limit=15.0 2024-08-19 03:40:20,261 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-19 03:40:28,609 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 10600, loss[loss=0.09919, beats_loss=0.01169, ecapa_loss=0.0001434, whisper_loss=0.08607, over 21264.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01057, ecapa_loss=0.0001429, whisper_loss=0.08885, over 3892046.90 frames. ], batch size: 89, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:40:39,655 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
38 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-19 03:40:52,818 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.423e+01 2.630e+01 2.907e+01 3.949e+01, threshold=5.261e+01, percent-clipped=0.0 2024-08-19 03:41:08,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4262270.0, ans=0.125 2024-08-19 03:41:37,154 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.32 vs. limit=22.5 2024-08-19 03:41:37,482 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 10650, loss[loss=0.1205, beats_loss=0.00944, ecapa_loss=0.0001624, whisper_loss=0.1094, over 21655.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01048, ecapa_loss=0.0001423, whisper_loss=0.08939, over 3892415.39 frames. ], batch size: 89, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:41:38,945 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 25 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-19 03:41:41,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4262570.0, ans=0.0 2024-08-19 03:41:42,262 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.55 vs. limit=15.0 2024-08-19 03:41:50,721 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-19 03:42:09,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=4262770.0, ans=0.5 2024-08-19 03:42:27,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4262870.0, ans=0.0 2024-08-19 03:42:29,720 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
30 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-19 03:42:39,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4262970.0, ans=0.125 2024-08-19 03:42:46,294 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 10700, loss[loss=0.1068, beats_loss=0.00899, ecapa_loss=0.0001521, whisper_loss=0.09632, over 13145.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001416, whisper_loss=0.09037, over 3904147.10 frames. ], batch size: 54, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:43:00,675 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 12 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-19 03:43:05,709 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-19 03:43:08,065 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-19 03:43:09,190 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.702e+01 2.321e+01 2.471e+01 2.734e+01 8.130e+01, threshold=4.942e+01, percent-clipped=1.0 2024-08-19 03:43:11,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4263170.0, ans=0.1 2024-08-19 03:43:19,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4263270.0, ans=0.125 2024-08-19 03:43:21,944 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 31 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-19 03:43:29,355 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 18 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 03:43:53,777 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 10750, loss[loss=0.09482, beats_loss=0.01061, ecapa_loss=0.0001558, whisper_loss=0.08265, over 21067.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.0106, ecapa_loss=0.0001406, whisper_loss=0.08963, over 3891240.04 frames. ], batch size: 88, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:43:54,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4263570.0, ans=0.05 2024-08-19 03:43:58,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4263570.0, ans=0.125 2024-08-19 03:44:20,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4263770.0, ans=0.125 2024-08-19 03:44:21,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4263770.0, ans=0.1 2024-08-19 03:44:25,647 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.52 vs. limit=15.0 2024-08-19 03:44:32,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.89 vs. limit=10.0 2024-08-19 03:44:43,094 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 21 from LS+wenet, 17 from Vox, 50 fro AS 2024-08-19 03:44:56,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4263970.0, ans=0.125 2024-08-19 03:45:00,426 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 10800, loss[loss=0.1285, beats_loss=0.009215, ecapa_loss=0.0001468, whisper_loss=0.1178, over 23755.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01065, ecapa_loss=0.0001411, whisper_loss=0.09004, over 3899355.60 frames. 
], batch size: 92, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:45:07,674 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 18 from LS+wenet, 29 from Vox, 44 fro AS 2024-08-19 03:45:11,729 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-19 03:45:25,263 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.325e+01 2.616e+01 2.924e+01 8.173e+01, threshold=5.233e+01, percent-clipped=1.0 2024-08-19 03:45:32,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4264270.0, ans=0.2 2024-08-19 03:45:38,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4264270.0, ans=0.125 2024-08-19 03:45:41,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4264370.0, ans=0.125 2024-08-19 03:45:50,442 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.91 vs. limit=15.0 2024-08-19 03:46:09,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4264570.0, ans=0.0 2024-08-19 03:46:09,899 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 10850, loss[loss=0.1264, beats_loss=0.005927, ecapa_loss=0.000162, whisper_loss=0.1188, over 15127.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01059, ecapa_loss=0.0001414, whisper_loss=0.09024, over 3889774.72 frames. ], batch size: 56, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:46:15,779 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
23 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-19 03:46:19,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4264570.0, ans=10.0 2024-08-19 03:46:23,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4264670.0, ans=0.125 2024-08-19 03:46:24,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4264670.0, ans=0.2 2024-08-19 03:46:39,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4264770.0, ans=0.125 2024-08-19 03:46:48,691 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.15 vs. limit=15.0 2024-08-19 03:47:13,745 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 18 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 03:47:20,502 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-19 03:47:30,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4264870.0, ans=0.125 2024-08-19 03:47:31,487 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 34 from Vox, 33 fro AS 2024-08-19 03:47:45,132 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-19 03:47:51,073 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 10900, loss[loss=0.1203, beats_loss=0.01006, ecapa_loss=0.0001154, whisper_loss=0.1091, over 23406.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01055, ecapa_loss=0.000141, whisper_loss=0.09028, over 3905555.22 frames. 
], batch size: 91, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:47:52,789 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-19 03:47:54,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4265070.0, ans=0.125 2024-08-19 03:48:17,828 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-19 03:48:19,165 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.332e+01 2.581e+01 2.915e+01 5.254e+01, threshold=5.161e+01, percent-clipped=1.0 2024-08-19 03:48:19,629 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.92 vs. limit=22.5 2024-08-19 03:48:52,775 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.40 vs. limit=22.5 2024-08-19 03:49:07,462 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-19 03:49:08,516 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 10950, loss[loss=0.1041, beats_loss=0.01074, ecapa_loss=0.0001651, whisper_loss=0.09168, over 17598.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001415, whisper_loss=0.09068, over 3928720.55 frames. 
], batch size: 72, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:49:12,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4265570.0, ans=0.1 2024-08-19 03:49:14,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4265570.0, ans=0.125 2024-08-19 03:49:17,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4265570.0, ans=0.125 2024-08-19 03:49:20,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4265570.0, ans=0.0 2024-08-19 03:49:22,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4265670.0, ans=0.0 2024-08-19 03:49:26,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4265670.0, ans=0.1 2024-08-19 03:49:26,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4265670.0, ans=0.0 2024-08-19 03:49:43,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4265770.0, ans=0.0 2024-08-19 03:49:54,953 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
25 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-19 03:50:00,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4265870.0, ans=0.125 2024-08-19 03:50:01,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4265870.0, ans=0.0 2024-08-19 03:50:03,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4265870.0, ans=0.125 2024-08-19 03:50:08,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4265970.0, ans=0.125 2024-08-19 03:50:12,719 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.20 vs. limit=15.0 2024-08-19 03:50:22,920 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 11000, loss[loss=0.09904, beats_loss=0.01097, ecapa_loss=0.0001382, whisper_loss=0.08668, over 22770.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.0001417, whisper_loss=0.09015, over 3902845.74 frames. ], batch size: 92, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:50:28,498 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-19 03:50:42,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4266170.0, ans=0.125 2024-08-19 03:50:48,748 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.345e+01 2.567e+01 2.803e+01 3.050e+02, threshold=5.135e+01, percent-clipped=1.0 2024-08-19 03:50:58,315 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.49 vs. 
limit=10.0 2024-08-19 03:51:35,600 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 11050, loss[loss=0.1027, beats_loss=0.01099, ecapa_loss=0.0001788, whisper_loss=0.08991, over 21123.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001428, whisper_loss=0.09043, over 3901717.57 frames. ], batch size: 90, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:52:12,594 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-19 03:52:18,689 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.51 vs. limit=10.0 2024-08-19 03:52:21,277 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 17 from LS+wenet, 31 from Vox, 42 fro AS 2024-08-19 03:52:21,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4266870.0, ans=0.95 2024-08-19 03:52:37,764 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 29 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-19 03:52:42,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4266970.0, ans=0.125 2024-08-19 03:52:49,134 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 11100, loss[loss=0.144, beats_loss=0.006634, ecapa_loss=0.0001414, whisper_loss=0.1359, over 21327.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01042, ecapa_loss=0.0001425, whisper_loss=0.09106, over 3905109.20 frames. ], batch size: 78, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:52:53,544 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
24 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-19 03:53:07,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4267170.0, ans=0.125 2024-08-19 03:53:11,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4267170.0, ans=0.125 2024-08-19 03:53:14,772 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.398e+01 2.605e+01 2.848e+01 4.368e+01, threshold=5.210e+01, percent-clipped=0.0 2024-08-19 03:53:16,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4267270.0, ans=0.0 2024-08-19 03:53:30,835 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-19 03:53:45,378 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-19 03:53:45,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4267470.0, ans=0.2 2024-08-19 03:53:54,189 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.422e+01 2024-08-19 03:54:00,477 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 11150, loss[loss=0.1075, beats_loss=0.01087, ecapa_loss=0.0001431, whisper_loss=0.09522, over 22814.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01043, ecapa_loss=0.0001408, whisper_loss=0.09129, over 3925909.10 frames. ], batch size: 91, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:54:26,476 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 03:54:29,977 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. 
limit=6.0 2024-08-19 03:54:34,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4267770.0, ans=0.1 2024-08-19 03:54:35,004 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-19 03:54:44,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4267870.0, ans=0.1 2024-08-19 03:54:48,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4267870.0, ans=0.2 2024-08-19 03:54:50,431 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.95 vs. limit=22.5 2024-08-19 03:54:52,123 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.07 vs. limit=22.5 2024-08-19 03:54:55,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4267870.0, ans=0.04949747468305833 2024-08-19 03:55:00,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4267970.0, ans=0.125 2024-08-19 03:55:02,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4267970.0, ans=10.0 2024-08-19 03:55:11,527 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 11200, loss[loss=0.08204, beats_loss=0.01103, ecapa_loss=0.0001086, whisper_loss=0.06992, over 15151.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01036, ecapa_loss=0.000142, whisper_loss=0.09121, over 3892742.90 frames. 
], batch size: 58, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:55:12,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.60 vs. limit=15.0 2024-08-19 03:55:23,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4268070.0, ans=0.025 2024-08-19 03:55:34,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4268170.0, ans=0.125 2024-08-19 03:55:37,720 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.683e+01 2.383e+01 2.571e+01 2.965e+01 4.836e+01, threshold=5.143e+01, percent-clipped=0.0 2024-08-19 03:55:39,109 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-19 03:55:54,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4268370.0, ans=0.125 2024-08-19 03:56:04,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4268370.0, ans=0.0 2024-08-19 03:56:05,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4268370.0, ans=0.2 2024-08-19 03:56:05,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4268370.0, ans=0.0 2024-08-19 03:56:14,550 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-19 03:56:17,783 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
20 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-19 03:56:21,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4268470.0, ans=0.0 2024-08-19 03:56:26,212 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 11250, loss[loss=0.1072, beats_loss=0.009636, ecapa_loss=0.0001411, whisper_loss=0.09619, over 22614.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01032, ecapa_loss=0.0001428, whisper_loss=0.09106, over 3908182.90 frames. ], batch size: 91, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:56:50,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4268670.0, ans=0.0 2024-08-19 03:56:51,624 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 32 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-19 03:56:57,489 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.91 vs. limit=22.5 2024-08-19 03:57:12,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4268870.0, ans=0.125 2024-08-19 03:57:25,593 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 03:57:38,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4269070.0, ans=0.0 2024-08-19 03:57:40,751 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 11300, loss[loss=0.1127, beats_loss=0.01044, ecapa_loss=0.0001675, whisper_loss=0.1006, over 22350.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01032, ecapa_loss=0.0001425, whisper_loss=0.09164, over 3914273.20 frames. 
], batch size: 89, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:57:45,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4269070.0, ans=0.2 2024-08-19 03:57:45,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4269070.0, ans=0.0 2024-08-19 03:58:05,685 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.405e+01 2.659e+01 2.967e+01 4.406e+01, threshold=5.318e+01, percent-clipped=0.0 2024-08-19 03:58:21,543 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-19 03:58:25,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4269370.0, ans=0.0 2024-08-19 03:58:39,996 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.23 vs. limit=15.0 2024-08-19 03:58:49,139 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-19 03:58:49,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4269570.0, ans=0.1 2024-08-19 03:58:50,319 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 11350, loss[loss=0.1049, beats_loss=0.01119, ecapa_loss=0.0001153, whisper_loss=0.09251, over 22689.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01032, ecapa_loss=0.0001422, whisper_loss=0.09123, over 3894691.14 frames. ], batch size: 89, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:58:58,366 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-19 03:59:15,159 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.66 vs. 
limit=15.0 2024-08-19 03:59:18,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4269670.0, ans=0.125 2024-08-19 03:59:30,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4269770.0, ans=0.2 2024-08-19 03:59:34,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4269870.0, ans=0.125 2024-08-19 03:59:36,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4269870.0, ans=0.1 2024-08-19 03:59:39,967 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 31 from Vox, 28 fro AS 2024-08-19 03:59:44,712 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.950e+01 2024-08-19 03:59:47,601 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 04:00:03,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4270070.0, ans=0.0 2024-08-19 04:00:04,464 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 11400, loss[loss=0.09698, beats_loss=0.01095, ecapa_loss=0.0001295, whisper_loss=0.08474, over 22070.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01033, ecapa_loss=0.0001422, whisper_loss=0.09135, over 3923452.77 frames. ], batch size: 88, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:00:30,693 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.261e+01 2.453e+01 2.815e+01 3.733e+01, threshold=4.905e+01, percent-clipped=0.0 2024-08-19 04:00:35,714 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.30 vs. 
limit=15.0 2024-08-19 04:00:43,631 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-19 04:00:46,649 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 36 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-19 04:00:59,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4270370.0, ans=0.0 2024-08-19 04:01:14,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4270470.0, ans=0.125 2024-08-19 04:01:16,510 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 11450, loss[loss=0.1083, beats_loss=0.01054, ecapa_loss=0.0001506, whisper_loss=0.09627, over 15268.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01038, ecapa_loss=0.0001427, whisper_loss=0.09086, over 3909635.89 frames. ], batch size: 61, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:01:22,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4270570.0, ans=0.125 2024-08-19 04:01:30,716 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 14 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 04:01:38,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4270670.0, ans=0.0 2024-08-19 04:01:45,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4270770.0, ans=0.125 2024-08-19 04:01:55,321 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
29 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-19 04:02:11,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4270870.0, ans=0.125 2024-08-19 04:02:26,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4270970.0, ans=0.015 2024-08-19 04:02:29,408 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 11500, loss[loss=0.1096, beats_loss=0.01074, ecapa_loss=0.0001411, whisper_loss=0.09745, over 18697.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01043, ecapa_loss=0.0001417, whisper_loss=0.09062, over 3895416.21 frames. ], batch size: 73, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:02:29,569 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-19 04:02:46,174 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 30 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-19 04:02:53,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4271170.0, ans=0.125 2024-08-19 04:02:56,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4271170.0, ans=0.0 2024-08-19 04:02:57,421 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.365e+01 2.674e+01 3.024e+01 4.760e+02, threshold=5.347e+01, percent-clipped=3.0 2024-08-19 04:03:01,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4271270.0, ans=0.0 2024-08-19 04:03:06,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4271270.0, ans=0.0 2024-08-19 04:03:07,759 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-19 04:03:15,651 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
30 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-19 04:03:18,283 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-19 04:03:35,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4271470.0, ans=0.125 2024-08-19 04:03:48,905 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 11550, loss[loss=0.09976, beats_loss=0.01143, ecapa_loss=0.0001535, whisper_loss=0.0868, over 22766.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01038, ecapa_loss=0.0001426, whisper_loss=0.09119, over 3904671.69 frames. ], batch size: 93, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:03:50,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4271570.0, ans=0.2 2024-08-19 04:03:51,175 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.35 vs. limit=15.0 2024-08-19 04:03:52,124 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 
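A consistency check on the loss records above: with this run's loss scales from the config (`beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`, `whisper_loss_scale=1.0`), each reported `loss` matches the weighted sum of its three component losses. A minimal sketch of that arithmetic (the function name is illustrative, not icefall's actual code):

```python
# Decomposition of the logged tot_loss, assuming the weighted sum
# loss = beats_scale*beats + ecapa_scale*ecapa + whisper_scale*whisper
# with the scales from the run config (beats 1.0, ecapa 10.0, whisper 1.0).
# combined_loss is an illustrative name, not icefall's code.

def combined_loss(beats_loss, ecapa_loss, whisper_loss,
                  beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)

# The batch-11550 record above reports loss=0.09976 with
# beats_loss=0.01143, ecapa_loss=0.0001535, whisper_loss=0.0868:
print(combined_loss(0.01143, 0.0001535, 0.0868))  # ~0.09976, as logged
```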
28 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-19 04:04:02,045 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 04:04:04,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4271670.0, ans=0.1 2024-08-19 04:04:26,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4271770.0, ans=0.0 2024-08-19 04:04:27,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4271770.0, ans=0.1 2024-08-19 04:04:47,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4271870.0, ans=0.125 2024-08-19 04:04:50,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4271970.0, ans=0.125 2024-08-19 04:04:50,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4271970.0, ans=0.0 2024-08-19 04:04:51,945 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-19 04:04:58,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4271970.0, ans=0.0 2024-08-19 04:04:59,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4271970.0, ans=0.0 2024-08-19 04:05:07,664 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 11600, loss[loss=0.1089, beats_loss=0.01041, ecapa_loss=0.000141, whisper_loss=0.09704, over 23692.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0103, ecapa_loss=0.0001424, whisper_loss=0.09149, over 3871841.33 frames. 
], batch size: 91, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:05:12,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4272070.0, ans=0.125 2024-08-19 04:05:15,571 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-19 04:05:23,832 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-19 04:05:29,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4272170.0, ans=0.0 2024-08-19 04:05:31,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4272170.0, ans=0.1 2024-08-19 04:05:35,355 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.347e+01 2.564e+01 2.905e+01 5.911e+01, threshold=5.128e+01, percent-clipped=1.0 2024-08-19 04:05:40,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4272270.0, ans=10.0 2024-08-19 04:05:49,196 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.50 vs. limit=15.0 2024-08-19 04:06:07,030 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-19 04:06:20,261 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 30 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-19 04:06:23,416 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
19 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-19 04:06:25,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=4272470.0, ans=0.1 2024-08-19 04:06:29,690 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 11650, loss[loss=0.0941, beats_loss=0.01007, ecapa_loss=0.0001082, whisper_loss=0.08295, over 19823.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001426, whisper_loss=0.09039, over 3903411.88 frames. ], batch size: 76, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:06:33,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4272570.0, ans=0.125 2024-08-19 04:06:37,641 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.10 vs. limit=15.0 2024-08-19 04:07:02,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4272770.0, ans=0.0 2024-08-19 04:07:13,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4272770.0, ans=0.0 2024-08-19 04:07:13,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4272770.0, ans=0.0 2024-08-19 04:07:16,865 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0 2024-08-19 04:07:18,136 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-19 04:07:19,424 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 04:07:25,457 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
23 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-19 04:07:46,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4272970.0, ans=0.125 2024-08-19 04:07:49,505 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-19 04:07:54,683 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 11700, loss[loss=0.09027, beats_loss=0.0113, ecapa_loss=0.0001609, whisper_loss=0.07736, over 18368.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01048, ecapa_loss=0.0001424, whisper_loss=0.08988, over 3916536.13 frames. ], batch size: 78, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:07:54,827 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 04:08:09,417 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.51 vs. limit=15.0 2024-08-19 04:08:16,798 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-19 04:08:23,705 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.305e+01 2.645e+01 2.900e+01 9.382e+01, threshold=5.291e+01, percent-clipped=2.0 2024-08-19 04:08:25,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4273270.0, ans=0.125 2024-08-19 04:08:33,480 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 04:08:43,140 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 12 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-19 04:09:02,067 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.154e-02 2024-08-19 04:09:09,775 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-19 04:09:14,189 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.26 vs. limit=15.0 2024-08-19 04:09:14,954 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 11750, loss[loss=0.1221, beats_loss=0.009232, ecapa_loss=0.0001548, whisper_loss=0.1113, over 22163.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01059, ecapa_loss=0.0001417, whisper_loss=0.08933, over 3921458.09 frames. ], batch size: 88, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:09:24,352 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-19 04:09:29,729 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.26 vs. limit=15.0 2024-08-19 04:09:39,258 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-19 04:09:40,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4273670.0, ans=0.125 2024-08-19 04:10:00,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4273870.0, ans=0.0 2024-08-19 04:10:06,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4273870.0, ans=0.2 2024-08-19 04:10:32,832 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 11800, loss[loss=0.09399, beats_loss=0.01162, ecapa_loss=0.0001067, whisper_loss=0.08129, over 24095.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01063, ecapa_loss=0.000142, whisper_loss=0.08919, over 3916491.09 frames. 
], batch size: 95, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:10:40,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4274070.0, ans=0.0 2024-08-19 04:10:55,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4274170.0, ans=0.1 2024-08-19 04:10:59,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4274170.0, ans=0.07 2024-08-19 04:11:03,170 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.316e+01 2.542e+01 2.977e+01 1.357e+02, threshold=5.084e+01, percent-clipped=2.0 2024-08-19 04:11:03,370 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-19 04:11:06,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4274270.0, ans=0.1 2024-08-19 04:11:10,814 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.33 vs. limit=15.0 2024-08-19 04:11:17,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4274270.0, ans=0.125 2024-08-19 04:11:20,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4274270.0, ans=0.125 2024-08-19 04:11:30,712 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 21 from LS+wenet, 23 from Vox, 49 fro AS 2024-08-19 04:11:40,334 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.33 vs. 
limit=22.5 2024-08-19 04:11:48,515 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.91 vs. limit=15.0 2024-08-19 04:11:54,537 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 11850, loss[loss=0.09324, beats_loss=0.0132, ecapa_loss=9.735e-05, whisper_loss=0.07907, over 25023.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01063, ecapa_loss=0.0001429, whisper_loss=0.08918, over 3922550.86 frames. ], batch size: 97, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:12:08,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4274570.0, ans=0.2 2024-08-19 04:12:16,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4274670.0, ans=0.125 2024-08-19 04:12:16,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4274670.0, ans=0.125 2024-08-19 04:12:28,120 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 14 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 04:12:47,046 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.74 vs. limit=15.0 2024-08-19 04:12:50,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4274870.0, ans=0.2 2024-08-19 04:12:57,392 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.75 vs. limit=12.0 2024-08-19 04:13:02,444 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
28 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-19 04:13:02,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=4274970.0, ans=10.0 2024-08-19 04:13:06,156 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.37 vs. limit=22.5 2024-08-19 04:13:11,774 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 11900, loss[loss=0.09897, beats_loss=0.01056, ecapa_loss=0.0001469, whisper_loss=0.08695, over 19076.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01067, ecapa_loss=0.0001415, whisper_loss=0.08979, over 3919899.26 frames. ], batch size: 77, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:13:25,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4275170.0, ans=0.0 2024-08-19 04:13:36,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4275170.0, ans=10.0 2024-08-19 04:13:39,084 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.362e+01 2.683e+01 3.004e+01 4.414e+01, threshold=5.366e+01, percent-clipped=0.0 2024-08-19 04:13:41,249 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-19 04:13:53,553 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 04:13:59,040 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.84 vs. 
limit=15.0 2024-08-19 04:14:08,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4275370.0, ans=0.0 2024-08-19 04:14:12,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4275470.0, ans=0.2 2024-08-19 04:14:18,489 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-19 04:14:20,986 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 27 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-19 04:14:26,844 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 11950, loss[loss=0.08806, beats_loss=0.01083, ecapa_loss=0.0001364, whisper_loss=0.07587, over 16125.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01063, ecapa_loss=0.0001416, whisper_loss=0.08973, over 3932869.44 frames. ], batch size: 63, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:14:35,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.73 vs. limit=22.5 2024-08-19 04:14:52,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4275670.0, ans=0.2 2024-08-19 04:14:58,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4275770.0, ans=0.125 2024-08-19 04:15:00,114 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-19 04:15:05,776 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
31 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-19 04:15:18,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4275870.0, ans=0.125 2024-08-19 04:15:37,462 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 12000, loss[loss=0.08977, beats_loss=0.01063, ecapa_loss=0.0001516, whisper_loss=0.07763, over 14415.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01063, ecapa_loss=0.0001417, whisper_loss=0.08931, over 3911431.61 frames. ], batch size: 57, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:15:37,463 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-19 04:16:19,285 INFO [train_multi_KD3.py:1149] (0/4) Epoch 29, validation on ASR_libri: loss=0.2542, beats_loss=0, ecapa_loss=0.0005284, whisper_loss=0.2489, over 922467.00 frames. 2024-08-19 04:16:36,960 INFO [train_multi_KD3.py:1149] (0/4) Epoch 29, validation on SV_voxceleb1: loss=0.004097, beats_loss=0, ecapa_loss=0.0004097, whisper_loss=0, over 939242.00 frames. 2024-08-19 04:18:29,250 INFO [train_multi_KD3.py:1149] (0/4) Epoch 29, validation on AT_audioset: loss=0.02313, beats_loss=0.02313, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 04:18:29,255 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-19 04:18:37,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4276070.0, ans=0.125 2024-08-19 04:18:43,168 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
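The validation block above runs each task separately, and the components that do not apply to a task are logged as zero (ASR_libri has `beats_loss=0`, SV_voxceleb1 keeps only `ecapa_loss`, AT_audioset only `beats_loss`). A sketch of that per-task masking, reusing the same loss scales; the masking scheme here is an assumed illustration, not icefall's actual implementation:

```python
# Per-task validation as suggested by the log: each validation set
# exercises only the losses relevant to its task, with the same scales
# as training (beats 1.0, ecapa 10.0, whisper 1.0). TASK_LOSSES and
# masked_validation_loss are illustrative names.

SCALES = {"beats": 1.0, "ecapa": 10.0, "whisper": 1.0}

TASK_LOSSES = {
    "ASR_libri": {"ecapa", "whisper"},   # beats_loss logged as 0
    "SV_voxceleb1": {"ecapa"},           # speaker verification only
    "AT_audioset": {"beats"},            # audio tagging only
}

def masked_validation_loss(task, beats=0.0, ecapa=0.0, whisper=0.0):
    """Weighted sum over only the loss components active for this task."""
    active = TASK_LOSSES[task]
    comps = {"beats": beats, "ecapa": ecapa, "whisper": whisper}
    return sum(SCALES[k] * v for k, v in comps.items() if k in active)

# SV_voxceleb1 above: loss=0.004097 is exactly 10 * ecapa_loss=0.0004097
print(masked_validation_loss("SV_voxceleb1", ecapa=0.0004097))
```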
24 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-19 04:18:50,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4276170.0, ans=0.125 2024-08-19 04:18:54,618 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.268e+01 2.505e+01 2.757e+01 3.965e+01, threshold=5.011e+01, percent-clipped=0.0 2024-08-19 04:19:25,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4276470.0, ans=0.125 2024-08-19 04:19:25,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4276470.0, ans=0.125 2024-08-19 04:19:31,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4276470.0, ans=0.125 2024-08-19 04:19:34,149 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-19 04:19:38,967 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 12050, loss[loss=0.1113, beats_loss=0.008335, ecapa_loss=0.0001429, whisper_loss=0.1015, over 18264.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01054, ecapa_loss=0.0001425, whisper_loss=0.09003, over 3879313.96 frames. ], batch size: 71, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:19:47,185 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 12 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-19 04:20:01,414 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.86 vs. 
limit=6.0 2024-08-19 04:20:10,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4276770.0, ans=0.1 2024-08-19 04:20:20,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4276770.0, ans=0.125 2024-08-19 04:20:20,767 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.66 vs. limit=15.0 2024-08-19 04:20:21,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4276870.0, ans=0.0 2024-08-19 04:20:23,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4276870.0, ans=0.125 2024-08-19 04:20:26,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4276870.0, ans=0.125 2024-08-19 04:20:28,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4276870.0, ans=0.0 2024-08-19 04:20:29,137 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-19 04:20:34,455 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-19 04:20:37,588 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-19 04:20:48,512 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 12100, loss[loss=0.09835, beats_loss=0.009696, ecapa_loss=0.0001211, whisper_loss=0.08744, over 17593.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01053, ecapa_loss=0.0001427, whisper_loss=0.09015, over 3896774.28 frames. 
], batch size: 68, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:21:03,021 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 04:21:09,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4277170.0, ans=0.1 2024-08-19 04:21:13,376 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.256e+01 2.611e+01 2.868e+01 1.471e+02, threshold=5.223e+01, percent-clipped=2.0 2024-08-19 04:21:22,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4277270.0, ans=0.0 2024-08-19 04:21:27,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4277270.0, ans=0.04949747468305833 2024-08-19 04:21:58,647 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 12150, loss[loss=0.1144, beats_loss=0.0106, ecapa_loss=0.0001389, whisper_loss=0.1024, over 17997.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001429, whisper_loss=0.09035, over 3863241.16 frames. ], batch size: 71, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:22:07,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4277570.0, ans=0.125 2024-08-19 04:22:09,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.13 vs. 
limit=15.0 2024-08-19 04:22:11,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4277670.0, ans=0.1 2024-08-19 04:22:36,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4277770.0, ans=0.125 2024-08-19 04:22:55,886 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-08-19 04:23:02,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4277970.0, ans=0.125 2024-08-19 04:23:08,949 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 12200, loss[loss=0.07798, beats_loss=0.01109, ecapa_loss=0.000115, whisper_loss=0.06574, over 17056.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01066, ecapa_loss=0.0001425, whisper_loss=0.08981, over 3821538.32 frames. ], batch size: 67, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:23:36,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4278170.0, ans=0.0 2024-08-19 04:23:38,849 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.338e+01 2.537e+01 2.829e+01 3.837e+02, threshold=5.074e+01, percent-clipped=2.0 2024-08-19 04:23:42,126 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=15.0 2024-08-19 04:24:13,251 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-19 04:24:31,623 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 12250, loss[loss=0.09508, beats_loss=0.01154, ecapa_loss=9.836e-05, whisper_loss=0.08255, over 16503.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01061, ecapa_loss=0.0001431, whisper_loss=0.08975, over 3827599.71 frames. ], batch size: 60, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:25:02,999 WARNING [optim.py:496] (0/4) Scaling gradients by 0.05575420707464218, model_norm_threshold=50.743568420410156 2024-08-19 04:25:03,163 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.0.self_attn_weights.in_proj.bias with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.164e+05, grad_sumsq=1.291e+04, orig_rms_sq=9.017e+00 2024-08-19 04:25:08,451 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-19 04:25:17,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4278770.0, ans=10.0 2024-08-19 04:25:21,466 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 04:25:29,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4278870.0, ans=0.125 2024-08-19 04:25:31,659 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-19 04:25:40,244 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-19 04:25:47,011 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-19 04:25:48,989 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 04:25:49,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4278970.0, ans=0.125 2024-08-19 04:25:50,914 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
24 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-19 04:25:59,456 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 12300, loss[loss=0.09362, beats_loss=0.01001, ecapa_loss=0.0001359, whisper_loss=0.08225, over 17430.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.0001423, whisper_loss=0.09031, over 3824263.26 frames. ], batch size: 70, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:26:32,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4279170.0, ans=10.0 2024-08-19 04:26:33,443 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.490e+01 2.668e+01 2.990e+01 9.101e+02, threshold=5.335e+01, percent-clipped=3.0 2024-08-19 04:26:44,946 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.06 vs. limit=15.0 2024-08-19 04:26:49,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4279270.0, ans=0.0 2024-08-19 04:26:52,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4279270.0, ans=0.125 2024-08-19 04:26:54,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4279370.0, ans=0.1 2024-08-19 04:27:03,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4279370.0, ans=0.125 2024-08-19 04:27:18,256 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 32 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-19 04:27:30,249 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 12350, loss[loss=0.09284, beats_loss=0.01193, ecapa_loss=0.0001132, whisper_loss=0.07978, over 18795.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01044, ecapa_loss=0.000145, whisper_loss=0.09068, over 3846633.65 frames. ], batch size: 75, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:28:06,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4279770.0, ans=0.125 2024-08-19 04:28:19,087 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 30 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-19 04:28:31,797 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-19 04:28:34,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4279970.0, ans=0.125 2024-08-19 04:28:37,463 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-19 04:28:38,700 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-428000.pt 2024-08-19 04:28:41,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4279970.0, ans=0.0 2024-08-19 04:28:42,559 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
20 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-19 04:28:45,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4279970.0, ans=0.125 2024-08-19 04:28:45,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4279970.0, ans=0.125 2024-08-19 04:28:51,341 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 12400, loss[loss=0.08676, beats_loss=0.01042, ecapa_loss=0.0001602, whisper_loss=0.07473, over 20646.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01038, ecapa_loss=0.0001438, whisper_loss=0.09059, over 3885606.03 frames. ], batch size: 84, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:28:51,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4280070.0, ans=0.125 2024-08-19 04:29:08,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4280170.0, ans=0.125 2024-08-19 04:29:14,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4280170.0, ans=0.1 2024-08-19 04:29:15,113 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.385e+01 2.588e+01 2.896e+01 2.116e+02, threshold=5.177e+01, percent-clipped=1.0 2024-08-19 04:29:31,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4280370.0, ans=0.125 2024-08-19 04:29:36,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4280370.0, ans=0.0 2024-08-19 04:29:47,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4280470.0, ans=0.2 2024-08-19 04:29:52,875 INFO [scaling.py:1024] (0/4) Whitening: 
name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.44 vs. limit=10.0 2024-08-19 04:29:53,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4280470.0, ans=0.125 2024-08-19 04:29:54,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4280470.0, ans=0.125 2024-08-19 04:29:57,091 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 12450, loss[loss=0.1035, beats_loss=0.009288, ecapa_loss=0.0001465, whisper_loss=0.09277, over 20916.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.000144, whisper_loss=0.09025, over 3904385.21 frames. ], batch size: 88, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:30:16,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4280670.0, ans=0.125 2024-08-19 04:30:21,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=4280670.0, ans=0.5 2024-08-19 04:30:22,096 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.53 vs. limit=15.0 2024-08-19 04:30:22,636 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 16 from Vox, 50 fro AS 2024-08-19 04:30:34,053 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=15.0 2024-08-19 04:31:00,245 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 04:31:01,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4281070.0, ans=0.125 2024-08-19 04:31:02,546 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 12500, loss[loss=0.09374, beats_loss=0.01209, ecapa_loss=0.0001186, whisper_loss=0.08046, over 22842.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01046, ecapa_loss=0.0001435, whisper_loss=0.08986, over 3905946.95 frames. ], batch size: 92, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:31:22,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4281170.0, ans=0.125 2024-08-19 04:31:25,715 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.257e+01 2.522e+01 2.778e+01 4.051e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-19 04:31:30,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4281270.0, ans=0.0 2024-08-19 04:31:32,410 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 37 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-19 04:31:43,021 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 04:31:47,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4281370.0, ans=0.125 2024-08-19 04:31:56,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4281470.0, ans=0.125 2024-08-19 04:32:06,722 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 12550, loss[loss=0.0873, beats_loss=0.01235, ecapa_loss=0.0001365, whisper_loss=0.07358, over 22497.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.0001425, whisper_loss=0.0902, over 3916242.17 frames. 
], batch size: 94, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:32:08,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4281570.0, ans=0.2 2024-08-19 04:32:15,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4281570.0, ans=0.125 2024-08-19 04:32:29,321 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 14 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-19 04:32:29,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4281670.0, ans=0.2 2024-08-19 04:32:50,288 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0 2024-08-19 04:33:11,135 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 12600, loss[loss=0.09318, beats_loss=0.01248, ecapa_loss=0.000143, whisper_loss=0.07927, over 21880.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01052, ecapa_loss=0.0001424, whisper_loss=0.09009, over 3911410.03 frames. ], batch size: 90, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:33:26,423 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.39 vs. limit=15.0 2024-08-19 04:33:29,825 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
26 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-19 04:33:34,966 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.025e+01 2.310e+01 2.550e+01 2.894e+01 3.916e+01, threshold=5.099e+01, percent-clipped=0.0 2024-08-19 04:33:50,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4282370.0, ans=0.125 2024-08-19 04:33:50,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4282370.0, ans=0.04949747468305833 2024-08-19 04:34:01,130 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.38 vs. limit=15.0 2024-08-19 04:34:16,141 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 12650, loss[loss=0.1207, beats_loss=0.01019, ecapa_loss=0.0001491, whisper_loss=0.1091, over 22819.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.000142, whisper_loss=0.09015, over 3891590.26 frames. ], batch size: 90, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:34:24,570 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 04:34:27,436 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-19 04:34:32,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4282670.0, ans=0.1 2024-08-19 04:34:33,115 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0 2024-08-19 04:34:35,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4282670.0, ans=0.0 2024-08-19 04:34:41,009 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
10 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-19 04:34:45,019 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 28 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-19 04:34:55,686 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 12 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-19 04:34:59,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.10 vs. limit=10.0 2024-08-19 04:35:06,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4282870.0, ans=0.125 2024-08-19 04:35:12,169 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-19 04:35:13,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4282970.0, ans=0.125 2024-08-19 04:35:21,218 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 12700, loss[loss=0.08627, beats_loss=0.009979, ecapa_loss=0.0001427, whisper_loss=0.07487, over 15998.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01057, ecapa_loss=0.000142, whisper_loss=0.08975, over 3872442.45 frames. ], batch size: 63, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:35:21,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4283070.0, ans=0.125 2024-08-19 04:35:24,006 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
29 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-19 04:35:44,699 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.234e+01 2.542e+01 2.819e+01 3.569e+01, threshold=5.084e+01, percent-clipped=0.0 2024-08-19 04:35:47,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4283270.0, ans=0.1 2024-08-19 04:35:56,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4283270.0, ans=0.2 2024-08-19 04:35:58,268 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-19 04:36:08,038 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.99 vs. limit=12.0 2024-08-19 04:36:13,990 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 36 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-19 04:36:20,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4283470.0, ans=0.0 2024-08-19 04:36:27,300 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 12750, loss[loss=0.08821, beats_loss=0.0127, ecapa_loss=0.000151, whisper_loss=0.074, over 20001.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0106, ecapa_loss=0.0001421, whisper_loss=0.09007, over 3894002.23 frames. ], batch size: 83, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:36:27,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4283570.0, ans=0.125 2024-08-19 04:36:47,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4283670.0, ans=0.125 2024-08-19 04:36:54,019 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
18 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-19 04:36:54,606 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.80 vs. limit=15.0 2024-08-19 04:36:55,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4283770.0, ans=0.1 2024-08-19 04:37:11,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4283870.0, ans=0.125 2024-08-19 04:37:17,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4283870.0, ans=0.125 2024-08-19 04:37:19,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4283970.0, ans=0.0 2024-08-19 04:37:31,164 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-19 04:37:33,403 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 12800, loss[loss=0.07832, beats_loss=0.01136, ecapa_loss=0.0001099, whisper_loss=0.06586, over 13632.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0106, ecapa_loss=0.0001425, whisper_loss=0.08943, over 3864852.58 frames. ], batch size: 54, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:37:43,076 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.06 vs. 
limit=22.5 2024-08-19 04:37:56,530 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.197e+01 2.426e+01 2.664e+01 3.787e+01, threshold=4.851e+01, percent-clipped=0.0 2024-08-19 04:38:14,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4284370.0, ans=0.125 2024-08-19 04:38:33,629 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-19 04:38:37,224 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 12850, loss[loss=0.1324, beats_loss=0.009071, ecapa_loss=0.0001289, whisper_loss=0.122, over 23853.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01062, ecapa_loss=0.0001438, whisper_loss=0.08915, over 3861873.23 frames. ], batch size: 90, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:38:47,424 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-19 04:38:51,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4284670.0, ans=0.125 2024-08-19 04:38:55,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4284670.0, ans=0.2 2024-08-19 04:39:10,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4284770.0, ans=0.0 2024-08-19 04:39:12,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4284770.0, ans=0.1 2024-08-19 04:39:12,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4284770.0, ans=0.125 2024-08-19 04:39:13,909 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.47 vs. 
limit=22.5 2024-08-19 04:39:19,873 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=6.17 vs. limit=12.0 2024-08-19 04:39:23,933 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 04:39:33,620 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.69 vs. limit=10.0 2024-08-19 04:39:36,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4284970.0, ans=0.125 2024-08-19 04:39:38,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4284970.0, ans=0.2 2024-08-19 04:39:40,130 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 12900, loss[loss=0.1012, beats_loss=0.0103, ecapa_loss=0.0001104, whisper_loss=0.08985, over 20532.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01053, ecapa_loss=0.0001442, whisper_loss=0.0895, over 3883234.24 frames. ], batch size: 77, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:39:49,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4285070.0, ans=0.125 2024-08-19 04:39:58,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4285170.0, ans=0.1 2024-08-19 04:40:02,817 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.358e+01 2.618e+01 2.908e+01 5.283e+01, threshold=5.236e+01, percent-clipped=1.0 2024-08-19 04:40:04,471 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
34 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-19 04:40:05,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4285270.0, ans=0.1 2024-08-19 04:40:18,634 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.34 vs. limit=12.0 2024-08-19 04:40:34,073 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-19 04:40:38,943 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-19 04:40:41,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4285570.0, ans=0.0 2024-08-19 04:40:42,501 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 12950, loss[loss=0.07123, beats_loss=0.01113, ecapa_loss=0.0001703, whisper_loss=0.05839, over 15382.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01045, ecapa_loss=0.0001448, whisper_loss=0.08963, over 3880112.89 frames. ], batch size: 64, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:40:43,862 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-19 04:40:46,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4285570.0, ans=0.09899494936611666 2024-08-19 04:40:57,710 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-19 04:40:58,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4285670.0, ans=0.1 2024-08-19 04:41:01,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4285670.0, ans=0.015 2024-08-19 04:41:15,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4285770.0, ans=0.5 2024-08-19 04:41:20,248 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.23 vs. limit=15.0 2024-08-19 04:41:44,615 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 13000, loss[loss=0.07439, beats_loss=0.01249, ecapa_loss=0.0001405, whisper_loss=0.06049, over 13493.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01051, ecapa_loss=0.0001448, whisper_loss=0.0898, over 3857367.18 frames. ], batch size: 57, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:41:56,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4286170.0, ans=0.2 2024-08-19 04:42:06,515 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.266e+01 2.592e+01 2.874e+01 4.367e+01, threshold=5.183e+01, percent-clipped=0.0 2024-08-19 04:42:09,768 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2024-08-19 04:42:12,778 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-19 04:42:18,113 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
29 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-19 04:42:21,032 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.52 vs. limit=15.0 2024-08-19 04:42:28,476 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 8 from Vox, 35 fro AS 2024-08-19 04:42:46,819 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 13050, loss[loss=0.0963, beats_loss=0.01181, ecapa_loss=0.0001344, whisper_loss=0.08315, over 18892.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01058, ecapa_loss=0.0001434, whisper_loss=0.08918, over 3876692.16 frames. ], batch size: 76, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:42:57,350 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=15.0 2024-08-19 04:43:06,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4286670.0, ans=0.125 2024-08-19 04:43:27,708 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-19 04:43:53,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4286970.0, ans=0.07 2024-08-19 04:43:57,079 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 13100, loss[loss=0.1077, beats_loss=0.01117, ecapa_loss=0.0001377, whisper_loss=0.0952, over 23060.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01046, ecapa_loss=0.000143, whisper_loss=0.09005, over 3906653.55 frames. 
], batch size: 92, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:43:58,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4287070.0, ans=0.125 2024-08-19 04:43:58,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4287070.0, ans=0.0 2024-08-19 04:44:15,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4287170.0, ans=0.0 2024-08-19 04:44:23,735 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.776e+01 2.315e+01 2.592e+01 2.960e+01 1.126e+02, threshold=5.185e+01, percent-clipped=1.0 2024-08-19 04:44:39,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4287270.0, ans=0.0 2024-08-19 04:44:42,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4287370.0, ans=0.09899494936611666 2024-08-19 04:44:42,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4287370.0, ans=0.125 2024-08-19 04:44:44,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4287370.0, ans=0.2 2024-08-19 04:45:08,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4287470.0, ans=0.2 2024-08-19 04:45:11,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4287470.0, ans=0.0 2024-08-19 04:45:12,778 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
17 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-19 04:45:14,377 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 13150, loss[loss=0.08595, beats_loss=0.01332, ecapa_loss=0.000127, whisper_loss=0.07136, over 18543.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01045, ecapa_loss=0.0001426, whisper_loss=0.09005, over 3904275.55 frames. ], batch size: 75, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:45:34,287 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 04:45:34,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4287670.0, ans=0.09899494936611666 2024-08-19 04:45:40,903 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2024-08-19 04:45:57,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4287770.0, ans=0.1 2024-08-19 04:46:04,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4287870.0, ans=0.125 2024-08-19 04:46:11,202 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-19 04:46:18,368 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 22 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-19 04:46:18,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4287970.0, ans=0.025 2024-08-19 04:46:22,003 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.63 vs. limit=22.5 2024-08-19 04:46:25,195 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
25 from LS+wenet, 17 from Vox, 39 from AS
2024-08-19 04:46:30,675 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 13200, loss[loss=0.08164, beats_loss=0.009151, ecapa_loss=0.0001826, whisper_loss=0.07066, over 17892.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001425, whisper_loss=0.09027, over 3876995.83 frames. ], batch size: 70, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 04:46:44,998 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.90 vs. limit=15.0
2024-08-19 04:46:47,286 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.87 vs. limit=12.0
2024-08-19 04:46:51,062 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0
2024-08-19 04:46:52,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4288170.0, ans=0.1
2024-08-19 04:46:57,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4288170.0, ans=0.125
2024-08-19 04:46:59,768 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.394e+01 2.675e+01 2.983e+01 8.603e+01, threshold=5.350e+01, percent-clipped=2.0
2024-08-19 04:47:06,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4288270.0, ans=0.125
2024-08-19 04:47:13,740 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.05 vs. limit=15.0
2024-08-19 04:47:19,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4288370.0, ans=0.0
2024-08-19 04:47:29,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4288370.0, ans=0.07
2024-08-19 04:47:34,250 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.89 vs. limit=22.5
2024-08-19 04:47:39,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=4288470.0, ans=0.05
2024-08-19 04:47:46,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4288470.0, ans=0.125
2024-08-19 04:47:49,118 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 13250, loss[loss=0.07844, beats_loss=0.01323, ecapa_loss=0.0001157, whisper_loss=0.06406, over 17749.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01044, ecapa_loss=0.0001439, whisper_loss=0.08999, over 3893600.72 frames. ], batch size: 73, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 04:47:56,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4288570.0, ans=0.125
2024-08-19 04:48:14,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4288670.0, ans=0.2
2024-08-19 04:48:14,583 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.87 vs. limit=15.0
2024-08-19 04:48:27,333 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.95 vs. limit=12.0
2024-08-19 04:49:03,125 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 21 from Vox, 20 from AS
2024-08-19 04:49:07,106 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 13300, loss[loss=0.09974, beats_loss=0.01146, ecapa_loss=0.0001528, whisper_loss=0.08674, over 21756.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01039, ecapa_loss=0.0001425, whisper_loss=0.09076, over 3898943.85 frames. ], batch size: 89, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 04:49:09,256 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 15 from LS+wenet, 17 from Vox, 32 from AS
2024-08-19 04:49:14,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4289070.0, ans=0.125
2024-08-19 04:49:17,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4289070.0, ans=0.125
2024-08-19 04:49:21,924 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 24 from Vox, 31 from AS
2024-08-19 04:49:30,704 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 40 from LS+wenet, 12 from Vox, 43 from AS
2024-08-19 04:49:34,964 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.945e+01 2.357e+01 2.503e+01 2.761e+01 3.724e+01, threshold=5.005e+01, percent-clipped=0.0
2024-08-19 04:49:57,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4289370.0, ans=0.2
2024-08-19 04:50:02,144 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 16 from LS+wenet, 22 from Vox, 27 from AS
2024-08-19 04:50:07,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4289470.0, ans=0.125
2024-08-19 04:50:12,594 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.20 vs. limit=10.0
2024-08-19 04:50:21,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4289470.0, ans=0.2
2024-08-19 04:50:24,057 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 13350, loss[loss=0.104, beats_loss=0.01022, ecapa_loss=0.0001292, whisper_loss=0.0925, over 21850.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0104, ecapa_loss=0.0001416, whisper_loss=0.09076, over 3899506.01 frames. ], batch size: 87, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 04:50:30,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4289570.0, ans=0.125
2024-08-19 04:50:34,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4289570.0, ans=0.0
2024-08-19 04:51:01,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4289770.0, ans=0.125
2024-08-19 04:51:06,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4289770.0, ans=0.125
2024-08-19 04:51:17,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4289870.0, ans=0.125
2024-08-19 04:51:24,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4289870.0, ans=0.1
2024-08-19 04:51:27,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4289970.0, ans=0.125
2024-08-19 04:51:35,766 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.31 vs.
limit=22.5
2024-08-19 04:51:40,235 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 13400, loss[loss=0.1236, beats_loss=0.007659, ecapa_loss=0.0001633, whisper_loss=0.1143, over 22956.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01035, ecapa_loss=0.0001432, whisper_loss=0.0911, over 3897505.38 frames. ], batch size: 92, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 04:51:42,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4290070.0, ans=0.1
2024-08-19 04:51:47,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4290070.0, ans=0.1
2024-08-19 04:52:05,742 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.283e+01 2.575e+01 2.794e+01 4.211e+01, threshold=5.151e+01, percent-clipped=0.0
2024-08-19 04:52:12,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4290270.0, ans=0.125
2024-08-19 04:52:14,730 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 25 from Vox, 25 from AS
2024-08-19 04:52:16,519 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 17 from Vox, 27 from AS
2024-08-19 04:52:21,074 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 19 from Vox, 30 from AS
2024-08-19 04:52:32,405 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 18 from Vox, 39 from AS
2024-08-19 04:52:46,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4290470.0, ans=0.0
2024-08-19 04:52:47,781 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 24 from Vox, 31 from AS
2024-08-19 04:52:52,714 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 13450, loss[loss=0.1054, beats_loss=0.009144, ecapa_loss=0.0001731, whisper_loss=0.09451, over 22014.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01034, ecapa_loss=0.0001433, whisper_loss=0.09119, over 3885514.00 frames. ], batch size: 91, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 04:53:01,753 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 from AS
2024-08-19 04:53:26,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4290770.0, ans=0.04949747468305833
2024-08-19 04:53:31,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4290770.0, ans=0.1
2024-08-19 04:53:45,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4290870.0, ans=0.125
2024-08-19 04:53:48,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4290870.0, ans=0.2
2024-08-19 04:54:10,684 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 13500, loss[loss=0.09557, beats_loss=0.00788, ecapa_loss=0.0001981, whisper_loss=0.08571, over 18981.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01039, ecapa_loss=0.0001435, whisper_loss=0.09045, over 3880393.87 frames. ], batch size: 79, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 04:54:13,491 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=22.5
2024-08-19 04:54:20,924 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 31 from LS+wenet, 23 from Vox, 25 from AS
2024-08-19 04:54:26,361 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 20 from Vox, 28 from AS
2024-08-19 04:54:37,459 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.892e+01 2.302e+01 2.521e+01 2.822e+01 3.950e+01, threshold=5.042e+01, percent-clipped=0.0
2024-08-19 04:54:42,086 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0
2024-08-19 04:54:51,781 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 30 from Vox, 30 from AS
2024-08-19 04:55:08,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4291470.0, ans=0.0
2024-08-19 04:55:10,691 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 22 from LS+wenet, 25 from Vox, 39 from AS
2024-08-19 04:55:12,162 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 23 from Vox, 35 from AS
2024-08-19 04:55:12,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4291470.0, ans=0.125
2024-08-19 04:55:13,438 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 37 from LS+wenet, 18 from Vox, 34 from AS
2024-08-19 04:55:19,980 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.49 vs. limit=15.0
2024-08-19 04:55:23,447 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 13550, loss[loss=0.102, beats_loss=0.01167, ecapa_loss=0.0001239, whisper_loss=0.08907, over 21699.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01042, ecapa_loss=0.0001433, whisper_loss=0.09059, over 3889994.18 frames. ], batch size: 89, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 04:55:23,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4291570.0, ans=0.125
2024-08-19 04:55:42,510 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 21 from Vox, 36 from AS
2024-08-19 04:55:53,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4291770.0, ans=0.125
2024-08-19 04:56:22,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4291970.0, ans=0.0
2024-08-19 04:56:27,627 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 12 from Vox, 26 from AS
2024-08-19 04:56:34,942 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 from AS
2024-08-19 04:56:36,270 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 13600, loss[loss=0.1164, beats_loss=0.01055, ecapa_loss=0.0001228, whisper_loss=0.1046, over 23727.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01053, ecapa_loss=0.0001416, whisper_loss=0.09018, over 3890590.84 frames. ], batch size: 90, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 04:56:39,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4292070.0, ans=0.125
2024-08-19 04:56:54,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.59 vs. limit=15.0
2024-08-19 04:57:02,140 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.298e+01 2.598e+01 3.003e+01 1.611e+02, threshold=5.196e+01, percent-clipped=4.0
2024-08-19 04:57:31,635 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts.
29 from LS+wenet, 13 from Vox, 41 from AS
2024-08-19 04:57:45,482 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=15.0
2024-08-19 04:57:50,872 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 13650, loss[loss=0.1127, beats_loss=0.009098, ecapa_loss=0.0001859, whisper_loss=0.1018, over 20705.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01051, ecapa_loss=0.0001423, whisper_loss=0.09075, over 3892057.30 frames. ], batch size: 89, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 04:57:52,658 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0
2024-08-19 04:57:59,087 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.86 vs. limit=22.5
2024-08-19 04:58:02,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4292570.0, ans=0.0
2024-08-19 04:58:03,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=4292570.0, ans=0.05
2024-08-19 04:58:20,007 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 11 from LS+wenet, 18 from Vox, 29 from AS
2024-08-19 04:58:33,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4292770.0, ans=0.0
2024-08-19 04:59:08,233 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 13700, loss[loss=0.1098, beats_loss=0.009708, ecapa_loss=0.0001817, whisper_loss=0.09824, over 17097.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01043, ecapa_loss=0.0001438, whisper_loss=0.09087, over 3867211.90 frames. ], batch size: 70, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 04:59:20,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4293070.0, ans=0.0
2024-08-19 04:59:25,443 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 22 from Vox, 19 from AS
2024-08-19 04:59:38,032 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.235e+01 2.503e+01 2.717e+01 3.786e+01, threshold=5.006e+01, percent-clipped=0.0
2024-08-19 04:59:56,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4293370.0, ans=0.125
2024-08-19 05:00:28,127 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 13750, loss[loss=0.104, beats_loss=0.00838, ecapa_loss=0.0001811, whisper_loss=0.09381, over 21607.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01045, ecapa_loss=0.000143, whisper_loss=0.0908, over 3887702.24 frames. ], batch size: 90, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 05:01:02,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4293770.0, ans=0.125
2024-08-19 05:01:10,529 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 18 from Vox, 29 from AS
2024-08-19 05:01:20,714 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0
2024-08-19 05:01:23,499 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.73 vs. limit=12.0
2024-08-19 05:01:29,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4293970.0, ans=0.125
2024-08-19 05:01:29,894 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.201e+01
2024-08-19 05:01:35,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4293970.0, ans=0.125
2024-08-19 05:01:37,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=4293970.0, ans=15.0
2024-08-19 05:01:39,179 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 13800, loss[loss=0.1288, beats_loss=0.007594, ecapa_loss=0.0001423, whisper_loss=0.1198, over 16467.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01039, ecapa_loss=0.0001422, whisper_loss=0.09087, over 3891502.05 frames. ], batch size: 63, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 05:01:52,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.21 vs. limit=15.0
2024-08-19 05:01:53,135 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.32 vs. limit=6.0
2024-08-19 05:01:58,422 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0
2024-08-19 05:02:00,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4294170.0, ans=0.125
2024-08-19 05:02:02,832 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.336e+01 2.495e+01 2.875e+01 4.670e+01, threshold=4.991e+01, percent-clipped=0.0
2024-08-19 05:02:05,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4294270.0, ans=0.05
2024-08-19 05:02:08,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4294270.0, ans=0.1
2024-08-19 05:02:18,888 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 18 from Vox, 34 from AS
2024-08-19 05:02:21,930 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.56 vs. limit=15.0
2024-08-19 05:02:30,621 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 21 from LS+wenet, 31 from Vox, 37 from AS
2024-08-19 05:02:30,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4294470.0, ans=0.125
2024-08-19 05:02:41,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4294470.0, ans=0.125
2024-08-19 05:02:41,791 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.50 vs. limit=22.5
2024-08-19 05:02:45,064 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 13850, loss[loss=0.1, beats_loss=0.009768, ecapa_loss=0.0001934, whisper_loss=0.08833, over 21449.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01038, ecapa_loss=0.0001428, whisper_loss=0.09087, over 3903551.27 frames.
], batch size: 95, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 05:02:45,156 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 15 from Vox, 25 from AS
2024-08-19 05:02:54,176 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 18 from Vox, 41 from AS
2024-08-19 05:02:54,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4294570.0, ans=0.2
2024-08-19 05:02:59,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4294670.0, ans=0.1
2024-08-19 05:03:04,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4294670.0, ans=0.125
2024-08-19 05:03:13,811 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 24 from Vox, 29 from AS
2024-08-19 05:03:14,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4294770.0, ans=0.125
2024-08-19 05:03:15,430 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 27 from Vox, 31 from AS
2024-08-19 05:03:17,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4294770.0, ans=0.1
2024-08-19 05:03:23,326 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 19 from Vox, 26 from AS
2024-08-19 05:03:31,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4294870.0, ans=0.125
2024-08-19 05:03:47,669 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 24 from Vox, 35 from AS
2024-08-19 05:03:50,093 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 13900, loss[loss=0.08024, beats_loss=0.01224, ecapa_loss=0.0001244, whisper_loss=0.06676, over 19715.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01037, ecapa_loss=0.0001424, whisper_loss=0.09143, over 3918831.73 frames. ], batch size: 77, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 05:03:54,133 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 17 from LS+wenet, 24 from Vox, 28 from AS
2024-08-19 05:03:54,950 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.60 vs. limit=22.5
2024-08-19 05:03:59,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4295070.0, ans=0.1
2024-08-19 05:04:13,638 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.343e+01 2.547e+01 2.776e+01 4.991e+01, threshold=5.095e+01, percent-clipped=1.0
2024-08-19 05:04:15,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4295270.0, ans=0.0
2024-08-19 05:04:31,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4295370.0, ans=0.125
2024-08-19 05:04:31,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4295370.0, ans=0.125
2024-08-19 05:04:46,170 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 15 from Vox, 32 from AS
2024-08-19 05:04:56,500 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 13950, loss[loss=0.1023, beats_loss=0.008648, ecapa_loss=0.0001602, whisper_loss=0.09207, over 18415.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01034, ecapa_loss=0.0001429, whisper_loss=0.09174, over 3955906.13 frames. ], batch size: 74, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 05:04:56,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4295570.0, ans=0.05
2024-08-19 05:04:57,566 WARNING [optim.py:496] (0/4) Scaling gradients by 0.029894206672906876, model_norm_threshold=50.94768524169922
2024-08-19 05:04:57,730 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.708e+05, grad_sumsq=1.429e+05, orig_rms_sq=3.294e+00
2024-08-19 05:05:12,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4295670.0, ans=0.125
2024-08-19 05:05:13,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4295670.0, ans=0.0
2024-08-19 05:05:16,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4295670.0, ans=0.0
2024-08-19 05:05:30,717 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 16 from LS+wenet, 15 from Vox, 22 from AS
2024-08-19 05:05:39,439 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.46 vs. limit=10.0
2024-08-19 05:05:54,520 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 30 from Vox, 35 from AS
2024-08-19 05:06:02,413 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 14000, loss[loss=0.1218, beats_loss=0.01004, ecapa_loss=0.0001811, whisper_loss=0.11, over 22412.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01039, ecapa_loss=0.0001422, whisper_loss=0.09164, over 3946223.54 frames. ], batch size: 90, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 05:06:03,766 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 21 from Vox, 48 from AS
2024-08-19 05:06:19,780 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 from AS
2024-08-19 05:06:25,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4296170.0, ans=0.0
2024-08-19 05:06:26,042 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.382e+01 2.706e+01 3.107e+01 1.704e+03, threshold=5.412e+01, percent-clipped=4.0
2024-08-19 05:06:31,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4296270.0, ans=0.125
2024-08-19 05:07:07,726 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 14050, loss[loss=0.1168, beats_loss=0.01043, ecapa_loss=0.0001255, whisper_loss=0.1052, over 23401.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01047, ecapa_loss=0.0001412, whisper_loss=0.09146, over 3920894.22 frames. ], batch size: 93, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 05:07:20,070 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 19 from Vox, 45 from AS
2024-08-19 05:07:29,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4296670.0, ans=0.0
2024-08-19 05:07:30,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4296670.0, ans=0.0
2024-08-19 05:07:38,753 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts.
30 from LS+wenet, 25 from Vox, 35 from AS
2024-08-19 05:07:41,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4296770.0, ans=0.125
2024-08-19 05:08:07,291 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.79 vs. limit=15.0
2024-08-19 05:08:11,992 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 from AS
2024-08-19 05:08:14,743 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 14100, loss[loss=0.1189, beats_loss=0.01082, ecapa_loss=0.0001225, whisper_loss=0.1069, over 20114.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0105, ecapa_loss=0.000141, whisper_loss=0.09099, over 3929608.05 frames. ], batch size: 78, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 05:08:26,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4297170.0, ans=0.125
2024-08-19 05:08:29,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4297170.0, ans=0.1
2024-08-19 05:08:38,006 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.319e+01 2.520e+01 2.882e+01 1.476e+02, threshold=5.041e+01, percent-clipped=1.0
2024-08-19 05:08:42,084 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 21 from LS+wenet, 20 from Vox, 40 from AS
2024-08-19 05:08:52,026 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 17 from Vox, 23 from AS
2024-08-19 05:09:04,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4297370.0, ans=0.0
2024-08-19 05:09:19,362 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 14150, loss[loss=0.09205, beats_loss=0.01188, ecapa_loss=0.0001036, whisper_loss=0.07914, over 18042.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0105, ecapa_loss=0.0001407, whisper_loss=0.09111, over 3915688.27 frames. ], batch size: 69, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 05:09:32,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4297670.0, ans=0.125
2024-08-19 05:09:59,669 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 23 from Vox, 33 from AS
2024-08-19 05:09:59,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4297870.0, ans=0.2
2024-08-19 05:10:02,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4297870.0, ans=0.125
2024-08-19 05:10:06,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4297870.0, ans=0.125
2024-08-19 05:10:20,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4297970.0, ans=0.0
2024-08-19 05:10:23,715 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.59 vs. limit=10.0
2024-08-19 05:10:24,144 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 14200, loss[loss=0.1057, beats_loss=0.009832, ecapa_loss=0.0001529, whisper_loss=0.09435, over 15964.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01047, ecapa_loss=0.0001404, whisper_loss=0.09156, over 3922223.04 frames. ], batch size: 66, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 05:10:30,083 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.62 vs. limit=15.0
2024-08-19 05:10:46,873 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 15 from LS+wenet, 22 from Vox, 28 from AS
2024-08-19 05:10:48,040 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.276e+01 2.492e+01 2.801e+01 5.821e+01, threshold=4.984e+01, percent-clipped=1.0
2024-08-19 05:10:59,098 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 12 from Vox, 26 from AS
2024-08-19 05:11:01,950 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.870e+01
2024-08-19 05:11:03,059 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 16 from Vox, 45 from AS
2024-08-19 05:11:17,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4298470.0, ans=0.125
2024-08-19 05:11:17,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4298470.0, ans=0.0
2024-08-19 05:11:29,944 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 14250, loss[loss=0.1195, beats_loss=0.01077, ecapa_loss=0.0001291, whisper_loss=0.1074, over 22221.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01043, ecapa_loss=0.0001405, whisper_loss=0.09156, over 3917086.21 frames.
], batch size: 90, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:11:36,962 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.142e-03 2024-08-19 05:11:39,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4298570.0, ans=0.0 2024-08-19 05:11:45,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4298670.0, ans=0.125 2024-08-19 05:11:45,259 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.20 vs. limit=22.5 2024-08-19 05:11:50,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4298670.0, ans=0.0 2024-08-19 05:11:57,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4298770.0, ans=0.1 2024-08-19 05:12:02,802 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 19 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-19 05:12:08,885 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
28 from LS+wenet, 21 from Vox, 39 from AS 2024-08-19 05:12:11,702 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=8.791e-02 2024-08-19 05:12:18,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4298870.0, ans=0.0 2024-08-19 05:12:20,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4298970.0, ans=0.125 2024-08-19 05:12:33,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4299070.0, ans=0.125 2024-08-19 05:12:34,228 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 14300, loss[loss=0.125, beats_loss=0.007299, ecapa_loss=0.0001355, whisper_loss=0.1164, over 22157.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01051, ecapa_loss=0.0001408, whisper_loss=0.09064, over 3914579.57 frames. ], batch size: 88, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:12:47,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4299170.0, ans=0.2 2024-08-19 05:12:47,857 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 25 from LS+wenet, 12 from Vox, 27 from AS 2024-08-19 05:12:57,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4299170.0, ans=0.125 2024-08-19 05:12:57,971 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.328e+01 2.560e+01 2.862e+01 1.139e+02, threshold=5.121e+01, percent-clipped=2.0 2024-08-19 05:13:03,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=4299270.0, ans=0.025 2024-08-19 05:13:05,635 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
23 from LS+wenet, 17 from Vox, 39 from AS 2024-08-19 05:13:24,640 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 from AS 2024-08-19 05:13:40,119 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 14350, loss[loss=0.1048, beats_loss=0.011, ecapa_loss=0.0001756, whisper_loss=0.09205, over 18010.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01057, ecapa_loss=0.000141, whisper_loss=0.08982, over 3908562.79 frames. ], batch size: 74, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:13:41,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4299570.0, ans=0.125 2024-08-19 05:13:59,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4299670.0, ans=0.1 2024-08-19 05:14:11,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4299770.0, ans=0.0 2024-08-19 05:14:11,964 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 21 from Vox, 28 from AS 2024-08-19 05:14:13,315 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 23 from Vox, 44 from AS 2024-08-19 05:14:17,989 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2024-08-19 05:14:33,962 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 from AS 2024-08-19 05:14:36,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4299970.0, ans=0.1 2024-08-19 05:14:43,944 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 14400, loss[loss=0.1016, beats_loss=0.01072, ecapa_loss=0.0001666, whisper_loss=0.08922, over 15910.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01055, ecapa_loss=0.0001407, whisper_loss=0.09026, over 3915466.20 frames. ], batch size: 67, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:14:59,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4300170.0, ans=0.2 2024-08-19 05:15:06,839 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.240e+01 2.498e+01 2.789e+01 4.237e+01, threshold=4.997e+01, percent-clipped=0.0 2024-08-19 05:15:24,069 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 05:15:31,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4300370.0, ans=0.125 2024-08-19 05:15:32,017 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=7.854e-02 2024-08-19 05:15:39,291 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.40 vs. limit=15.0 2024-08-19 05:15:45,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4300470.0, ans=0.0 2024-08-19 05:15:48,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4300570.0, ans=0.1 2024-08-19 05:15:48,995 INFO [train_multi_KD3.py:1116] (0/4) Epoch 29, batch 14450, loss[loss=0.1237, beats_loss=0.00839, ecapa_loss=0.0001139, whisper_loss=0.1142, over 18772.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01049, ecapa_loss=0.0001422, whisper_loss=0.0906, over 3930850.31 frames. ], batch size: 68, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:15:50,359 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
29 from LS+wenet, 25 from Vox, 36 from AS 2024-08-19 05:15:55,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4300570.0, ans=0.2 2024-08-19 05:15:57,112 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 20 from LS+wenet, 22 from Vox, 41 from AS 2024-08-19 05:15:59,845 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 26 from LS+wenet, 20 from Vox, 20 from AS 2024-08-19 05:16:01,120 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 36 from LS+wenet, 18 from Vox, 38 from AS 2024-08-19 05:16:03,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4300670.0, ans=0.0 2024-08-19 05:16:07,116 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 22 from Vox, 32 from AS 2024-08-19 05:16:08,277 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 23 from LS+wenet, 11 from Vox, 20 from AS 2024-08-19 05:16:11,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4300670.0, ans=0.2 2024-08-19 05:16:12,162 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=15.0 2024-08-19 05:16:19,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4300770.0, ans=0.0 2024-08-19 05:16:22,731 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 24 from Vox, 18 from AS 2024-08-19 05:16:24,854 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.38 vs. 
limit=6.0 2024-08-19 05:16:28,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4300870.0, ans=0.0 2024-08-19 05:16:29,612 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.93 vs. limit=22.5 2024-08-19 05:16:31,311 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 25 from Vox, 44 from AS 2024-08-19 05:16:42,932 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-29.pt 2024-08-19 05:17:19,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4300990.0, ans=0.09899494936611666 2024-08-19 05:17:20,840 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 0, loss[loss=0.1083, beats_loss=0.009567, ecapa_loss=0.0001806, whisper_loss=0.0969, over 19236.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.009567, ecapa_loss=0.0001806, whisper_loss=0.0969, over 19236.00 frames. ], batch size: 81, lr: 2.06e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:17:20,841 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-19 05:17:59,446 INFO [train_multi_KD3.py:1149] (0/4) Epoch 30, validation on ASR_libri: loss=0.2538, beats_loss=0, ecapa_loss=0.0005174, whisper_loss=0.2486, over 922467.00 frames. 2024-08-19 05:18:14,954 INFO [train_multi_KD3.py:1149] (0/4) Epoch 30, validation on SV_voxceleb1: loss=0.003909, beats_loss=0, ecapa_loss=0.0003909, whisper_loss=0, over 939242.00 frames. 
2024-08-19 05:19:49,563 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.9721, 2.3597, 1.9498, 1.4612, 1.8610, 1.7389, 2.1158, 2.0183], device='cuda:0') 2024-08-19 05:20:09,418 INFO [train_multi_KD3.py:1149] (0/4) Epoch 30, validation on AT_audioset: loss=0.02304, beats_loss=0.02304, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 05:20:09,422 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-19 05:20:09,565 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 12 from Vox, 37 from AS 2024-08-19 05:20:22,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4300990.0, ans=0.0 2024-08-19 05:21:04,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4301190.0, ans=0.2 2024-08-19 05:21:16,716 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.422e+01 2.650e+01 2.982e+01 4.420e+02, threshold=5.300e+01, percent-clipped=2.0 2024-08-19 05:21:18,678 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 30 from Vox, 35 from AS 2024-08-19 05:21:24,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4301290.0, ans=0.2 2024-08-19 05:21:28,794 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 19 from Vox, 39 from AS 2024-08-19 05:21:35,972 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
25 from LS+wenet, 23 from Vox, 26 from AS 2024-08-19 05:21:36,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4301290.0, ans=0.2 2024-08-19 05:21:59,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4301390.0, ans=0.125 2024-08-19 05:22:12,616 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 50, loss[loss=0.1049, beats_loss=0.007574, ecapa_loss=0.0001844, whisper_loss=0.09549, over 14585.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.008959, ecapa_loss=0.0001553, whisper_loss=0.09065, over 890192.98 frames. ], batch size: 61, lr: 2.06e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:23:10,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4301690.0, ans=0.125 2024-08-19 05:23:27,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4301790.0, ans=0.0 2024-08-19 05:23:29,477 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 26 from LS+wenet, 17 from Vox, 25 from AS 2024-08-19 05:23:32,584 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=22.5 2024-08-19 05:23:45,683 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.54 vs. limit=10.0 2024-08-19 05:23:56,624 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 from AS 2024-08-19 05:24:07,538 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 100, loss[loss=0.0964, beats_loss=0.01047, ecapa_loss=0.0001545, whisper_loss=0.08438, over 20804.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.009035, ecapa_loss=0.0001474, whisper_loss=0.09085, over 1573043.82 frames. 
], batch size: 81, lr: 2.06e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:24:12,308 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 from AS 2024-08-19 05:24:43,785 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.11 vs. limit=22.5 2024-08-19 05:24:50,262 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 from AS 2024-08-19 05:25:05,388 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.646e+01 2.905e+01 3.235e+01 6.271e+01, threshold=5.810e+01, percent-clipped=1.0 2024-08-19 05:25:30,447 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.92 vs. limit=15.0 2024-08-19 05:25:34,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4302390.0, ans=0.0 2024-08-19 05:25:42,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4302390.0, ans=0.125 2024-08-19 05:25:51,600 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 150, loss[loss=0.09319, beats_loss=0.009361, ecapa_loss=0.0001331, whisper_loss=0.08249, over 21962.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.009198, ecapa_loss=0.0001451, whisper_loss=0.09097, over 2073175.73 frames. ], batch size: 84, lr: 2.06e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:26:18,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4302590.0, ans=0.2 2024-08-19 05:26:23,440 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.06 vs. limit=22.5 2024-08-19 05:26:26,039 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
26 from LS+wenet, 21 from Vox, 40 from AS 2024-08-19 05:26:26,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4302690.0, ans=0.125 2024-08-19 05:26:28,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4302690.0, ans=0.125 2024-08-19 05:26:32,344 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 35 from LS+wenet, 18 from Vox, 35 from AS 2024-08-19 05:26:32,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4302690.0, ans=0.2 2024-08-19 05:26:32,899 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.24 vs. limit=15.0 2024-08-19 05:26:38,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4302790.0, ans=0.125 2024-08-19 05:26:43,535 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 17 from Vox, 42 from AS 2024-08-19 05:26:46,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4302790.0, ans=0.125 2024-08-19 05:27:05,499 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.31 vs. limit=15.0 2024-08-19 05:27:05,908 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 200, loss[loss=0.1117, beats_loss=0.007238, ecapa_loss=0.0001493, whisper_loss=0.1029, over 14776.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0094, ecapa_loss=0.0001463, whisper_loss=0.09074, over 2439384.98 frames. ], batch size: 55, lr: 2.06e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:27:07,721 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
25 from LS+wenet, 24 from Vox, 28 from AS 2024-08-19 05:27:15,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4302990.0, ans=0.125 2024-08-19 05:27:28,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4303090.0, ans=0.0 2024-08-19 05:27:31,331 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0 2024-08-19 05:27:42,098 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.381e+01 2.550e+01 2.834e+01 4.054e+01, threshold=5.100e+01, percent-clipped=0.0 2024-08-19 05:27:46,779 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.90 vs. limit=15.0 2024-08-19 05:27:47,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4303290.0, ans=0.0 2024-08-19 05:27:49,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4303290.0, ans=0.125 2024-08-19 05:27:49,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4303290.0, ans=0.125 2024-08-19 05:27:52,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4303290.0, ans=0.125 2024-08-19 05:27:53,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4303290.0, ans=0.1 2024-08-19 05:27:55,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4303290.0, ans=0.2 2024-08-19 05:27:58,018 INFO [scaling.py:214] (0/4) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4303290.0, ans=0.125 2024-08-19 05:28:12,963 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 250, loss[loss=0.07456, beats_loss=0.0122, ecapa_loss=0.000149, whisper_loss=0.06087, over 16127.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.009668, ecapa_loss=0.0001449, whisper_loss=0.09047, over 2726240.25 frames. ], batch size: 68, lr: 2.06e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:28:15,755 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 17 from Vox, 22 from AS 2024-08-19 05:28:21,084 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.43 vs. limit=15.0 2024-08-19 05:28:45,039 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2024-08-19 05:29:02,427 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-19 05:29:05,526 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 20 from Vox, 22 from AS 2024-08-19 05:29:12,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4303890.0, ans=0.0 2024-08-19 05:29:15,184 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 300, loss[loss=0.1293, beats_loss=0.009444, ecapa_loss=0.0001261, whisper_loss=0.1186, over 20829.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.009932, ecapa_loss=0.0001448, whisper_loss=0.08973, over 2959802.31 frames. 
], batch size: 77, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:29:17,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4303990.0, ans=0.1 2024-08-19 05:29:23,683 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.18 vs. limit=15.0 2024-08-19 05:29:29,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4304090.0, ans=0.125 2024-08-19 05:29:36,010 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.81 vs. limit=15.0 2024-08-19 05:29:40,447 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 30 from LS+wenet, 21 from Vox, 34 from AS 2024-08-19 05:29:41,946 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 14 from LS+wenet, 28 from Vox, 23 from AS 2024-08-19 05:29:47,833 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.657e+01 2.190e+01 2.363e+01 2.558e+01 4.313e+01, threshold=4.727e+01, percent-clipped=0.0 2024-08-19 05:29:53,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4304290.0, ans=0.125 2024-08-19 05:30:00,828 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-19 05:30:00,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4304290.0, ans=0.0 2024-08-19 05:30:09,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4304390.0, ans=0.125 2024-08-19 05:30:12,535 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, 
num_channels=256, metric=13.56 vs. limit=15.0 2024-08-19 05:30:17,657 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 350, loss[loss=0.09422, beats_loss=0.01028, ecapa_loss=0.0001317, whisper_loss=0.08262, over 16049.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.009929, ecapa_loss=0.0001442, whisper_loss=0.08995, over 3123109.60 frames. ], batch size: 63, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:30:18,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4304490.0, ans=0.125 2024-08-19 05:30:25,596 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 23 from Vox, 31 from AS 2024-08-19 05:30:36,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4304590.0, ans=0.1 2024-08-19 05:30:45,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4304690.0, ans=0.0 2024-08-19 05:30:47,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4304690.0, ans=0.2 2024-08-19 05:30:51,940 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.65 vs. limit=22.5 2024-08-19 05:31:10,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4304890.0, ans=0.125 2024-08-19 05:31:11,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4304890.0, ans=0.0 2024-08-19 05:31:16,982 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 16 from Vox, 34 from AS 2024-08-19 05:31:19,717 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 400, loss[loss=0.1082, beats_loss=0.007787, ecapa_loss=0.0001738, whisper_loss=0.09871, over 21528.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.009983, ecapa_loss=0.0001442, whisper_loss=0.08977, over 3273759.29 frames. ], batch size: 88, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:31:28,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4304990.0, ans=0.2 2024-08-19 05:31:33,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4305090.0, ans=0.0 2024-08-19 05:31:46,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4305190.0, ans=0.1 2024-08-19 05:31:47,693 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.82 vs. limit=15.0 2024-08-19 05:31:51,891 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.222e+01 2.400e+01 2.669e+01 1.087e+02, threshold=4.801e+01, percent-clipped=1.0 2024-08-19 05:32:01,064 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 23 from Vox, 24 from AS 2024-08-19 05:32:06,965 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 from AS 2024-08-19 05:32:19,438 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 29 from Vox, 36 from AS 2024-08-19 05:32:21,890 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 450, loss[loss=0.08219, beats_loss=0.008645, ecapa_loss=0.0001603, whisper_loss=0.07194, over 16939.00 frames. ], tot_loss[loss=0.101, beats_loss=0.009961, ecapa_loss=0.0001436, whisper_loss=0.08964, over 3415793.56 frames. 
], batch size: 67, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:32:23,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4305490.0, ans=0.125 2024-08-19 05:32:28,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4305490.0, ans=0.2 2024-08-19 05:32:30,346 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 from AS 2024-08-19 05:32:39,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4305590.0, ans=0.125 2024-08-19 05:32:40,725 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 22 from LS+wenet, 30 from Vox, 44 from AS 2024-08-19 05:32:53,881 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 15 from Vox, 28 from AS 2024-08-19 05:32:54,513 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.80 vs. limit=15.0 2024-08-19 05:33:08,166 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 23 from Vox, 24 from AS 2024-08-19 05:33:13,990 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 from AS 2024-08-19 05:33:24,255 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 500, loss[loss=0.09453, beats_loss=0.01157, ecapa_loss=0.0001499, whisper_loss=0.08146, over 18674.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01009, ecapa_loss=0.0001423, whisper_loss=0.08902, over 3510116.75 frames. ], batch size: 76, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:33:26,766 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
22 from LS+wenet, 25 from Vox, 41 from AS 2024-08-19 05:33:34,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4305990.0, ans=0.125 2024-08-19 05:33:37,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4306090.0, ans=0.0 2024-08-19 05:33:38,275 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 16 from Vox, 38 from AS 2024-08-19 05:33:49,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4306190.0, ans=0.1 2024-08-19 05:33:56,743 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.612e+01 2.301e+01 2.420e+01 2.660e+01 4.195e+01, threshold=4.841e+01, percent-clipped=0.0 2024-08-19 05:33:59,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4306190.0, ans=0.1 2024-08-19 05:34:01,232 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.97 vs. limit=22.5 2024-08-19 05:34:02,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4306290.0, ans=0.125 2024-08-19 05:34:14,294 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 20 from Vox, 28 from AS 2024-08-19 05:34:18,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4306390.0, ans=0.125 2024-08-19 05:34:19,301 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
31 from LS+wenet, 15 from Vox, 45 from AS 2024-08-19 05:34:20,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4306390.0, ans=0.125 2024-08-19 05:34:26,666 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 550, loss[loss=0.103, beats_loss=0.01224, ecapa_loss=0.0001351, whisper_loss=0.08941, over 19159.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0101, ecapa_loss=0.0001414, whisper_loss=0.0903, over 3586088.84 frames. ], batch size: 76, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:34:52,714 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 22 from Vox, 42 from AS 2024-08-19 05:35:15,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4306890.0, ans=0.125 2024-08-19 05:35:22,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4306890.0, ans=0.125 2024-08-19 05:35:27,442 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 from AS 2024-08-19 05:35:28,487 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 600, loss[loss=0.1159, beats_loss=0.009388, ecapa_loss=0.0001516, whisper_loss=0.105, over 22418.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01014, ecapa_loss=0.0001414, whisper_loss=0.09065, over 3677345.74 frames. ], batch size: 89, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:35:43,180 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.76 vs. limit=15.0 2024-08-19 05:35:45,301 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.29 vs. limit=15.0 2024-08-19 05:35:47,532 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
22 from LS+wenet, 18 from Vox, 32 from AS 2024-08-19 05:35:51,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4307090.0, ans=0.0 2024-08-19 05:35:55,283 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 13 from Vox, 24 from AS 2024-08-19 05:35:56,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4307190.0, ans=0.2 2024-08-19 05:36:01,143 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.277e+01 2.491e+01 2.795e+01 3.103e+02, threshold=4.982e+01, percent-clipped=2.0 2024-08-19 05:36:16,133 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 from AS 2024-08-19 05:36:18,626 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 12 from Vox, 36 from AS 2024-08-19 05:36:23,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4307390.0, ans=0.2 2024-08-19 05:36:30,723 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 650, loss[loss=0.1014, beats_loss=0.01077, ecapa_loss=0.0001134, whisper_loss=0.08948, over 16055.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0102, ecapa_loss=0.0001408, whisper_loss=0.09065, over 3706687.17 frames. 
], batch size: 62, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:36:41,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4307590.0, ans=0.0 2024-08-19 05:36:59,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4307690.0, ans=0.125 2024-08-19 05:37:00,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4307690.0, ans=0.1 2024-08-19 05:37:00,735 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.51 vs. limit=15.0 2024-08-19 05:37:14,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4307790.0, ans=0.125 2024-08-19 05:37:18,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4307890.0, ans=0.0 2024-08-19 05:37:27,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4307890.0, ans=0.0 2024-08-19 05:37:32,277 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 700, loss[loss=0.1191, beats_loss=0.00899, ecapa_loss=0.0001222, whisper_loss=0.1089, over 24491.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01012, ecapa_loss=0.0001406, whisper_loss=0.09153, over 3742707.66 frames. ], batch size: 92, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:37:37,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4307990.0, ans=0.125 2024-08-19 05:37:40,747 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
15 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-19 05:37:40,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4307990.0, ans=0.125 2024-08-19 05:37:59,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4308190.0, ans=0.125 2024-08-19 05:38:03,910 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.712e+01 2.327e+01 2.528e+01 2.779e+01 3.860e+01, threshold=5.056e+01, percent-clipped=1.0 2024-08-19 05:38:07,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4308290.0, ans=0.125 2024-08-19 05:38:17,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4308290.0, ans=0.125 2024-08-19 05:38:18,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4308290.0, ans=0.125 2024-08-19 05:38:28,315 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.34 vs. limit=15.0 2024-08-19 05:38:33,923 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 750, loss[loss=0.1383, beats_loss=0.006242, ecapa_loss=0.0001396, whisper_loss=0.1307, over 16004.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01012, ecapa_loss=0.0001399, whisper_loss=0.09115, over 3765076.08 frames. ], batch size: 56, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:38:50,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4308590.0, ans=0.0 2024-08-19 05:38:53,941 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 14 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-19 05:38:55,157 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-19 05:38:57,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4308690.0, ans=0.2 2024-08-19 05:38:58,582 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 9 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-19 05:39:10,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4308790.0, ans=0.1 2024-08-19 05:39:13,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4308790.0, ans=0.125 2024-08-19 05:39:25,468 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 31 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-19 05:39:29,061 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 22 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-19 05:39:35,145 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 800, loss[loss=0.09399, beats_loss=0.01015, ecapa_loss=0.0001411, whisper_loss=0.08242, over 15678.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01016, ecapa_loss=0.0001394, whisper_loss=0.09123, over 3778608.34 frames. ], batch size: 61, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:39:36,523 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-19 05:39:41,244 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
28 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 05:39:47,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4309090.0, ans=0.1 2024-08-19 05:40:00,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4309190.0, ans=0.0 2024-08-19 05:40:01,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4309190.0, ans=0.125 2024-08-19 05:40:02,456 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-19 05:40:07,650 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.249e+01 2.414e+01 2.640e+01 3.694e+01, threshold=4.828e+01, percent-clipped=0.0 2024-08-19 05:40:23,825 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 05:40:34,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4309390.0, ans=0.125 2024-08-19 05:40:37,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4309490.0, ans=0.125 2024-08-19 05:40:37,923 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 850, loss[loss=0.1026, beats_loss=0.009993, ecapa_loss=0.0001095, whisper_loss=0.0915, over 15933.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01023, ecapa_loss=0.0001389, whisper_loss=0.09017, over 3771052.34 frames. 
], batch size: 59, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:40:44,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4309490.0, ans=0.5 2024-08-19 05:40:56,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4309590.0, ans=0.125 2024-08-19 05:41:23,243 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-19 05:41:28,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4309890.0, ans=0.0 2024-08-19 05:41:38,641 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-19 05:41:38,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4309890.0, ans=0.0 2024-08-19 05:41:39,049 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=15.0 2024-08-19 05:41:39,777 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 21 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-19 05:41:41,187 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 900, loss[loss=0.09224, beats_loss=0.01085, ecapa_loss=0.0001187, whisper_loss=0.08021, over 18724.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0102, ecapa_loss=0.0001383, whisper_loss=0.08987, over 3794758.07 frames. ], batch size: 74, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:41:45,254 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
23 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-19 05:41:45,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4309990.0, ans=0.125 2024-08-19 05:41:46,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4309990.0, ans=0.125 2024-08-19 05:41:51,184 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-19 05:41:53,653 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-19 05:41:53,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4310090.0, ans=0.1 2024-08-19 05:42:05,939 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.08 vs. limit=22.5 2024-08-19 05:42:14,286 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.290e+01 2.531e+01 2.891e+01 5.693e+01, threshold=5.062e+01, percent-clipped=1.0 2024-08-19 05:42:19,530 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 16 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-19 05:42:20,663 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
13 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-19 05:42:22,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4310290.0, ans=0.125 2024-08-19 05:42:34,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4310390.0, ans=0.0 2024-08-19 05:42:36,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4310390.0, ans=0.125 2024-08-19 05:42:42,125 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.45 vs. limit=22.5 2024-08-19 05:42:45,443 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 950, loss[loss=0.0826, beats_loss=0.01234, ecapa_loss=0.0001413, whisper_loss=0.06885, over 22532.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01027, ecapa_loss=0.0001384, whisper_loss=0.0888, over 3792913.50 frames. ], batch size: 95, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:42:45,565 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 30 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-19 05:42:50,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4310490.0, ans=0.0 2024-08-19 05:43:03,321 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-19 05:43:07,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4310590.0, ans=0.125 2024-08-19 05:43:46,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4310890.0, ans=0.0 2024-08-19 05:43:49,827 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 1000, loss[loss=0.09107, beats_loss=0.01073, ecapa_loss=0.0001432, whisper_loss=0.07891, over 23099.00 frames. ], tot_loss[loss=0.09988, beats_loss=0.01037, ecapa_loss=0.0001384, whisper_loss=0.08813, over 3770937.29 frames. ], batch size: 94, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:43:51,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4310990.0, ans=0.125 2024-08-19 05:43:58,762 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.094e+00 2024-08-19 05:44:18,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4311190.0, ans=0.1 2024-08-19 05:44:24,332 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.215e+01 2.410e+01 2.644e+01 3.685e+01, threshold=4.819e+01, percent-clipped=0.0 2024-08-19 05:44:26,657 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.88 vs. limit=15.0 2024-08-19 05:44:27,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4311190.0, ans=0.125 2024-08-19 05:44:34,075 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-19 05:44:52,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4311390.0, ans=0.125 2024-08-19 05:44:56,078 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 1050, loss[loss=0.08476, beats_loss=0.01016, ecapa_loss=0.0001365, whisper_loss=0.07324, over 19507.00 frames. ], tot_loss[loss=0.09987, beats_loss=0.01038, ecapa_loss=0.0001374, whisper_loss=0.08811, over 3800408.33 frames. ], batch size: 78, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:44:58,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4311490.0, ans=0.0 2024-08-19 05:45:11,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4311590.0, ans=0.125 2024-08-19 05:45:30,744 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-19 05:45:33,267 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 13 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-19 05:45:36,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.38 vs. limit=22.5 2024-08-19 05:45:43,210 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. limit=15.0 2024-08-19 05:45:45,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4311790.0, ans=0.0 2024-08-19 05:45:51,449 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-19 05:45:55,535 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
27 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 05:45:55,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4311890.0, ans=0.05 2024-08-19 05:46:02,105 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 1100, loss[loss=0.09899, beats_loss=0.009732, ecapa_loss=0.0001275, whisper_loss=0.08798, over 17048.00 frames. ], tot_loss[loss=0.09985, beats_loss=0.01035, ecapa_loss=0.0001382, whisper_loss=0.08813, over 3795220.45 frames. ], batch size: 65, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:46:04,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4311990.0, ans=0.125 2024-08-19 05:46:19,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4312090.0, ans=0.125 2024-08-19 05:46:27,573 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
28 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-19 05:46:29,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4312190.0, ans=0.1 2024-08-19 05:46:36,712 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.284e+01 2.503e+01 2.808e+01 4.234e+01, threshold=5.006e+01, percent-clipped=0.0 2024-08-19 05:46:45,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4312290.0, ans=0.025 2024-08-19 05:46:46,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4312290.0, ans=0.2 2024-08-19 05:46:46,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4312290.0, ans=0.125 2024-08-19 05:46:55,339 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 05:47:06,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4312390.0, ans=0.0 2024-08-19 05:47:10,040 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 1150, loss[loss=0.1012, beats_loss=0.009939, ecapa_loss=0.0001547, whisper_loss=0.08975, over 19777.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01026, ecapa_loss=0.0001402, whisper_loss=0.08883, over 3778533.94 frames. 
], batch size: 80, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:47:10,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4312490.0, ans=0.2 2024-08-19 05:47:17,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4312490.0, ans=0.2 2024-08-19 05:47:33,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4312590.0, ans=0.025 2024-08-19 05:47:46,671 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 26 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 05:47:59,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4312790.0, ans=0.0 2024-08-19 05:48:01,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4312790.0, ans=10.0 2024-08-19 05:48:11,647 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-19 05:48:18,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4312890.0, ans=0.0 2024-08-19 05:48:21,567 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 1200, loss[loss=0.1036, beats_loss=0.009859, ecapa_loss=0.000161, whisper_loss=0.09209, over 15281.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01035, ecapa_loss=0.0001395, whisper_loss=0.08864, over 3800488.19 frames. 
], batch size: 61, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:48:37,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4313090.0, ans=0.0 2024-08-19 05:48:44,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4313090.0, ans=0.125 2024-08-19 05:48:55,509 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-19 05:48:59,726 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.227e+01 2.498e+01 2.672e+01 3.472e+01, threshold=4.995e+01, percent-clipped=0.0 2024-08-19 05:49:11,075 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-19 05:49:22,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4313390.0, ans=10.0 2024-08-19 05:49:30,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4313390.0, ans=0.0 2024-08-19 05:49:34,264 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 1250, loss[loss=0.08034, beats_loss=0.01551, ecapa_loss=9.611e-05, whisper_loss=0.06386, over 22940.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01036, ecapa_loss=0.0001383, whisper_loss=0.08875, over 3837484.96 frames. ], batch size: 91, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:49:36,761 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.76 vs. 
limit=15.0 2024-08-19 05:49:38,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=4313490.0, ans=10.0 2024-08-19 05:49:43,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4313490.0, ans=0.125 2024-08-19 05:49:51,566 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.79 vs. limit=15.0 2024-08-19 05:50:03,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4313690.0, ans=0.125 2024-08-19 05:50:04,156 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-19 05:50:25,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4313790.0, ans=0.0 2024-08-19 05:50:48,405 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 1300, loss[loss=0.0958, beats_loss=0.00868, ecapa_loss=0.0001385, whisper_loss=0.08574, over 16094.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01037, ecapa_loss=0.0001376, whisper_loss=0.08918, over 3837390.59 frames. 
], batch size: 61, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:50:57,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4313990.0, ans=0.05 2024-08-19 05:51:00,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4313990.0, ans=0.125 2024-08-19 05:51:00,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4313990.0, ans=0.1 2024-08-19 05:51:07,873 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.20 vs. limit=10.0 2024-08-19 05:51:10,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4314090.0, ans=0.125 2024-08-19 05:51:26,421 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.267e+01 2.463e+01 2.675e+01 4.452e+01, threshold=4.926e+01, percent-clipped=0.0 2024-08-19 05:51:33,836 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 05:51:52,238 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
28 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-19 05:51:54,458 WARNING [optim.py:496] (0/4) Scaling gradients by 0.009816886857151985, model_norm_threshold=49.264041900634766 2024-08-19 05:51:54,632 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.0.norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.852e+06, grad_sumsq=3.852e+06, orig_rms_sq=1.000e+00 2024-08-19 05:51:58,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4314390.0, ans=0.125 2024-08-19 05:52:02,061 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 1350, loss[loss=0.1115, beats_loss=0.009835, ecapa_loss=0.0001145, whisper_loss=0.1006, over 18342.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.0001379, whisper_loss=0.08989, over 3831924.69 frames. ], batch size: 69, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:52:05,786 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.41 vs. limit=22.5 2024-08-19 05:52:09,724 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 26 from Vox, 19 fro AS 2024-08-19 05:52:10,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4314490.0, ans=0.0 2024-08-19 05:52:21,300 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-19 05:52:30,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4314590.0, ans=0.0 2024-08-19 05:52:33,240 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.07 vs. 
limit=15.0 2024-08-19 05:52:33,425 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.02 vs. limit=15.0 2024-08-19 05:52:57,812 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-19 05:53:02,648 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-19 05:53:18,921 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 1400, loss[loss=0.09943, beats_loss=0.0113, ecapa_loss=0.0001188, whisper_loss=0.08693, over 21287.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01034, ecapa_loss=0.0001395, whisper_loss=0.0896, over 3816227.56 frames. ], batch size: 84, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:53:28,084 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 32 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 05:53:40,158 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.34 vs. limit=5.0 2024-08-19 05:53:42,346 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 05:53:42,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4315090.0, ans=0.025 2024-08-19 05:53:45,063 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
31 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-19 05:53:57,441 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.218e+01 2.484e+01 2.837e+01 5.018e+03, threshold=4.968e+01, percent-clipped=1.0 2024-08-19 05:54:02,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4315290.0, ans=0.1 2024-08-19 05:54:16,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4315390.0, ans=0.125 2024-08-19 05:54:27,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4315390.0, ans=0.125 2024-08-19 05:54:53,137 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 1450, loss[loss=0.1035, beats_loss=0.01089, ecapa_loss=0.0001433, whisper_loss=0.09116, over 18171.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01042, ecapa_loss=0.0001375, whisper_loss=0.08933, over 3841753.43 frames. ], batch size: 72, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:55:34,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4315790.0, ans=0.1 2024-08-19 05:55:45,964 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 28 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 05:56:04,404 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 1500, loss[loss=0.07554, beats_loss=0.01124, ecapa_loss=0.0001695, whisper_loss=0.0626, over 16804.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01045, ecapa_loss=0.0001383, whisper_loss=0.08877, over 3846779.67 frames. ], batch size: 74, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:56:38,841 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
25 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-19 05:56:40,285 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.207e+01 2.443e+01 2.756e+01 5.889e+01, threshold=4.886e+01, percent-clipped=1.0 2024-08-19 05:56:54,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4316290.0, ans=0.1 2024-08-19 05:56:58,358 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.85 vs. limit=15.0 2024-08-19 05:56:59,490 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0 2024-08-19 05:57:11,947 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-19 05:57:13,144 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-19 05:57:14,338 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 1550, loss[loss=0.08587, beats_loss=0.01088, ecapa_loss=0.0001377, whisper_loss=0.07361, over 21633.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01042, ecapa_loss=0.0001382, whisper_loss=0.08844, over 3825683.02 frames. ], batch size: 89, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:57:24,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4316490.0, ans=0.0 2024-08-19 05:57:38,099 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.00 vs. limit=15.0 2024-08-19 05:58:04,718 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.22 vs. 
limit=15.0 2024-08-19 05:58:08,131 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 20 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-19 05:58:11,761 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.49 vs. limit=15.0 2024-08-19 05:58:16,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4316890.0, ans=0.0 2024-08-19 05:58:17,841 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-19 05:58:23,281 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-19 05:58:24,365 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 1600, loss[loss=0.1045, beats_loss=0.01056, ecapa_loss=0.0001483, whisper_loss=0.09246, over 18590.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01038, ecapa_loss=0.0001393, whisper_loss=0.0887, over 3852209.57 frames. ], batch size: 73, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:58:27,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4316990.0, ans=0.05 2024-08-19 05:58:28,005 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0 2024-08-19 05:58:33,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4316990.0, ans=0.0 2024-08-19 05:58:34,236 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-19 05:58:38,654 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 
15 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 05:58:39,199 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.65 vs. limit=10.0 2024-08-19 05:58:40,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4317090.0, ans=0.125 2024-08-19 05:58:44,543 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 05:58:59,942 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.330e+01 2.560e+01 2.867e+01 4.282e+01, threshold=5.120e+01, percent-clipped=0.0 2024-08-19 05:59:07,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4317290.0, ans=0.0 2024-08-19 05:59:29,599 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 05:59:31,765 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 1650, loss[loss=0.09746, beats_loss=0.01099, ecapa_loss=0.0001103, whisper_loss=0.08537, over 16175.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0103, ecapa_loss=0.000138, whisper_loss=0.08935, over 3837840.00 frames. ], batch size: 63, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:59:52,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4317590.0, ans=0.2 2024-08-19 06:00:06,782 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.10 vs. limit=22.5 2024-08-19 06:00:17,911 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
26 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-19 06:00:38,087 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 1700, loss[loss=0.09546, beats_loss=0.01235, ecapa_loss=0.0001267, whisper_loss=0.08185, over 21482.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01032, ecapa_loss=0.0001381, whisper_loss=0.08957, over 3841332.38 frames. ], batch size: 88, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:00:43,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4317990.0, ans=0.1 2024-08-19 06:00:46,290 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 12 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-19 06:00:53,092 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.82 vs. limit=15.0 2024-08-19 06:00:57,246 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 26 from Vox, 20 fro AS 2024-08-19 06:01:05,747 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.57 vs. limit=15.0 2024-08-19 06:01:11,376 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.261e+01 2.522e+01 2.818e+01 7.809e+01, threshold=5.044e+01, percent-clipped=1.0 2024-08-19 06:01:45,060 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 1750, loss[loss=0.09871, beats_loss=0.01069, ecapa_loss=0.0001514, whisper_loss=0.08651, over 20287.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01031, ecapa_loss=0.000138, whisper_loss=0.08962, over 3879417.91 frames. 
], batch size: 84, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:01:48,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4318490.0, ans=0.125 2024-08-19 06:01:49,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4318490.0, ans=0.07 2024-08-19 06:01:59,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4318590.0, ans=0.125 2024-08-19 06:02:00,159 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-19 06:02:03,857 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-19 06:02:11,077 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 13 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-19 06:02:15,710 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 21 from LS+wenet, 22 from Vox, 12 fro AS 2024-08-19 06:02:30,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4318690.0, ans=0.125 2024-08-19 06:02:38,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4318790.0, ans=0.125 2024-08-19 06:03:03,154 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 1800, loss[loss=0.11, beats_loss=0.009061, ecapa_loss=0.000119, whisper_loss=0.09975, over 18766.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01028, ecapa_loss=0.0001394, whisper_loss=0.08921, over 3876346.24 frames. 
], batch size: 71, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:03:08,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4318990.0, ans=0.1 2024-08-19 06:03:27,447 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-19 06:03:27,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4319090.0, ans=0.2 2024-08-19 06:03:32,483 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-19 06:03:33,863 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 27 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-19 06:03:41,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4319190.0, ans=0.125 2024-08-19 06:03:41,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4319190.0, ans=0.0 2024-08-19 06:03:49,009 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.637e+01 2.229e+01 2.414e+01 2.728e+01 1.792e+02, threshold=4.829e+01, percent-clipped=1.0 2024-08-19 06:03:51,377 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 06:03:51,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4319190.0, ans=0.0 2024-08-19 06:03:55,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4319290.0, ans=0.07 2024-08-19 06:04:16,776 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 25 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-19 06:04:20,409 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
20 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-19 06:04:26,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=4319390.0, ans=15.0 2024-08-19 06:04:35,209 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 1850, loss[loss=0.09809, beats_loss=0.0112, ecapa_loss=0.0001588, whisper_loss=0.08531, over 20937.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01031, ecapa_loss=0.0001393, whisper_loss=0.08922, over 3866277.86 frames. ], batch size: 87, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:04:39,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4319490.0, ans=0.125 2024-08-19 06:04:41,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4319490.0, ans=0.125 2024-08-19 06:04:45,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4319490.0, ans=0.2 2024-08-19 06:04:45,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=4319490.0, ans=0.02 2024-08-19 06:04:54,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4319590.0, ans=0.125 2024-08-19 06:04:58,910 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-19 06:05:21,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4319690.0, ans=0.125 2024-08-19 06:05:30,362 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
12 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-19 06:05:34,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4319790.0, ans=0.0 2024-08-19 06:06:00,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4319890.0, ans=0.0 2024-08-19 06:06:02,801 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2024-08-19 06:06:06,932 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-19 06:06:07,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4319890.0, ans=0.125 2024-08-19 06:06:14,504 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-432000.pt 2024-08-19 06:06:18,383 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 1900, loss[loss=0.08239, beats_loss=0.01021, ecapa_loss=0.0001301, whisper_loss=0.07088, over 15747.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0103, ecapa_loss=0.0001386, whisper_loss=0.08918, over 3813610.69 frames. ], batch size: 61, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:06:21,694 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.01 vs. limit=15.0 2024-08-19 06:06:39,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4320090.0, ans=0.0 2024-08-19 06:06:48,290 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
24 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-19 06:06:48,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4320090.0, ans=0.125 2024-08-19 06:06:50,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4320090.0, ans=0.125 2024-08-19 06:07:08,662 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-19 06:07:13,364 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.248e+01 2.499e+01 2.694e+01 3.637e+01, threshold=4.999e+01, percent-clipped=0.0 2024-08-19 06:07:16,862 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-19 06:07:18,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4320190.0, ans=0.125 2024-08-19 06:07:20,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4320290.0, ans=0.09899494936611666 2024-08-19 06:07:36,668 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=15.0 2024-08-19 06:07:43,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4320390.0, ans=0.125 2024-08-19 06:08:03,830 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-19 06:08:04,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4320390.0, ans=0.125 2024-08-19 06:08:09,160 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 1950, loss[loss=0.09756, beats_loss=0.0111, ecapa_loss=0.0001473, whisper_loss=0.08499, over 19455.00 frames. 
], tot_loss[loss=0.1006, beats_loss=0.01034, ecapa_loss=0.0001374, whisper_loss=0.08891, over 3824814.50 frames. ], batch size: 81, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:08:17,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4320490.0, ans=0.0 2024-08-19 06:08:22,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.08 vs. limit=15.0 2024-08-19 06:08:51,329 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.38 vs. limit=22.5 2024-08-19 06:09:00,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4320690.0, ans=0.2 2024-08-19 06:10:07,811 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 2000, loss[loss=0.1278, beats_loss=0.008051, ecapa_loss=0.0001573, whisper_loss=0.1182, over 17678.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01028, ecapa_loss=0.0001377, whisper_loss=0.08949, over 3840875.27 frames. ], batch size: 70, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:10:08,794 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-19 06:10:17,168 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
29 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-19 06:10:35,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4321090.0, ans=0.125 2024-08-19 06:10:44,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=4321090.0, ans=0.95 2024-08-19 06:11:04,804 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.279e+01 2.537e+01 2.810e+01 5.508e+01, threshold=5.074e+01, percent-clipped=1.0 2024-08-19 06:11:10,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4321290.0, ans=0.0 2024-08-19 06:11:10,994 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.296e+05 2024-08-19 06:11:17,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4321290.0, ans=0.0 2024-08-19 06:11:40,479 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 2050, loss[loss=0.1137, beats_loss=0.008672, ecapa_loss=0.0001297, whisper_loss=0.1037, over 19226.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01032, ecapa_loss=0.0001374, whisper_loss=0.08989, over 3853300.28 frames. ], batch size: 75, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:11:46,735 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-19 06:11:55,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4321590.0, ans=0.125 2024-08-19 06:11:57,613 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 25 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-19 06:12:02,415 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-19 06:12:06,386 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
33 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-19 06:12:09,908 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 28 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-19 06:12:15,567 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 32 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 06:12:35,537 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 12 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-19 06:12:43,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4321890.0, ans=0.125 2024-08-19 06:12:57,906 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 2100, loss[loss=0.1136, beats_loss=0.008534, ecapa_loss=0.0001189, whisper_loss=0.1039, over 16614.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01041, ecapa_loss=0.0001361, whisper_loss=0.0894, over 3815402.19 frames. ], batch size: 59, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:13:21,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4322090.0, ans=0.1 2024-08-19 06:13:29,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. limit=15.0 2024-08-19 06:13:31,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4322190.0, ans=0.2 2024-08-19 06:13:39,215 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.264e+01 2.459e+01 2.750e+01 5.104e+01, threshold=4.918e+01, percent-clipped=1.0 2024-08-19 06:13:57,763 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
27 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-19 06:14:13,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4322390.0, ans=0.0 2024-08-19 06:14:17,392 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 2150, loss[loss=0.1064, beats_loss=0.01054, ecapa_loss=0.0001323, whisper_loss=0.09454, over 22947.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01047, ecapa_loss=0.0001367, whisper_loss=0.08925, over 3802561.65 frames. ], batch size: 93, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:14:29,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4322490.0, ans=0.0 2024-08-19 06:14:50,995 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-19 06:14:54,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4322690.0, ans=0.0 2024-08-19 06:14:58,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4322690.0, ans=0.1 2024-08-19 06:15:15,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4322790.0, ans=0.1 2024-08-19 06:15:15,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4322790.0, ans=0.2 2024-08-19 06:15:38,170 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 2200, loss[loss=0.1063, beats_loss=0.01199, ecapa_loss=0.000139, whisper_loss=0.09294, over 22029.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01051, ecapa_loss=0.0001369, whisper_loss=0.0893, over 3815419.97 frames. 
], batch size: 89, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:15:40,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4322990.0, ans=0.125 2024-08-19 06:16:12,304 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2024-08-19 06:16:17,793 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.245e+01 2.459e+01 2.668e+01 3.758e+01, threshold=4.917e+01, percent-clipped=0.0 2024-08-19 06:16:39,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4323390.0, ans=0.035 2024-08-19 06:16:44,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4323390.0, ans=0.05 2024-08-19 06:16:45,336 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-19 06:16:56,482 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 2250, loss[loss=0.1142, beats_loss=0.01061, ecapa_loss=0.0001429, whisper_loss=0.1022, over 19938.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01051, ecapa_loss=0.0001381, whisper_loss=0.09016, over 3840607.30 frames. 
], batch size: 79, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:17:07,017 WARNING [optim.py:496] (0/4) Scaling gradients by 0.027778564020991325, model_norm_threshold=49.17220687866211 2024-08-19 06:17:07,181 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.0.norm.log_scale with proportion 0.24, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.536e+05, grad_sumsq=7.536e+05, orig_rms_sq=1.000e+00 2024-08-19 06:17:20,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4323590.0, ans=0.0 2024-08-19 06:17:23,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4323590.0, ans=0.125 2024-08-19 06:17:26,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4323690.0, ans=0.2 2024-08-19 06:17:30,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4323690.0, ans=0.1 2024-08-19 06:17:38,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4323690.0, ans=0.125 2024-08-19 06:17:38,644 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.25 vs. limit=10.0 2024-08-19 06:17:51,350 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 06:17:55,533 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
35 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-19 06:18:00,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4323890.0, ans=0.125 2024-08-19 06:18:05,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4323890.0, ans=0.125 2024-08-19 06:18:08,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4323890.0, ans=0.125 2024-08-19 06:18:09,644 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 28 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-19 06:18:14,583 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 2300, loss[loss=0.1095, beats_loss=0.009696, ecapa_loss=0.0001617, whisper_loss=0.09816, over 21119.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01054, ecapa_loss=0.0001385, whisper_loss=0.09082, over 3893709.96 frames. ], batch size: 84, lr: 2.05e-03, grad_scale: 1.152921504606847e+18 2024-08-19 06:18:16,119 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-19 06:18:20,335 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-19 06:18:20,950 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.17 vs. limit=15.0 2024-08-19 06:18:21,668 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 22 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-19 06:18:41,499 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-19 06:18:45,936 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
22 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-19 06:18:46,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4324190.0, ans=0.07 2024-08-19 06:18:53,067 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.282e+01 2.493e+01 2.821e+01 1.770e+03, threshold=4.986e+01, percent-clipped=1.0 2024-08-19 06:18:56,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4324190.0, ans=0.125 2024-08-19 06:18:58,118 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-19 06:19:11,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4324290.0, ans=0.125 2024-08-19 06:19:20,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4324390.0, ans=0.125 2024-08-19 06:19:20,130 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 06:19:31,977 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 2350, loss[loss=0.08975, beats_loss=0.008291, ecapa_loss=0.0001644, whisper_loss=0.07982, over 15326.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01049, ecapa_loss=0.00014, whisper_loss=0.09047, over 3818414.92 frames. ], batch size: 58, lr: 2.05e-03, grad_scale: 1.152921504606847e+18 2024-08-19 06:19:34,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4324490.0, ans=0.1 2024-08-19 06:19:44,894 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 28 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 06:19:46,456 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
12 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 06:19:52,769 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-19 06:19:52,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4324590.0, ans=0.015 2024-08-19 06:19:54,426 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-19 06:20:11,320 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 06:20:31,818 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 23 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-19 06:20:40,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4324890.0, ans=0.0 2024-08-19 06:20:49,906 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 2400, loss[loss=0.1374, beats_loss=0.008547, ecapa_loss=0.0001369, whisper_loss=0.1274, over 21629.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0104, ecapa_loss=0.0001405, whisper_loss=0.09071, over 3829094.29 frames. ], batch size: 81, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:21:07,054 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-19 06:21:08,693 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-19 06:21:11,406 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
16 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-19 06:21:12,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4325090.0, ans=0.125 2024-08-19 06:21:14,582 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 06:21:20,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4325190.0, ans=0.125 2024-08-19 06:21:30,497 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.266e+01 2.549e+01 2.810e+01 4.372e+01, threshold=5.098e+01, percent-clipped=0.0 2024-08-19 06:21:36,562 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 06:21:43,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4325290.0, ans=0.1 2024-08-19 06:21:44,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4325290.0, ans=0.125 2024-08-19 06:21:47,884 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-19 06:21:53,814 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-19 06:21:59,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4325390.0, ans=0.1 2024-08-19 06:22:06,951 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 2450, loss[loss=0.1063, beats_loss=0.009878, ecapa_loss=0.0001536, whisper_loss=0.09487, over 16497.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01037, ecapa_loss=0.0001406, whisper_loss=0.09068, over 3840013.38 frames. 
], batch size: 66, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:22:16,040 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-19 06:22:23,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4325590.0, ans=0.2 2024-08-19 06:22:25,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4325590.0, ans=0.125 2024-08-19 06:22:26,973 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-19 06:22:46,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4325690.0, ans=0.1 2024-08-19 06:22:48,686 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 29 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-19 06:22:53,627 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-19 06:22:55,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4325790.0, ans=0.125 2024-08-19 06:23:13,503 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 06:23:21,369 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-19 06:23:25,117 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 2500, loss[loss=0.09465, beats_loss=0.01177, ecapa_loss=0.0001242, whisper_loss=0.08165, over 22598.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01041, ecapa_loss=0.0001405, whisper_loss=0.09081, over 3880617.58 frames. ], batch size: 94, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:23:28,013 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
18 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-19 06:23:30,935 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-19 06:23:31,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4325990.0, ans=0.0 2024-08-19 06:23:40,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4326090.0, ans=0.125 2024-08-19 06:23:43,624 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-19 06:23:45,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4326090.0, ans=0.125 2024-08-19 06:23:49,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4326090.0, ans=0.125 2024-08-19 06:24:01,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4326190.0, ans=0.125 2024-08-19 06:24:06,659 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.293e+01 2.574e+01 2.771e+01 4.931e+01, threshold=5.149e+01, percent-clipped=0.0 2024-08-19 06:24:07,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4326190.0, ans=0.0 2024-08-19 06:24:12,400 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-19 06:24:29,205 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
31 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-19 06:24:29,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4326390.0, ans=0.125 2024-08-19 06:24:45,745 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 2550, loss[loss=0.1078, beats_loss=0.009539, ecapa_loss=0.0001185, whisper_loss=0.09706, over 19073.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01034, ecapa_loss=0.0001413, whisper_loss=0.09076, over 3879581.25 frames. ], batch size: 74, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:24:52,233 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 30 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-19 06:25:16,992 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 21 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-19 06:25:17,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4326690.0, ans=0.125 2024-08-19 06:25:22,016 WARNING [optim.py:496] (0/4) Scaling gradients by 0.054495006799697876, model_norm_threshold=51.48568344116211 2024-08-19 06:25:22,181 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.141e+04, grad_sumsq=2.777e+04, orig_rms_sq=3.292e+00 2024-08-19 06:25:46,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4326790.0, ans=0.125 2024-08-19 06:25:53,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4326890.0, ans=0.07 2024-08-19 06:25:59,669 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 06:26:04,343 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 2600, loss[loss=0.1063, beats_loss=0.01067, ecapa_loss=0.0001323, whisper_loss=0.09433, over 21947.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01035, ecapa_loss=0.0001412, whisper_loss=0.09053, over 3872352.29 frames. ], batch size: 87, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:26:08,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4326990.0, ans=0.0 2024-08-19 06:26:14,404 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 11 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-19 06:26:17,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4326990.0, ans=0.125 2024-08-19 06:26:35,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4327190.0, ans=0.0 2024-08-19 06:26:35,418 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.58 vs. limit=15.0 2024-08-19 06:26:45,667 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.338e+01 2.555e+01 2.845e+01 9.448e+02, threshold=5.110e+01, percent-clipped=2.0 2024-08-19 06:26:47,572 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-19 06:26:52,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4327290.0, ans=0.0 2024-08-19 06:27:23,435 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 2650, loss[loss=0.1094, beats_loss=0.01008, ecapa_loss=0.0001656, whisper_loss=0.09762, over 14281.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01028, ecapa_loss=0.0001421, whisper_loss=0.09139, over 3881382.61 frames. ], batch size: 59, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:27:23,559 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 06:27:31,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4327490.0, ans=0.125 2024-08-19 06:27:34,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4327490.0, ans=0.09899494936611666 2024-08-19 06:28:29,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4327890.0, ans=0.0 2024-08-19 06:28:36,901 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-19 06:28:37,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4327890.0, ans=0.125 2024-08-19 06:28:41,964 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 2700, loss[loss=0.1339, beats_loss=0.008987, ecapa_loss=0.0001497, whisper_loss=0.1234, over 23110.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01028, ecapa_loss=0.0001412, whisper_loss=0.09108, over 3864327.54 frames. ], batch size: 90, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:28:44,409 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
16 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-19 06:29:13,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4328190.0, ans=0.0 2024-08-19 06:29:15,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4328190.0, ans=0.2 2024-08-19 06:29:21,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4328190.0, ans=0.0 2024-08-19 06:29:24,276 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.337e+01 2.538e+01 2.914e+01 3.709e+01, threshold=5.076e+01, percent-clipped=0.0 2024-08-19 06:29:48,085 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 29 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-19 06:30:00,771 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 2750, loss[loss=0.07917, beats_loss=0.01346, ecapa_loss=0.0001366, whisper_loss=0.06435, over 22295.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01025, ecapa_loss=0.0001409, whisper_loss=0.09098, over 3854941.91 frames. ], batch size: 95, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:30:57,371 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 24 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-19 06:31:20,972 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 2800, loss[loss=0.08878, beats_loss=0.008778, ecapa_loss=0.0001201, whisper_loss=0.0788, over 15316.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01027, ecapa_loss=0.0001396, whisper_loss=0.09118, over 3860074.05 frames. ], batch size: 55, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:31:26,459 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.91 vs. limit=15.0 2024-08-19 06:31:27,419 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
25 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-19 06:31:35,424 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-19 06:31:37,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4329090.0, ans=0.125 2024-08-19 06:31:46,022 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.25 vs. limit=5.0 2024-08-19 06:31:54,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4329190.0, ans=0.125 2024-08-19 06:32:03,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4329190.0, ans=0.07 2024-08-19 06:32:04,776 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.348e+01 2.575e+01 2.807e+01 4.733e+01, threshold=5.150e+01, percent-clipped=0.0 2024-08-19 06:32:06,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4329190.0, ans=0.0 2024-08-19 06:32:10,074 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.82 vs. limit=22.5 2024-08-19 06:32:41,436 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 2850, loss[loss=0.09524, beats_loss=0.00978, ecapa_loss=0.0001474, whisper_loss=0.08399, over 14924.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01028, ecapa_loss=0.0001401, whisper_loss=0.09164, over 3841132.35 frames. ], batch size: 58, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:33:35,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4329790.0, ans=0.0 2024-08-19 06:33:36,583 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
21 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-19 06:33:44,612 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 06:34:02,129 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 2900, loss[loss=0.09953, beats_loss=0.01027, ecapa_loss=0.0001125, whisper_loss=0.08814, over 18744.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0104, ecapa_loss=0.0001405, whisper_loss=0.09098, over 3860032.40 frames. ], batch size: 71, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:34:26,402 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.44 vs. limit=15.0 2024-08-19 06:34:46,027 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.702e+01 2.333e+01 2.520e+01 2.841e+01 3.701e+01, threshold=5.040e+01, percent-clipped=0.0 2024-08-19 06:34:47,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4330190.0, ans=0.125 2024-08-19 06:34:52,223 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-19 06:35:00,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4330290.0, ans=0.025 2024-08-19 06:35:21,641 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 2950, loss[loss=0.09566, beats_loss=0.01077, ecapa_loss=0.0001472, whisper_loss=0.08342, over 20110.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01053, ecapa_loss=0.0001417, whisper_loss=0.08988, over 3871680.94 frames. 
], batch size: 82, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:35:23,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4330490.0, ans=0.07 2024-08-19 06:35:26,902 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.05 vs. limit=22.5 2024-08-19 06:35:29,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4330490.0, ans=0.125 2024-08-19 06:36:08,396 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-19 06:36:15,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4330790.0, ans=0.0 2024-08-19 06:36:16,154 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2024-08-19 06:36:16,835 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 26 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-19 06:36:22,932 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-19 06:36:24,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4330790.0, ans=0.1 2024-08-19 06:36:28,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4330890.0, ans=0.0 2024-08-19 06:36:37,883 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 06:36:44,148 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 3000, loss[loss=0.09374, beats_loss=0.01031, ecapa_loss=0.0001109, whisper_loss=0.08232, over 19190.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.0105, ecapa_loss=0.0001412, whisper_loss=0.09061, over 3897004.09 frames. ], batch size: 74, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:36:44,149 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-19 06:37:20,887 INFO [train_multi_KD3.py:1149] (0/4) Epoch 30, validation on ASR_libri: loss=0.2537, beats_loss=0, ecapa_loss=0.0005147, whisper_loss=0.2486, over 922467.00 frames. 2024-08-19 06:37:39,415 INFO [train_multi_KD3.py:1149] (0/4) Epoch 30, validation on SV_voxceleb1: loss=0.003985, beats_loss=0, ecapa_loss=0.0003985, whisper_loss=0, over 939242.00 frames. 2024-08-19 06:38:00,093 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.5884, 2.0722, 2.4886, 1.3886], device='cuda:0') 2024-08-19 06:39:27,089 INFO [train_multi_KD3.py:1149] (0/4) Epoch 30, validation on AT_audioset: loss=0.02305, beats_loss=0.02305, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 06:39:27,093 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-19 06:39:50,201 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.85 vs. limit=15.0 2024-08-19 06:39:58,899 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.90 vs. 
limit=5.0 2024-08-19 06:40:09,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4331190.0, ans=0.0 2024-08-19 06:40:11,477 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.401e+01 2.628e+01 2.942e+01 3.563e+02, threshold=5.255e+01, percent-clipped=2.0 2024-08-19 06:40:21,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4331290.0, ans=0.125 2024-08-19 06:40:21,645 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.02 vs. limit=22.5 2024-08-19 06:40:39,176 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.75 vs. limit=22.5 2024-08-19 06:40:42,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4331390.0, ans=0.2 2024-08-19 06:40:47,477 WARNING [optim.py:496] (0/4) Scaling gradients by 0.05165092274546623, model_norm_threshold=52.55119705200195 2024-08-19 06:40:47,639 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.383e+05, grad_sumsq=1.383e+05, orig_rms_sq=1.000e+00 2024-08-19 06:40:53,229 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 3050, loss[loss=0.09243, beats_loss=0.01226, ecapa_loss=0.0001236, whisper_loss=0.07893, over 15955.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01053, ecapa_loss=0.0001423, whisper_loss=0.09099, over 3905396.72 frames. ], batch size: 62, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:40:53,402 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-19 06:41:05,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4331490.0, ans=0.2 2024-08-19 06:41:24,199 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 29 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-19 06:41:26,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4331690.0, ans=0.2 2024-08-19 06:42:18,075 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 3100, loss[loss=0.1281, beats_loss=0.009666, ecapa_loss=0.000145, whisper_loss=0.117, over 23529.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01053, ecapa_loss=0.0001419, whisper_loss=0.09103, over 3905387.06 frames. ], batch size: 90, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:42:18,903 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.42 vs. limit=12.0 2024-08-19 06:42:33,929 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-19 06:42:37,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4332090.0, ans=0.125 2024-08-19 06:43:01,790 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.302e+01 2.496e+01 2.802e+01 1.017e+03, threshold=4.993e+01, percent-clipped=2.0 2024-08-19 06:43:08,743 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-19 06:43:21,742 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-19 06:43:25,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4332390.0, ans=0.1 2024-08-19 06:43:25,855 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.70 vs. limit=10.0 2024-08-19 06:43:26,043 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-08-19 06:43:40,237 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 3150, loss[loss=0.1237, beats_loss=0.008987, ecapa_loss=0.0001599, whisper_loss=0.1131, over 21772.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.000142, whisper_loss=0.0909, over 3917022.64 frames. ], batch size: 86, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:43:44,444 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.86 vs. limit=15.0 2024-08-19 06:44:00,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4332590.0, ans=0.1 2024-08-19 06:44:15,041 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
26 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-19 06:44:15,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4332690.0, ans=0.0 2024-08-19 06:44:15,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4332690.0, ans=0.125 2024-08-19 06:44:18,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4332690.0, ans=0.0 2024-08-19 06:44:29,752 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0 2024-08-19 06:44:35,552 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 26 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 06:44:45,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4332890.0, ans=0.125 2024-08-19 06:44:49,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4332890.0, ans=0.125 2024-08-19 06:44:58,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4332890.0, ans=0.09899494936611666 2024-08-19 06:45:00,966 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 3200, loss[loss=0.1059, beats_loss=0.008775, ecapa_loss=0.0001648, whisper_loss=0.09548, over 17992.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.000142, whisper_loss=0.09088, over 3905749.88 frames. 
], batch size: 71, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:45:11,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4332990.0, ans=0.2 2024-08-19 06:45:14,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4332990.0, ans=0.0 2024-08-19 06:45:14,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4332990.0, ans=0.1 2024-08-19 06:45:21,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4333090.0, ans=0.2 2024-08-19 06:45:26,503 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 20 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 06:45:34,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4333190.0, ans=0.125 2024-08-19 06:45:42,803 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.260e+01 2.450e+01 2.731e+01 1.494e+02, threshold=4.900e+01, percent-clipped=1.0 2024-08-19 06:45:48,027 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.04 vs. limit=12.0 2024-08-19 06:46:02,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4333390.0, ans=0.2 2024-08-19 06:46:03,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4333390.0, ans=0.125 2024-08-19 06:46:13,390 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 06:46:19,764 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 3250, loss[loss=0.0948, beats_loss=0.01199, ecapa_loss=0.0001241, whisper_loss=0.08157, over 22591.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.0001424, whisper_loss=0.09097, over 3880834.48 frames. ], batch size: 90, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:47:08,241 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 25 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-19 06:47:26,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4333890.0, ans=0.0 2024-08-19 06:47:29,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4333890.0, ans=0.0 2024-08-19 06:47:38,042 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 3300, loss[loss=0.1046, beats_loss=0.01009, ecapa_loss=0.000124, whisper_loss=0.09331, over 23460.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0105, ecapa_loss=0.0001424, whisper_loss=0.09123, over 3890944.06 frames. ], batch size: 91, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:47:43,551 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 06:47:57,434 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.29 vs. limit=15.0 2024-08-19 06:47:59,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.72 vs. 
limit=15.0 2024-08-19 06:48:14,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4334190.0, ans=0.125 2024-08-19 06:48:18,347 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.382e+01 2.607e+01 2.876e+01 9.685e+01, threshold=5.214e+01, percent-clipped=1.0 2024-08-19 06:48:38,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4334390.0, ans=0.125 2024-08-19 06:48:52,053 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 06:48:52,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4334490.0, ans=0.04949747468305833 2024-08-19 06:48:52,873 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 3350, loss[loss=0.1311, beats_loss=0.007606, ecapa_loss=0.0001486, whisper_loss=0.122, over 16428.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01046, ecapa_loss=0.0001424, whisper_loss=0.09183, over 3904568.42 frames. ], batch size: 64, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:48:56,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4334490.0, ans=0.125 2024-08-19 06:49:02,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4334490.0, ans=0.0 2024-08-19 06:49:12,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4334590.0, ans=0.125 2024-08-19 06:49:15,823 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
13 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-19 06:49:23,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4334690.0, ans=0.125 2024-08-19 06:49:51,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4334890.0, ans=0.125 2024-08-19 06:49:55,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4334890.0, ans=0.2 2024-08-19 06:50:05,002 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 3400, loss[loss=0.08721, beats_loss=0.01267, ecapa_loss=0.0001133, whisper_loss=0.07341, over 18425.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01046, ecapa_loss=0.0001423, whisper_loss=0.09149, over 3924639.17 frames. ], batch size: 73, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:50:08,014 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 17 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-19 06:50:15,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4334990.0, ans=0.015 2024-08-19 06:50:43,283 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.221e+01 2.426e+01 2.713e+01 1.019e+02, threshold=4.853e+01, percent-clipped=2.0 2024-08-19 06:51:13,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4335390.0, ans=0.125 2024-08-19 06:51:15,620 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 3450, loss[loss=0.09096, beats_loss=0.01053, ecapa_loss=0.0001271, whisper_loss=0.07916, over 19241.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01047, ecapa_loss=0.0001421, whisper_loss=0.09112, over 3906733.71 frames. 
], batch size: 77, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:51:15,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4335490.0, ans=0.125 2024-08-19 06:51:15,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4335490.0, ans=0.2 2024-08-19 06:51:22,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4335490.0, ans=0.125 2024-08-19 06:51:23,376 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 06:51:25,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4335490.0, ans=0.125 2024-08-19 06:51:36,002 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 15 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 06:51:51,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4335690.0, ans=0.5 2024-08-19 06:51:58,710 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.18 vs. limit=15.0 2024-08-19 06:52:02,079 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-19 06:52:18,538 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 20 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-19 06:52:23,109 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 3500, loss[loss=0.1185, beats_loss=0.01024, ecapa_loss=0.0001793, whisper_loss=0.1064, over 20980.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01052, ecapa_loss=0.000142, whisper_loss=0.09107, over 3923944.07 frames. 
], batch size: 89, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:52:24,406 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-19 06:52:24,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4335990.0, ans=0.95 2024-08-19 06:52:41,370 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 35 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-19 06:52:43,636 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-19 06:52:52,731 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-19 06:52:57,036 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.291e+01 2.555e+01 2.847e+01 3.911e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-19 06:53:03,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4336290.0, ans=0.125 2024-08-19 06:53:13,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4336390.0, ans=0.125 2024-08-19 06:53:22,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4336390.0, ans=0.125 2024-08-19 06:53:25,304 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 3550, loss[loss=0.08288, beats_loss=0.0127, ecapa_loss=0.0001179, whisper_loss=0.069, over 23072.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01051, ecapa_loss=0.0001416, whisper_loss=0.09118, over 3915574.79 frames. ], batch size: 95, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:53:35,924 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.07 vs. 
limit=22.5 2024-08-19 06:53:42,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4336590.0, ans=0.0 2024-08-19 06:53:58,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4336690.0, ans=0.0 2024-08-19 06:54:25,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4336890.0, ans=0.125 2024-08-19 06:54:27,095 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 3600, loss[loss=0.1093, beats_loss=0.01064, ecapa_loss=0.0001318, whisper_loss=0.09731, over 21443.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01049, ecapa_loss=0.0001417, whisper_loss=0.09112, over 3891000.20 frames. ], batch size: 87, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:54:29,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4336990.0, ans=0.1 2024-08-19 06:54:47,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4337090.0, ans=0.125 2024-08-19 06:55:00,627 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.265e+01 2.489e+01 2.858e+01 3.762e+01, threshold=4.979e+01, percent-clipped=0.0 2024-08-19 06:55:28,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4337490.0, ans=0.125 2024-08-19 06:55:29,293 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 3650, loss[loss=0.1029, beats_loss=0.01102, ecapa_loss=0.0001324, whisper_loss=0.09053, over 22492.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01043, ecapa_loss=0.0001424, whisper_loss=0.0906, over 3854492.06 frames. 
], batch size: 90, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:55:35,414 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-19 06:55:40,176 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09651452302932739, model_norm_threshold=49.788700103759766 2024-08-19 06:55:40,338 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.29, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.699e+04, grad_sumsq=7.699e+04, orig_rms_sq=1.000e+00 2024-08-19 06:55:43,196 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 13 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-19 06:55:50,843 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.02 vs. limit=12.0 2024-08-19 06:56:05,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4337690.0, ans=0.1 2024-08-19 06:56:12,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4337790.0, ans=0.125 2024-08-19 06:56:12,672 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.86 vs. limit=15.0 2024-08-19 06:56:17,442 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-19 06:56:32,394 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 3700, loss[loss=0.1149, beats_loss=0.009552, ecapa_loss=0.0001333, whisper_loss=0.104, over 20377.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01042, ecapa_loss=0.0001419, whisper_loss=0.09071, over 3822372.58 frames. 
], batch size: 78, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:56:34,168 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.30 vs. limit=22.5 2024-08-19 06:56:35,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=4337990.0, ans=0.025 2024-08-19 06:56:40,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4337990.0, ans=0.125 2024-08-19 06:56:44,913 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-19 06:56:52,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4338090.0, ans=0.125 2024-08-19 06:56:53,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4338090.0, ans=0.2 2024-08-19 06:56:59,437 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 21 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-19 06:57:02,117 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-19 06:57:05,626 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.332e+01 2.586e+01 3.000e+01 5.159e+02, threshold=5.172e+01, percent-clipped=5.0 2024-08-19 06:57:13,143 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 28 from Vox, 22 fro AS 2024-08-19 06:57:13,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4338290.0, ans=0.125 2024-08-19 06:57:15,166 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.21 vs. 
limit=15.0 2024-08-19 06:57:26,948 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-19 06:57:29,637 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 06:57:34,160 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 3750, loss[loss=0.08353, beats_loss=0.009706, ecapa_loss=0.0001488, whisper_loss=0.07234, over 17851.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.0001424, whisper_loss=0.09041, over 3824802.44 frames. ], batch size: 73, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:57:46,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4338590.0, ans=0.2 2024-08-19 06:57:59,895 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.06 vs. limit=15.0 2024-08-19 06:58:00,526 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 42 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-19 06:58:05,947 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.77 vs. limit=15.0 2024-08-19 06:58:19,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4338790.0, ans=0.2 2024-08-19 06:58:26,912 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.52 vs. 
limit=22.5 2024-08-19 06:58:27,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4338890.0, ans=0.125 2024-08-19 06:58:35,917 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 3800, loss[loss=0.08966, beats_loss=0.01245, ecapa_loss=0.0001029, whisper_loss=0.07618, over 22733.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01044, ecapa_loss=0.0001438, whisper_loss=0.09049, over 3850784.73 frames. ], batch size: 88, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:58:49,474 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 06:58:50,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4339090.0, ans=0.035 2024-08-19 06:58:57,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4339090.0, ans=0.125 2024-08-19 06:59:07,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4339190.0, ans=0.125 2024-08-19 06:59:09,296 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.289e+01 2.536e+01 2.898e+01 5.473e+01, threshold=5.073e+01, percent-clipped=1.0 2024-08-19 06:59:18,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4339290.0, ans=0.2 2024-08-19 06:59:24,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4339390.0, ans=0.125 2024-08-19 06:59:24,722 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.70 vs. 
limit=15.0 2024-08-19 06:59:37,789 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 3850, loss[loss=0.09738, beats_loss=0.01009, ecapa_loss=0.0001584, whisper_loss=0.08571, over 16328.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01045, ecapa_loss=0.000144, whisper_loss=0.09021, over 3868114.49 frames. ], batch size: 67, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:59:40,654 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-19 06:59:41,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4339490.0, ans=0.125 2024-08-19 06:59:44,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4339490.0, ans=0.0 2024-08-19 06:59:53,459 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0 2024-08-19 07:00:02,582 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 07:00:06,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4339690.0, ans=0.125 2024-08-19 07:00:15,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=4339790.0, ans=10.0 2024-08-19 07:00:20,810 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 28 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 07:00:24,374 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
27 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-19 07:00:24,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4339790.0, ans=0.125 2024-08-19 07:00:27,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4339890.0, ans=0.125 2024-08-19 07:00:39,510 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 3900, loss[loss=0.1006, beats_loss=0.01031, ecapa_loss=0.0001467, whisper_loss=0.08885, over 22721.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.0001442, whisper_loss=0.09045, over 3864538.88 frames. ], batch size: 93, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:00:44,626 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 07:00:53,033 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 07:00:55,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4340090.0, ans=0.0 2024-08-19 07:01:08,071 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 11 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-19 07:01:13,028 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.284e+01 2.480e+01 2.767e+01 3.650e+01, threshold=4.959e+01, percent-clipped=0.0 2024-08-19 07:01:25,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4340290.0, ans=0.0 2024-08-19 07:01:41,229 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 3950, loss[loss=0.1046, beats_loss=0.01117, ecapa_loss=0.0001401, whisper_loss=0.09206, over 17706.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01052, ecapa_loss=0.0001435, whisper_loss=0.09065, over 3885041.95 frames. 
], batch size: 71, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:01:41,338 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-19 07:01:41,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4340490.0, ans=0.0 2024-08-19 07:01:43,884 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08397600054740906, model_norm_threshold=49.59146499633789 2024-08-19 07:01:44,047 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.26, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.206e+04, grad_sumsq=8.842e+06, orig_rms_sq=1.041e-02 2024-08-19 07:01:48,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4340490.0, ans=0.0 2024-08-19 07:01:49,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4340490.0, ans=0.125 2024-08-19 07:01:50,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4340490.0, ans=0.0 2024-08-19 07:01:53,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4340590.0, ans=0.0 2024-08-19 07:02:03,678 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-19 07:02:05,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4340690.0, ans=0.1 2024-08-19 07:02:17,340 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-19 07:02:23,881 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
25 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-19 07:02:27,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4340790.0, ans=0.0 2024-08-19 07:02:30,765 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=15.0 2024-08-19 07:02:35,650 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.70 vs. limit=6.0 2024-08-19 07:02:43,738 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 4000, loss[loss=0.1054, beats_loss=0.01116, ecapa_loss=0.0001058, whisper_loss=0.09322, over 23754.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01046, ecapa_loss=0.0001438, whisper_loss=0.0906, over 3910013.35 frames. ], batch size: 93, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:02:54,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4341090.0, ans=0.125 2024-08-19 07:02:56,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4341090.0, ans=0.0 2024-08-19 07:03:00,535 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.78 vs. limit=10.0 2024-08-19 07:03:13,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4341190.0, ans=0.1 2024-08-19 07:03:17,339 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.399e+01 2.689e+01 3.054e+01 5.905e+02, threshold=5.377e+01, percent-clipped=2.0 2024-08-19 07:03:21,359 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
20 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-19 07:03:43,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4341390.0, ans=0.125 2024-08-19 07:03:44,971 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.01 vs. limit=15.0 2024-08-19 07:03:46,330 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 4050, loss[loss=0.1096, beats_loss=0.01177, ecapa_loss=0.0001227, whisper_loss=0.09657, over 23463.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01043, ecapa_loss=0.0001449, whisper_loss=0.09036, over 3902499.79 frames. ], batch size: 91, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:03:50,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4341490.0, ans=0.2 2024-08-19 07:04:12,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4341690.0, ans=0.125 2024-08-19 07:04:16,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4341690.0, ans=0.125 2024-08-19 07:04:27,947 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.49 vs. limit=12.0 2024-08-19 07:04:29,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4341790.0, ans=0.0 2024-08-19 07:04:32,940 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.929e+01 2024-08-19 07:04:36,493 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
20 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-19 07:04:48,725 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 4100, loss[loss=0.1032, beats_loss=0.01359, ecapa_loss=0.0001012, whisper_loss=0.08863, over 18666.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01048, ecapa_loss=0.0001457, whisper_loss=0.09, over 3906326.41 frames. ], batch size: 73, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:04:52,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4341990.0, ans=0.125 2024-08-19 07:05:14,509 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-19 07:05:21,935 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.245e+01 2.633e+01 2.897e+01 5.694e+01, threshold=5.267e+01, percent-clipped=1.0 2024-08-19 07:05:24,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4342290.0, ans=0.0 2024-08-19 07:05:30,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4342290.0, ans=0.025 2024-08-19 07:05:31,045 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-19 07:05:32,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4342290.0, ans=0.125 2024-08-19 07:05:39,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4342390.0, ans=0.0 2024-08-19 07:05:51,372 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 4150, loss[loss=0.0981, beats_loss=0.008789, ecapa_loss=0.0001835, whisper_loss=0.08748, over 16731.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001454, whisper_loss=0.08999, over 3908779.19 frames. 
], batch size: 67, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:05:51,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4342490.0, ans=0.0 2024-08-19 07:05:52,368 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.05 vs. limit=6.0 2024-08-19 07:05:58,081 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-19 07:06:04,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4342590.0, ans=0.125 2024-08-19 07:06:04,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4342590.0, ans=0.1 2024-08-19 07:06:10,476 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-19 07:06:11,796 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-19 07:06:22,809 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-19 07:06:30,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4342790.0, ans=0.125 2024-08-19 07:06:31,837 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 07:06:54,598 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 4200, loss[loss=0.1074, beats_loss=0.01001, ecapa_loss=0.0001328, whisper_loss=0.0961, over 15636.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01045, ecapa_loss=0.0001461, whisper_loss=0.0899, over 3884509.15 frames. 
], batch size: 60, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:06:57,782 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.79 vs. limit=6.0 2024-08-19 07:07:02,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4342990.0, ans=0.125 2024-08-19 07:07:08,576 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-19 07:07:15,056 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-19 07:07:20,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4343190.0, ans=10.0 2024-08-19 07:07:20,423 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.59 vs. limit=15.0 2024-08-19 07:07:28,653 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.261e+01 2.591e+01 2.854e+01 3.492e+01, threshold=5.183e+01, percent-clipped=0.0 2024-08-19 07:07:56,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4343490.0, ans=0.1 2024-08-19 07:07:57,617 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 4250, loss[loss=0.08439, beats_loss=0.01146, ecapa_loss=0.0001521, whisper_loss=0.0714, over 13245.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01048, ecapa_loss=0.0001441, whisper_loss=0.0895, over 3880941.77 frames. 
], batch size: 54, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:08:01,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4343490.0, ans=0.125 2024-08-19 07:08:02,769 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 20 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 07:08:06,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4343490.0, ans=0.125 2024-08-19 07:08:14,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4343590.0, ans=0.125 2024-08-19 07:08:15,216 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-19 07:08:20,337 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 07:08:21,212 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 15 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-19 07:08:30,078 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 07:08:36,192 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 14 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 07:08:41,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4343790.0, ans=0.125 2024-08-19 07:08:59,594 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 4300, loss[loss=0.1195, beats_loss=0.009074, ecapa_loss=0.0001916, whisper_loss=0.1085, over 21931.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01051, ecapa_loss=0.0001433, whisper_loss=0.08958, over 3860724.38 frames. 
], batch size: 93, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:09:02,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4343990.0, ans=0.125 2024-08-19 07:09:04,931 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 07:09:09,943 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 33 from Vox, 26 fro AS 2024-08-19 07:09:17,303 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-19 07:09:19,709 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-19 07:09:33,367 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.222e+01 2.440e+01 2.788e+01 3.909e+01, threshold=4.880e+01, percent-clipped=0.0 2024-08-19 07:09:40,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4344290.0, ans=0.0 2024-08-19 07:09:44,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4344290.0, ans=10.0 2024-08-19 07:09:44,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4344290.0, ans=0.125 2024-08-19 07:10:01,976 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 4350, loss[loss=0.08585, beats_loss=0.0119, ecapa_loss=0.0001195, whisper_loss=0.07276, over 22824.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01055, ecapa_loss=0.0001442, whisper_loss=0.0889, over 3850193.71 frames. 
], batch size: 93, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:10:08,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4344490.0, ans=0.125 2024-08-19 07:10:08,920 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.48 vs. limit=15.0 2024-08-19 07:10:09,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=4344490.0, ans=0.95 2024-08-19 07:10:13,316 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 22 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-19 07:10:24,742 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 26 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-19 07:10:42,616 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.196e+01 2024-08-19 07:11:01,218 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 16 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-19 07:11:05,010 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 4400, loss[loss=0.104, beats_loss=0.01133, ecapa_loss=0.0001121, whisper_loss=0.09157, over 23173.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01053, ecapa_loss=0.000144, whisper_loss=0.08874, over 3814893.78 frames. ], batch size: 89, lr: 2.05e-03, grad_scale: 1.152921504606847e+18 2024-08-19 07:11:12,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4344990.0, ans=0.0 2024-08-19 07:11:15,985 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 07:11:16,594 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.28 vs. 
limit=15.0 2024-08-19 07:11:18,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4345090.0, ans=0.125 2024-08-19 07:11:25,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4345090.0, ans=0.2 2024-08-19 07:11:32,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4345190.0, ans=0.125 2024-08-19 07:11:32,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4345190.0, ans=0.125 2024-08-19 07:11:39,289 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.205e+01 2.466e+01 2.774e+01 4.446e+01, threshold=4.932e+01, percent-clipped=0.0 2024-08-19 07:12:02,623 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.15 vs. limit=15.0 2024-08-19 07:12:05,038 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.24 vs. limit=15.0 2024-08-19 07:12:06,638 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 4450, loss[loss=0.1271, beats_loss=0.008669, ecapa_loss=0.0001457, whisper_loss=0.1169, over 23172.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01044, ecapa_loss=0.0001434, whisper_loss=0.08992, over 3834827.05 frames. ], batch size: 91, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:12:16,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=4345490.0, ans=10.0 2024-08-19 07:12:21,231 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.12 vs. 
limit=15.0 2024-08-19 07:12:45,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4345790.0, ans=0.125 2024-08-19 07:13:04,273 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2024-08-19 07:13:06,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4345890.0, ans=0.125 2024-08-19 07:13:10,038 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 4500, loss[loss=0.1155, beats_loss=0.008438, ecapa_loss=0.0001748, whisper_loss=0.1053, over 22838.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001438, whisper_loss=0.08931, over 3845141.21 frames. ], batch size: 91, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:13:21,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4346090.0, ans=0.0 2024-08-19 07:13:30,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4346090.0, ans=0.0 2024-08-19 07:13:31,984 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.40 vs. limit=10.0 2024-08-19 07:13:44,261 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
22 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-19 07:13:45,334 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.773e+01 2.205e+01 2.468e+01 2.775e+01 4.472e+01, threshold=4.935e+01, percent-clipped=0.0 2024-08-19 07:13:54,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4346290.0, ans=0.2 2024-08-19 07:13:55,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4346290.0, ans=0.125 2024-08-19 07:13:57,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4346290.0, ans=0.0 2024-08-19 07:14:13,005 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 4550, loss[loss=0.106, beats_loss=0.01001, ecapa_loss=0.0001387, whisper_loss=0.09457, over 21648.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01047, ecapa_loss=0.0001437, whisper_loss=0.08948, over 3839346.97 frames. ], batch size: 88, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:14:18,280 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-19 07:14:18,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4346490.0, ans=0.125 2024-08-19 07:14:21,406 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.05 vs. limit=15.0 2024-08-19 07:14:31,101 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-08-19 07:14:37,911 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
21 from LS+wenet, 20 from Vox, 30 from AS 2024-08-19 07:14:43,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4346690.0, ans=0.2 2024-08-19 07:14:46,818 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 24 from LS+wenet, 25 from Vox, 45 from AS 2024-08-19 07:14:53,117 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 from AS 2024-08-19 07:15:07,949 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 22 from Vox, 32 from AS 2024-08-19 07:15:13,104 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 9 from Vox, 34 from AS 2024-08-19 07:15:13,768 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.15 vs. limit=22.5 2024-08-19 07:15:15,500 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 4600, loss[loss=0.0904, beats_loss=0.009741, ecapa_loss=0.0001116, whisper_loss=0.07954, over 19478.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01057, ecapa_loss=0.0001433, whisper_loss=0.08887, over 3826806.56 frames. ], batch size: 74, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:15:30,578 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 22 from Vox, 31 from AS 2024-08-19 07:15:30,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4347090.0, ans=0.125 2024-08-19 07:15:32,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4347090.0, ans=0.125 2024-08-19 07:15:33,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4347090.0, ans=0.1 2024-08-19 07:15:43,091 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
17 from LS+wenet, 25 from Vox, 24 from AS 2024-08-19 07:15:50,492 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.323e+01 2.576e+01 2.927e+01 9.021e+01, threshold=5.152e+01, percent-clipped=3.0 2024-08-19 07:15:51,990 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 from AS 2024-08-19 07:16:04,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4347390.0, ans=0.0 2024-08-19 07:16:07,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4347390.0, ans=0.1 2024-08-19 07:16:12,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4347390.0, ans=0.07 2024-08-19 07:16:16,107 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.39 vs. limit=15.0 2024-08-19 07:16:17,703 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 4650, loss[loss=0.1017, beats_loss=0.01019, ecapa_loss=0.0001599, whisper_loss=0.08991, over 21710.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01059, ecapa_loss=0.0001437, whisper_loss=0.08884, over 3842357.21 frames. ], batch size: 91, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:16:17,825 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
22 from LS+wenet, 16 from Vox, 30 from AS 2024-08-19 07:16:28,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4347490.0, ans=0.125 2024-08-19 07:16:30,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4347590.0, ans=0.2 2024-08-19 07:16:38,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4347590.0, ans=0.125 2024-08-19 07:16:45,277 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 16 from Vox, 33 from AS 2024-08-19 07:16:48,500 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.26 vs. limit=15.0 2024-08-19 07:16:49,929 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=15.0 2024-08-19 07:16:55,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=4347790.0, ans=0.5 2024-08-19 07:16:56,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4347790.0, ans=0.125 2024-08-19 07:17:02,668 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 24 from Vox, 37 from AS 2024-08-19 07:17:06,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4347890.0, ans=0.0 2024-08-19 07:17:07,455 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
26 from LS+wenet, 27 from Vox, 38 from AS 2024-08-19 07:17:11,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4347890.0, ans=0.0 2024-08-19 07:17:14,343 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.19 vs. limit=10.0 2024-08-19 07:17:19,821 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 4700, loss[loss=0.1102, beats_loss=0.01124, ecapa_loss=0.0001345, whisper_loss=0.09757, over 22697.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01055, ecapa_loss=0.0001429, whisper_loss=0.0897, over 3871162.15 frames. ], batch size: 89, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:17:29,996 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 15 from Vox, 33 from AS 2024-08-19 07:17:31,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4348090.0, ans=0.0 2024-08-19 07:17:35,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4348090.0, ans=0.125 2024-08-19 07:17:43,993 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.50 vs. limit=10.0 2024-08-19 07:17:49,308 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.81 vs. limit=12.0 2024-08-19 07:17:49,334 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.92 vs. 
limit=22.5 2024-08-19 07:17:51,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4348190.0, ans=0.2 2024-08-19 07:17:54,430 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.376e+01 2.578e+01 2.929e+01 1.160e+02, threshold=5.156e+01, percent-clipped=1.0 2024-08-19 07:17:58,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4348290.0, ans=0.0 2024-08-19 07:17:59,773 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 25 from Vox, 29 from AS 2024-08-19 07:18:08,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4348390.0, ans=0.1 2024-08-19 07:18:11,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4348390.0, ans=0.125 2024-08-19 07:18:21,948 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 4750, loss[loss=0.08856, beats_loss=0.01181, ecapa_loss=0.0001485, whisper_loss=0.07526, over 18703.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01048, ecapa_loss=0.0001434, whisper_loss=0.09016, over 3845137.98 frames. ], batch size: 77, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:18:35,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4348590.0, ans=0.125 2024-08-19 07:18:43,406 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 from AS 2024-08-19 07:18:49,633 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 35 from LS+wenet, 21 from Vox, 21 from AS 2024-08-19 07:18:51,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4348690.0, ans=10.0 2024-08-19 07:18:52,037 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
20 from LS+wenet, 17 from Vox, 22 from AS 2024-08-19 07:19:16,691 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 from AS 2024-08-19 07:19:20,862 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.50 vs. limit=15.0 2024-08-19 07:19:23,671 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 4800, loss[loss=0.1002, beats_loss=0.01176, ecapa_loss=0.0001309, whisper_loss=0.08714, over 16344.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01048, ecapa_loss=0.0001433, whisper_loss=0.09056, over 3845423.15 frames. ], batch size: 67, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:19:24,276 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.82 vs. limit=15.0 2024-08-19 07:19:42,353 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.26 vs. limit=22.5 2024-08-19 07:19:47,594 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
27 from LS+wenet, 24 from Vox, 32 from AS 2024-08-19 07:19:51,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4349190.0, ans=0.2 2024-08-19 07:19:51,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4349190.0, ans=0.125 2024-08-19 07:19:56,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4349190.0, ans=0.0 2024-08-19 07:19:58,676 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.397e+01 2.595e+01 2.947e+01 3.968e+01, threshold=5.190e+01, percent-clipped=1.0 2024-08-19 07:19:59,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4349190.0, ans=0.125 2024-08-19 07:20:08,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4349290.0, ans=0.125 2024-08-19 07:20:16,855 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 from AS 2024-08-19 07:20:25,320 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 22 from LS+wenet, 25 from Vox, 35 from AS 2024-08-19 07:20:26,498 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 4850, loss[loss=0.08977, beats_loss=0.01106, ecapa_loss=0.0001334, whisper_loss=0.07737, over 19886.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01053, ecapa_loss=0.0001436, whisper_loss=0.08965, over 3865423.18 frames. 
], batch size: 82, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:20:35,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4349490.0, ans=0.125 2024-08-19 07:20:39,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.19 vs. limit=10.0 2024-08-19 07:21:30,616 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 4900, loss[loss=0.1187, beats_loss=0.01068, ecapa_loss=0.0001507, whisper_loss=0.1065, over 23051.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.0001431, whisper_loss=0.09034, over 3863556.31 frames. ], batch size: 92, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:21:43,904 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.43 vs. limit=22.5 2024-08-19 07:21:50,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4350090.0, ans=0.125 2024-08-19 07:21:54,093 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0 2024-08-19 07:21:54,109 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.79 vs. limit=22.5 2024-08-19 07:21:55,034 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 15 from Vox, 37 from AS 2024-08-19 07:21:58,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.69 vs. 
limit=15.0 2024-08-19 07:22:06,630 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.310e+01 2.480e+01 2.749e+01 3.874e+01, threshold=4.961e+01, percent-clipped=0.0 2024-08-19 07:22:15,291 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.35 vs. limit=22.5 2024-08-19 07:22:18,615 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 15 from Vox, 20 from AS 2024-08-19 07:22:18,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4350290.0, ans=0.125 2024-08-19 07:22:19,846 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 28 from Vox, 31 from AS 2024-08-19 07:22:20,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4350290.0, ans=0.125 2024-08-19 07:22:32,963 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 from AS 2024-08-19 07:22:35,134 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 4950, loss[loss=0.09502, beats_loss=0.01225, ecapa_loss=0.0001223, whisper_loss=0.08155, over 16396.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001425, whisper_loss=0.08998, over 3850918.50 frames. ], batch size: 65, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:22:36,639 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 22 from Vox, 34 from AS 2024-08-19 07:22:51,404 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0 2024-08-19 07:22:56,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.21 vs. 
limit=15.0 2024-08-19 07:22:57,816 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 19 from Vox, 48 from AS 2024-08-19 07:23:03,767 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.55 vs. limit=15.0 2024-08-19 07:23:15,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4350790.0, ans=0.125 2024-08-19 07:23:29,074 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 29 from Vox, 34 from AS 2024-08-19 07:23:38,718 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.26 vs. limit=15.0 2024-08-19 07:23:41,460 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 5000, loss[loss=0.1045, beats_loss=0.009384, ecapa_loss=0.0001326, whisper_loss=0.09381, over 19802.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01058, ecapa_loss=0.0001421, whisper_loss=0.09011, over 3897769.01 frames. ], batch size: 78, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:23:58,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4351090.0, ans=0.09899494936611666 2024-08-19 07:24:01,660 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 from AS 2024-08-19 07:24:09,090 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 11 from Vox, 30 from AS 2024-08-19 07:24:11,873 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
27 from LS+wenet, 22 from Vox, 37 from AS 2024-08-19 07:24:18,092 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.255e+01 2.539e+01 2.731e+01 4.622e+01, threshold=5.077e+01, percent-clipped=0.0 2024-08-19 07:24:37,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4351390.0, ans=0.0 2024-08-19 07:24:40,015 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 29 from LS+wenet, 16 from Vox, 23 from AS 2024-08-19 07:24:48,470 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 5050, loss[loss=0.09426, beats_loss=0.01003, ecapa_loss=0.000136, whisper_loss=0.08287, over 14653.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01063, ecapa_loss=0.0001419, whisper_loss=0.08989, over 3889532.02 frames. ], batch size: 56, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:24:51,094 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 18 from Vox, 35 from AS 2024-08-19 07:24:51,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4351490.0, ans=0.2 2024-08-19 07:24:53,352 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.82 vs. limit=8.0 2024-08-19 07:25:15,844 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 12 from Vox, 40 from AS 2024-08-19 07:25:25,577 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
23 from LS+wenet, 28 from Vox, 37 from AS 2024-08-19 07:25:37,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4351790.0, ans=0.1 2024-08-19 07:25:40,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4351790.0, ans=0.1 2024-08-19 07:25:55,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4351890.0, ans=0.1 2024-08-19 07:25:56,048 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.35 vs. limit=15.0 2024-08-19 07:25:58,126 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 5100, loss[loss=0.1195, beats_loss=0.008937, ecapa_loss=0.0001459, whisper_loss=0.1091, over 20160.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01059, ecapa_loss=0.0001425, whisper_loss=0.09018, over 3879646.76 frames. ], batch size: 77, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:25:58,268 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 24 from Vox, 27 from AS 2024-08-19 07:26:33,202 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.383e+01 2.590e+01 2.899e+01 2.370e+02, threshold=5.180e+01, percent-clipped=1.0 2024-08-19 07:26:33,841 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.91 vs. limit=15.0 2024-08-19 07:26:37,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4352290.0, ans=0.125 2024-08-19 07:26:43,522 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
33 from LS+wenet, 19 from Vox, 40 from AS 2024-08-19 07:26:51,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.65 vs. limit=15.0 2024-08-19 07:27:01,060 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 5150, loss[loss=0.09779, beats_loss=0.01157, ecapa_loss=0.0001231, whisper_loss=0.08498, over 22844.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.0001418, whisper_loss=0.09092, over 3901636.84 frames. ], batch size: 90, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:27:05,972 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 29 from LS+wenet, 20 from Vox, 28 from AS 2024-08-19 07:27:07,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4352490.0, ans=0.125 2024-08-19 07:27:10,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4352490.0, ans=0.025 2024-08-19 07:27:12,664 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.806e+05 2024-08-19 07:27:26,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4352690.0, ans=0.05 2024-08-19 07:27:29,752 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 21 from Vox, 22 from AS 2024-08-19 07:27:32,232 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 26 from Vox, 32 from AS 2024-08-19 07:27:46,855 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 23 from Vox, 26 from AS 2024-08-19 07:27:52,545 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.82 vs. 
limit=10.0 2024-08-19 07:27:57,836 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.91 vs. limit=6.0 2024-08-19 07:28:01,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4352890.0, ans=0.125 2024-08-19 07:28:03,162 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 5200, loss[loss=0.1304, beats_loss=0.008919, ecapa_loss=0.000152, whisper_loss=0.12, over 19385.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.0001418, whisper_loss=0.09072, over 3877747.78 frames. ], batch size: 76, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:28:08,856 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.98 vs. limit=10.0 2024-08-19 07:28:09,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4352990.0, ans=0.0 2024-08-19 07:28:10,804 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 21 from LS+wenet, 22 from Vox, 38 from AS 2024-08-19 07:28:13,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4352990.0, ans=0.125 2024-08-19 07:28:22,166 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 23 from LS+wenet, 21 from Vox, 38 from AS 2024-08-19 07:28:23,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4353090.0, ans=0.125 2024-08-19 07:28:25,902 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
25 from LS+wenet, 19 from Vox, 40 from AS 2024-08-19 07:28:32,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4353190.0, ans=0.125 2024-08-19 07:28:38,982 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.425e+01 2.708e+01 3.005e+01 4.495e+01, threshold=5.416e+01, percent-clipped=0.0 2024-08-19 07:28:55,181 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=22.5 2024-08-19 07:29:00,603 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 14 from Vox, 34 from AS 2024-08-19 07:29:06,546 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 5250, loss[loss=0.08795, beats_loss=0.01257, ecapa_loss=0.0001395, whisper_loss=0.07398, over 21218.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01045, ecapa_loss=0.0001413, whisper_loss=0.09098, over 3886363.93 frames. ], batch size: 90, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:29:41,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4353690.0, ans=0.125 2024-08-19 07:30:05,005 WARNING [optim.py:496] (0/4) Scaling gradients by 0.035648033022880554, model_norm_threshold=54.15937423706055 2024-08-19 07:30:05,169 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.out_combiner.bypass_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.460e+05, grad_sumsq=4.288e+05, orig_rms_sq=5.737e-01 2024-08-19 07:30:08,983 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 5300, loss[loss=0.09794, beats_loss=0.0101, ecapa_loss=0.0001401, whisper_loss=0.08645, over 21929.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.0001423, whisper_loss=0.09034, over 3883385.70 frames. 
], batch size: 89, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:30:09,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4353990.0, ans=0.125 2024-08-19 07:30:12,959 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 15 from Vox, 30 from AS 2024-08-19 07:30:15,359 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 18 from Vox, 27 from AS 2024-08-19 07:30:26,179 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 16 from LS+wenet, 16 from Vox, 34 from AS 2024-08-19 07:30:36,796 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2024-08-19 07:30:40,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4354190.0, ans=0.0 2024-08-19 07:30:43,235 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.382e+01 2.697e+01 3.037e+01 1.519e+03, threshold=5.395e+01, percent-clipped=1.0 2024-08-19 07:30:51,120 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 15 from Vox, 37 from AS 2024-08-19 07:31:00,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=4354390.0, ans=0.1 2024-08-19 07:31:05,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4354390.0, ans=0.1 2024-08-19 07:31:11,094 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 5350, loss[loss=0.09521, beats_loss=0.01075, ecapa_loss=0.0001341, whisper_loss=0.08312, over 15769.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01052, ecapa_loss=0.0001422, whisper_loss=0.08942, over 3865644.98 frames. 
], batch size: 64, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:31:44,053 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 29 from Vox, 27 from AS 2024-08-19 07:31:58,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4354790.0, ans=0.2 2024-08-19 07:32:07,954 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 29 from LS+wenet, 26 from Vox, 27 from AS 2024-08-19 07:32:08,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4354890.0, ans=0.0 2024-08-19 07:32:14,142 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 5400, loss[loss=0.07141, beats_loss=0.01239, ecapa_loss=0.0001334, whisper_loss=0.05769, over 18063.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01044, ecapa_loss=0.0001425, whisper_loss=0.08934, over 3868244.53 frames. ], batch size: 74, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:32:18,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4354990.0, ans=0.125 2024-08-19 07:32:23,007 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 14 from LS+wenet, 18 from Vox, 31 from AS 2024-08-19 07:32:24,208 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 29 from LS+wenet, 24 from Vox, 42 from AS 2024-08-19 07:32:33,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4355090.0, ans=0.125 2024-08-19 07:32:49,115 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.281e+01 2.508e+01 2.776e+01 3.569e+02, threshold=5.016e+01, percent-clipped=2.0 2024-08-19 07:32:55,484 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
20 from LS+wenet, 25 from Vox, 19 from AS 2024-08-19 07:33:16,259 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 5450, loss[loss=0.09554, beats_loss=0.009868, ecapa_loss=0.000116, whisper_loss=0.08451, over 16210.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01041, ecapa_loss=0.0001421, whisper_loss=0.08943, over 3804401.77 frames. ], batch size: 62, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:33:23,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4355490.0, ans=0.1 2024-08-19 07:33:23,476 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2024-08-19 07:33:32,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4355590.0, ans=0.95 2024-08-19 07:33:34,768 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0 2024-08-19 07:33:39,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4355590.0, ans=0.1 2024-08-19 07:33:40,758 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 16 from LS+wenet, 15 from Vox, 22 from AS 2024-08-19 07:33:41,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4355690.0, ans=0.125 2024-08-19 07:33:46,341 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.41 vs. limit=22.5 2024-08-19 07:33:56,076 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.10 vs. 
limit=10.0 2024-08-19 07:33:56,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4355790.0, ans=0.1 2024-08-19 07:33:59,294 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 26 from Vox, 36 from AS 2024-08-19 07:34:04,273 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 24 from Vox, 31 from AS 2024-08-19 07:34:07,831 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 27 from Vox, 35 from AS 2024-08-19 07:34:12,873 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 19 from Vox, 24 from AS 2024-08-19 07:34:19,404 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 5500, loss[loss=0.1097, beats_loss=0.008803, ecapa_loss=0.0001314, whisper_loss=0.09963, over 17014.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01038, ecapa_loss=0.0001417, whisper_loss=0.08999, over 3817741.53 frames. ], batch size: 65, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:34:19,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4355990.0, ans=0.1 2024-08-19 07:34:20,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4355990.0, ans=0.125 2024-08-19 07:34:23,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4355990.0, ans=0.0 2024-08-19 07:34:24,826 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.268e+01 2024-08-19 07:34:48,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4356190.0, ans=0.2 2024-08-19 07:34:53,917 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.332e+01 2.514e+01 2.807e+01 3.996e+01, threshold=5.028e+01, percent-clipped=0.0 
2024-08-19 07:35:03,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4356290.0, ans=0.0 2024-08-19 07:35:14,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4356390.0, ans=0.2 2024-08-19 07:35:15,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4356390.0, ans=0.125 2024-08-19 07:35:21,823 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 5550, loss[loss=0.1194, beats_loss=0.009415, ecapa_loss=0.0001324, whisper_loss=0.1087, over 22394.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001424, whisper_loss=0.08974, over 3815354.19 frames. ], batch size: 89, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:35:33,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4356590.0, ans=0.0 2024-08-19 07:35:42,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4356590.0, ans=0.125 2024-08-19 07:35:43,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4356590.0, ans=0.125 2024-08-19 07:35:49,307 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2024-08-19 07:35:50,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4356690.0, ans=0.07 2024-08-19 07:35:53,207 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.90 vs. 
limit=22.5 2024-08-19 07:36:23,778 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 5600, loss[loss=0.08887, beats_loss=0.01118, ecapa_loss=0.0001542, whisper_loss=0.07616, over 16538.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01044, ecapa_loss=0.0001423, whisper_loss=0.0899, over 3852976.66 frames. ], batch size: 68, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:36:31,287 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 17 from Vox, 29 from AS 2024-08-19 07:36:56,840 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.79 vs. limit=15.0 2024-08-19 07:36:58,537 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.345e+01 2.547e+01 2.734e+01 3.198e+02, threshold=5.093e+01, percent-clipped=3.0 2024-08-19 07:37:14,714 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 35 from LS+wenet, 22 from Vox, 30 from AS 2024-08-19 07:37:23,192 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 from AS 2024-08-19 07:37:23,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4357390.0, ans=0.125 2024-08-19 07:37:25,628 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 5650, loss[loss=0.09693, beats_loss=0.0118, ecapa_loss=0.0001302, whisper_loss=0.08383, over 22577.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01057, ecapa_loss=0.0001415, whisper_loss=0.0895, over 3894162.69 frames. ], batch size: 92, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:37:28,332 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts.
26 from LS+wenet, 30 from Vox, 35 from AS 2024-08-19 07:37:40,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4357590.0, ans=0.125 2024-08-19 07:37:45,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4357590.0, ans=0.0 2024-08-19 07:37:58,494 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.76 vs. limit=8.0 2024-08-19 07:38:02,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4357790.0, ans=0.0 2024-08-19 07:38:02,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4357790.0, ans=0.07 2024-08-19 07:38:25,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4357890.0, ans=0.0 2024-08-19 07:38:27,516 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 5700, loss[loss=0.1098, beats_loss=0.009184, ecapa_loss=0.0001529, whisper_loss=0.09904, over 16017.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01062, ecapa_loss=0.0001421, whisper_loss=0.08922, over 3910521.05 frames. ], batch size: 63, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:38:54,044 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2024-08-19 07:38:56,187 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts.
29 from LS+wenet, 22 from Vox, 34 from AS 2024-08-19 07:38:57,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4358190.0, ans=0.125 2024-08-19 07:39:02,165 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.294e+01 2.534e+01 2.806e+01 5.396e+01, threshold=5.067e+01, percent-clipped=1.0 2024-08-19 07:39:10,794 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 14 from Vox, 35 from AS 2024-08-19 07:39:16,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4358390.0, ans=0.125 2024-08-19 07:39:17,930 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-19 07:39:29,531 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 5750, loss[loss=0.09981, beats_loss=0.01161, ecapa_loss=0.0001128, whisper_loss=0.08706, over 15193.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01057, ecapa_loss=0.0001416, whisper_loss=0.08954, over 3943024.07 frames. ], batch size: 59, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:39:30,914 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 20 from Vox, 36 from AS 2024-08-19 07:39:44,630 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.34 vs.
limit=10.0 2024-08-19 07:39:51,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4358590.0, ans=0.125 2024-08-19 07:39:57,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4358690.0, ans=0.0 2024-08-19 07:40:11,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4358790.0, ans=10.0 2024-08-19 07:40:13,003 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-08-19 07:40:26,355 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 21 from Vox, 45 from AS 2024-08-19 07:40:32,178 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 5800, loss[loss=0.1014, beats_loss=0.01044, ecapa_loss=0.0001599, whisper_loss=0.08932, over 20468.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01058, ecapa_loss=0.000141, whisper_loss=0.08934, over 3935827.15 frames. ], batch size: 84, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:40:33,188 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.84 vs. limit=15.0 2024-08-19 07:40:42,362 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 16 from LS+wenet, 15 from Vox, 34 from AS 2024-08-19 07:40:43,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4359090.0, ans=0.09899494936611666 2024-08-19 07:40:59,791 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts.
14 from LS+wenet, 22 from Vox, 27 from AS 2024-08-19 07:41:06,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4359190.0, ans=0.2 2024-08-19 07:41:06,914 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.217e+01 2.506e+01 2.797e+01 5.801e+01, threshold=5.013e+01, percent-clipped=1.0 2024-08-19 07:41:14,601 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 25 from LS+wenet, 13 from Vox, 23 from AS 2024-08-19 07:41:14,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4359290.0, ans=0.0 2024-08-19 07:41:34,055 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.45 vs. limit=10.0 2024-08-19 07:41:34,307 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 5850, loss[loss=0.1065, beats_loss=0.01118, ecapa_loss=0.0001141, whisper_loss=0.09417, over 19452.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01056, ecapa_loss=0.0001403, whisper_loss=0.08964, over 3899285.86 frames. ], batch size: 77, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:41:37,716 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=15.0 2024-08-19 07:41:38,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4359490.0, ans=0.1 2024-08-19 07:41:43,264 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts.
21 from LS+wenet, 17 from Vox, 33 from AS 2024-08-19 07:41:44,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4359490.0, ans=0.1 2024-08-19 07:41:58,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4359690.0, ans=0.125 2024-08-19 07:42:15,284 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 from AS 2024-08-19 07:42:27,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4359890.0, ans=0.125 2024-08-19 07:42:34,678 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.563e-03 2024-08-19 07:42:36,983 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-436000.pt 2024-08-19 07:42:39,357 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 5900, loss[loss=0.1074, beats_loss=0.01081, ecapa_loss=0.0001259, whisper_loss=0.09537, over 23820.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0105, ecapa_loss=0.0001406, whisper_loss=0.08979, over 3891356.53 frames. ], batch size: 92, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:42:44,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4359990.0, ans=0.2 2024-08-19 07:42:52,927 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 07:42:57,860 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts.
24 from LS+wenet, 17 from Vox, 21 from AS 2024-08-19 07:43:13,513 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.336e+01 2.599e+01 2.904e+01 5.543e+01, threshold=5.198e+01, percent-clipped=1.0 2024-08-19 07:43:20,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4360290.0, ans=0.125 2024-08-19 07:43:40,784 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 5950, loss[loss=0.1121, beats_loss=0.009596, ecapa_loss=0.000157, whisper_loss=0.1009, over 23052.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01052, ecapa_loss=0.0001405, whisper_loss=0.08995, over 3881450.46 frames. ], batch size: 91, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:43:42,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4360490.0, ans=0.04949747468305833 2024-08-19 07:43:49,681 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 33 from LS+wenet, 24 from Vox, 38 from AS 2024-08-19 07:43:59,198 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.86 vs. limit=6.0 2024-08-19 07:44:40,185 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 21 from Vox, 30 from AS 2024-08-19 07:44:43,770 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 6000, loss[loss=0.09804, beats_loss=0.01229, ecapa_loss=0.0001238, whisper_loss=0.08452, over 22204.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01058, ecapa_loss=0.0001406, whisper_loss=0.08905, over 3848836.28 frames. ], batch size: 88, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:44:43,771 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-19 07:45:18,095 INFO [train_multi_KD3.py:1149] (0/4) Epoch 30, validation on ASR_libri: loss=0.2522, beats_loss=0, ecapa_loss=0.0005153, whisper_loss=0.2471, over 922467.00 frames.
2024-08-19 07:45:36,093 INFO [train_multi_KD3.py:1149] (0/4) Epoch 30, validation on SV_voxceleb1: loss=0.004094, beats_loss=0, ecapa_loss=0.0004094, whisper_loss=0, over 939242.00 frames. 2024-08-19 07:47:13,252 INFO [train_multi_KD3.py:1149] (0/4) Epoch 30, validation on AT_audioset: loss=0.02301, beats_loss=0.02301, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 07:47:13,256 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-19 07:47:18,653 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.48 vs. limit=15.0 2024-08-19 07:47:19,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4360990.0, ans=0.1 2024-08-19 07:47:33,226 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 13 from Vox, 33 from AS 2024-08-19 07:47:34,471 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 23 from Vox, 21 from AS 2024-08-19 07:47:44,520 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 13 from Vox, 43 from AS 2024-08-19 07:47:48,005 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.272e+01 2.512e+01 2.834e+01 4.204e+02, threshold=5.024e+01, percent-clipped=2.0 2024-08-19 07:47:57,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4361290.0, ans=0.125 2024-08-19 07:47:58,246 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 18 from Vox, 44 from AS 2024-08-19 07:48:08,099 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts.
28 from LS+wenet, 17 from Vox, 19 from AS 2024-08-19 07:48:10,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4361390.0, ans=0.0 2024-08-19 07:48:15,556 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 6050, loss[loss=0.09695, beats_loss=0.01106, ecapa_loss=0.0001843, whisper_loss=0.08405, over 17905.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01054, ecapa_loss=0.0001407, whisper_loss=0.08911, over 3838772.74 frames. ], batch size: 77, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:48:17,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4361490.0, ans=0.1 2024-08-19 07:48:45,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4361690.0, ans=0.125 2024-08-19 07:49:01,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=4361790.0, ans=15.0 2024-08-19 07:49:08,973 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 15 from Vox, 43 from AS 2024-08-19 07:49:17,187 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 6100, loss[loss=0.1088, beats_loss=0.00846, ecapa_loss=0.0001518, whisper_loss=0.0988, over 15221.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01063, ecapa_loss=0.0001408, whisper_loss=0.0889, over 3853919.52 frames. ], batch size: 59, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:49:22,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4361990.0, ans=0.0 2024-08-19 07:49:37,240 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 27 from Vox, 38 from AS 2024-08-19 07:49:50,881 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts.
22 from LS+wenet, 16 from Vox, 32 from AS 2024-08-19 07:49:51,996 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.346e+01 2.570e+01 2.899e+01 1.665e+02, threshold=5.140e+01, percent-clipped=1.0 2024-08-19 07:49:54,676 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 28 from LS+wenet, 9 from Vox, 33 from AS 2024-08-19 07:49:58,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4362290.0, ans=0.0 2024-08-19 07:50:04,670 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 24 from LS+wenet, 29 from Vox, 41 from AS 2024-08-19 07:50:15,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4362390.0, ans=0.125 2024-08-19 07:50:15,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4362390.0, ans=0.0 2024-08-19 07:50:19,221 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 6150, loss[loss=0.1232, beats_loss=0.008376, ecapa_loss=0.0001143, whisper_loss=0.1137, over 22655.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0106, ecapa_loss=0.0001418, whisper_loss=0.08936, over 3857330.44 frames. ], batch size: 84, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:50:22,008 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 20 from Vox, 45 from AS 2024-08-19 07:50:28,059 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 40 from LS+wenet, 15 from Vox, 36 from AS 2024-08-19 07:51:12,463 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.64 vs. limit=15.0 2024-08-19 07:51:21,800 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 6200, loss[loss=0.1201, beats_loss=0.008838, ecapa_loss=0.0001193, whisper_loss=0.11, over 22927.00 frames.
], tot_loss[loss=0.1006, beats_loss=0.01058, ecapa_loss=0.0001415, whisper_loss=0.08859, over 3807330.96 frames. ], batch size: 85, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:51:33,257 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 21 from Vox, 33 from AS 2024-08-19 07:51:36,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4363090.0, ans=0.05 2024-08-19 07:51:40,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4363090.0, ans=0.125 2024-08-19 07:51:52,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4363190.0, ans=0.125 2024-08-19 07:51:53,381 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 27 from LS+wenet, 17 from Vox, 27 from AS 2024-08-19 07:51:57,099 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.318e+01 2.578e+01 2.832e+01 1.795e+02, threshold=5.155e+01, percent-clipped=1.0 2024-08-19 07:52:16,136 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 20 from LS+wenet, 19 from Vox, 38 from AS 2024-08-19 07:52:19,714 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 20 from Vox, 34 from AS 2024-08-19 07:52:24,454 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 6250, loss[loss=0.08592, beats_loss=0.01208, ecapa_loss=0.0001388, whisper_loss=0.07245, over 22646.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01052, ecapa_loss=0.0001416, whisper_loss=0.08889, over 3812757.96 frames. ], batch size: 94, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:52:33,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.61 vs.
limit=15.0 2024-08-19 07:52:36,112 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 29 from Vox, 39 from AS 2024-08-19 07:52:40,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4363590.0, ans=0.1 2024-08-19 07:52:48,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4363690.0, ans=0.0 2024-08-19 07:53:03,467 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 14 from LS+wenet, 24 from Vox, 26 from AS 2024-08-19 07:53:09,862 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 31 from LS+wenet, 22 from Vox, 32 from AS 2024-08-19 07:53:16,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4363890.0, ans=0.125 2024-08-19 07:53:21,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4363890.0, ans=0.2 2024-08-19 07:53:23,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4363890.0, ans=0.125 2024-08-19 07:53:27,142 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 6300, loss[loss=0.06788, beats_loss=0.01348, ecapa_loss=0.0001502, whisper_loss=0.0529, over 16262.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01045, ecapa_loss=0.0001424, whisper_loss=0.08918, over 3804216.78 frames. ], batch size: 70, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:53:47,171 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts.
24 from LS+wenet, 14 from Vox, 24 from AS 2024-08-19 07:54:02,111 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.421e+01 2.740e+01 3.001e+01 4.558e+01, threshold=5.480e+01, percent-clipped=0.0 2024-08-19 07:54:02,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4364190.0, ans=0.2 2024-08-19 07:54:06,661 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.40 vs. limit=15.0 2024-08-19 07:54:15,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4364290.0, ans=0.0 2024-08-19 07:54:29,817 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 6350, loss[loss=0.1224, beats_loss=0.009713, ecapa_loss=0.0001353, whisper_loss=0.1113, over 22755.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01041, ecapa_loss=0.0001419, whisper_loss=0.0895, over 3801335.05 frames. ], batch size: 90, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:54:29,928 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 12 from Vox, 38 from AS 2024-08-19 07:54:52,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.99 vs. limit=15.0 2024-08-19 07:54:54,623 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 16 from Vox, 31 from AS 2024-08-19 07:55:00,682 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts.
31 from LS+wenet, 15 from Vox, 38 from AS 2024-08-19 07:55:07,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4364790.0, ans=0.07 2024-08-19 07:55:10,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4364790.0, ans=0.2 2024-08-19 07:55:13,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4364790.0, ans=0.035 2024-08-19 07:55:19,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4364890.0, ans=0.2 2024-08-19 07:55:21,435 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 23 from Vox, 36 from AS 2024-08-19 07:55:29,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4364890.0, ans=0.125 2024-08-19 07:55:33,098 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 6400, loss[loss=0.1035, beats_loss=0.01114, ecapa_loss=0.0001229, whisper_loss=0.0911, over 19853.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01046, ecapa_loss=0.0001421, whisper_loss=0.08989, over 3828913.38 frames. ], batch size: 78, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:55:34,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4364990.0, ans=0.125 2024-08-19 07:55:43,015 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 16 from Vox, 25 from AS 2024-08-19 07:55:44,925 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.296e+01 2024-08-19 07:56:13,970 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts.
26 from LS+wenet, 17 from Vox, 49 from AS 2024-08-19 07:56:14,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4365190.0, ans=0.1 2024-08-19 07:56:15,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4365190.0, ans=0.0 2024-08-19 07:56:16,381 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.014e+01 2.326e+01 2.535e+01 2.915e+01 6.831e+01, threshold=5.071e+01, percent-clipped=1.0 2024-08-19 07:56:21,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4365290.0, ans=0.125 2024-08-19 07:56:21,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4365290.0, ans=0.125 2024-08-19 07:56:25,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4365290.0, ans=0.125 2024-08-19 07:56:52,572 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 21 from Vox, 26 from AS 2024-08-19 07:56:56,170 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 6450, loss[loss=0.1122, beats_loss=0.01056, ecapa_loss=0.0001294, whisper_loss=0.1003, over 23782.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001423, whisper_loss=0.0904, over 3866664.60 frames. ], batch size: 94, lr: 2.04e-03, grad_scale: 1.152921504606847e+18 2024-08-19 07:56:58,409 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.29 vs. limit=15.0 2024-08-19 07:57:08,265 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.78 vs.
limit=15.0 2024-08-19 07:57:10,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4365490.0, ans=0.125 2024-08-19 07:57:15,925 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.06 vs. limit=22.5 2024-08-19 07:57:39,458 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 17 from Vox, 35 from AS 2024-08-19 07:57:43,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4365790.0, ans=0.2 2024-08-19 07:57:48,292 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 from AS 2024-08-19 07:57:51,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4365790.0, ans=0.015 2024-08-19 07:57:54,581 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.39 vs. limit=15.0 2024-08-19 07:58:24,176 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 26 from Vox, 40 from AS 2024-08-19 07:58:26,386 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 6500, loss[loss=0.09938, beats_loss=0.01158, ecapa_loss=0.0001572, whisper_loss=0.08623, over 21370.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001424, whisper_loss=0.08994, over 3872396.23 frames. ], batch size: 91, lr: 2.04e-03, grad_scale: 1.152921504606847e+18 2024-08-19 07:58:30,116 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.62 vs. limit=15.0 2024-08-19 07:58:31,620 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.48 vs.
limit=22.5 2024-08-19 07:58:35,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4365990.0, ans=0.125 2024-08-19 07:58:35,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4365990.0, ans=0.125 2024-08-19 07:58:35,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4365990.0, ans=0.125 2024-08-19 07:58:55,069 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 15 from Vox, 32 from AS 2024-08-19 07:59:02,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4366090.0, ans=0.025 2024-08-19 07:59:06,767 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 15 from LS+wenet, 25 from Vox, 29 from AS 2024-08-19 07:59:07,378 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.31 vs. limit=22.5 2024-08-19 07:59:29,241 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.467e+01 2.714e+01 3.135e+01 4.455e+01, threshold=5.427e+01, percent-clipped=0.0 2024-08-19 07:59:32,961 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.91 vs. limit=22.5 2024-08-19 08:00:15,393 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 6550, loss[loss=0.07945, beats_loss=0.01228, ecapa_loss=0.0001382, whisper_loss=0.06579, over 22121.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01057, ecapa_loss=0.0001417, whisper_loss=0.08978, over 3903495.22 frames.
], batch size: 96, lr: 2.04e-03, grad_scale: 1.152921504606847e+18 2024-08-19 08:01:17,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4366690.0, ans=0.125 2024-08-19 08:01:42,301 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-19 08:01:45,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4366890.0, ans=0.0 2024-08-19 08:01:57,591 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 08:02:06,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4366990.0, ans=0.0 2024-08-19 08:02:07,318 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 6600, loss[loss=0.09736, beats_loss=0.01054, ecapa_loss=0.0001517, whisper_loss=0.08531, over 21370.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001409, whisper_loss=0.09014, over 3922385.88 frames. ], batch size: 87, lr: 2.04e-03, grad_scale: 1.152921504606847e+18 2024-08-19 08:02:18,220 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.60 vs. limit=10.0 2024-08-19 08:02:21,012 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.51 vs. limit=15.0 2024-08-19 08:02:43,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4367090.0, ans=0.0 2024-08-19 08:02:53,505 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
15 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-19 08:03:00,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4367190.0, ans=0.125 2024-08-19 08:03:12,082 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-19 08:03:13,615 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.306e+01 2.532e+01 2.841e+01 4.066e+01, threshold=5.063e+01, percent-clipped=0.0 2024-08-19 08:03:15,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4367290.0, ans=0.125 2024-08-19 08:03:22,430 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-19 08:03:24,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4367290.0, ans=0.125 2024-08-19 08:03:36,174 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2024-08-19 08:03:38,440 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.46 vs. limit=15.0 2024-08-19 08:03:43,480 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 6650, loss[loss=0.09062, beats_loss=0.01055, ecapa_loss=0.0001543, whisper_loss=0.07853, over 20045.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01048, ecapa_loss=0.0001424, whisper_loss=0.09074, over 3906673.83 frames. 
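The Clipping_scale lines above print grad-norm quartiles (min, 25%, median, 75%, max) together with a clipping threshold. Across these lines the threshold tracks twice the displayed median, consistent with Clipping_scale=2.0. A sketch of that relationship, using the figures from the line above; the function name is illustrative:

```python
# The optimizer log reports grad-norm quartiles [min, 25%, 50%, 75%, max]
# and a clipping threshold. With Clipping_scale=2.0, the threshold appears
# to be about twice the running median grad norm (illustrative sketch).
def clip_threshold(median_grad_norm, clipping_scale=2.0):
    return clipping_scale * median_grad_norm

quartiles = [1.828e+01, 2.306e+01, 2.532e+01, 2.841e+01, 4.066e+01]
threshold = clip_threshold(quartiles[2])
print(threshold)  # ~50.64; the logged threshold was 5.063e+01 (agrees to ~0.1%)
```

The small residual difference comes from the quartiles being rounded for display; the logged percent-clipped=0.0 confirms no gradients exceeded this threshold on that step.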
], batch size: 78, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:04:02,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4367590.0, ans=0.0 2024-08-19 08:04:06,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4367590.0, ans=0.0 2024-08-19 08:04:08,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4367590.0, ans=0.125 2024-08-19 08:04:33,285 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 08:04:41,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4367890.0, ans=0.125 2024-08-19 08:04:57,446 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 6700, loss[loss=0.1141, beats_loss=0.009418, ecapa_loss=0.0001178, whisper_loss=0.1035, over 19214.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01042, ecapa_loss=0.000142, whisper_loss=0.09103, over 3900400.25 frames. ], batch size: 72, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:05:12,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4368090.0, ans=0.0 2024-08-19 08:05:21,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4368090.0, ans=0.1 2024-08-19 08:05:28,616 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 
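Note that grad_scale drops from 1.152921504606847e+18 to 5.764607523034235e+17 between batches 6600 and 6650 and stays there. Both values are exact powers of two (2^60 and 2^59), so this halving is most likely the usual dynamic loss-scaling backoff in mixed-precision training: halve the scale when a step overflows, grow it again after a run of clean steps. A toy reconstruction under that assumption (the function and its parameters are illustrative, not the trainer's actual scaler API):

```python
# The logged grad_scale values are exact powers of two: 2**60, then 2**59.
# Halving on overflow is standard dynamic loss scaling; this is a toy
# sketch of that policy, not the actual scaler used by the trainer.
def update_scale(scale, overflowed, growth_factor=2.0, backoff_factor=0.5):
    return scale * backoff_factor if overflowed else scale * growth_factor

big = 2.0 ** 60          # prints as 1.152921504606847e+18, as in the log
small = update_scale(big, overflowed=True)
print(small)             # 5.764607523034235e+17, i.e. 2**59, as in the log
```

In practice a scaler only grows the scale after a fixed interval of non-overflowing steps, which is why the log shows the scale holding at 2^59 for many consecutive batches.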
11 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-19 08:05:37,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4368190.0, ans=0.125 2024-08-19 08:05:42,030 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.605e+01 2.280e+01 2.491e+01 2.766e+01 3.799e+01, threshold=4.981e+01, percent-clipped=0.0 2024-08-19 08:05:54,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4368290.0, ans=0.125 2024-08-19 08:05:56,239 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 08:06:12,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4368490.0, ans=0.2 2024-08-19 08:06:13,701 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 6750, loss[loss=0.09201, beats_loss=0.009963, ecapa_loss=0.0001315, whisper_loss=0.08073, over 22473.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01037, ecapa_loss=0.0001423, whisper_loss=0.09053, over 3840819.14 frames. ], batch size: 90, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:06:25,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4368490.0, ans=0.1 2024-08-19 08:06:41,899 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.10 vs. limit=15.0 2024-08-19 08:06:57,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4368790.0, ans=0.2 2024-08-19 08:07:13,541 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
20 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-19 08:07:14,162 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2024-08-19 08:07:23,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4368890.0, ans=0.125 2024-08-19 08:07:28,584 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 6800, loss[loss=0.1065, beats_loss=0.01035, ecapa_loss=0.0001413, whisper_loss=0.09478, over 19996.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01038, ecapa_loss=0.0001424, whisper_loss=0.09008, over 3847937.52 frames. ], batch size: 77, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:07:31,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4368990.0, ans=0.0 2024-08-19 08:07:34,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4368990.0, ans=0.125 2024-08-19 08:07:46,019 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.78 vs. limit=15.0 2024-08-19 08:08:04,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4369190.0, ans=0.125 2024-08-19 08:08:11,803 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.446e+01 2.587e+01 2.881e+01 4.116e+01, threshold=5.174e+01, percent-clipped=0.0 2024-08-19 08:08:12,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4369290.0, ans=0.0 2024-08-19 08:08:13,345 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 15 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-19 08:08:35,019 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 08:08:41,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4369490.0, ans=0.0 2024-08-19 08:08:42,300 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 6850, loss[loss=0.102, beats_loss=0.007768, ecapa_loss=0.0001665, whisper_loss=0.09254, over 19738.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01044, ecapa_loss=0.0001414, whisper_loss=0.0895, over 3843381.85 frames. ], batch size: 76, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:08:42,463 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 08:08:44,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4369490.0, ans=0.125 2024-08-19 08:08:45,982 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.02 vs. limit=22.5 2024-08-19 08:09:19,189 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
20 from LS+wenet, 7 from Vox, 35 fro AS 2024-08-19 08:09:20,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4369690.0, ans=0.125 2024-08-19 08:09:25,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4369690.0, ans=0.125 2024-08-19 08:09:29,314 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.216e+05 2024-08-19 08:09:36,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4369790.0, ans=0.125 2024-08-19 08:09:49,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4369890.0, ans=0.125 2024-08-19 08:09:57,693 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 6900, loss[loss=0.09496, beats_loss=0.01107, ecapa_loss=0.000168, whisper_loss=0.0822, over 18445.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01042, ecapa_loss=0.0001422, whisper_loss=0.09025, over 3838043.07 frames. 
], batch size: 77, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:09:59,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4369990.0, ans=0.125 2024-08-19 08:10:03,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4369990.0, ans=0.125 2024-08-19 08:10:09,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4369990.0, ans=0.1 2024-08-19 08:10:19,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4370090.0, ans=0.125 2024-08-19 08:10:23,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4370090.0, ans=0.125 2024-08-19 08:10:29,177 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-19 08:10:37,377 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.45 vs. limit=12.0 2024-08-19 08:10:40,727 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.283e+01 2.474e+01 2.721e+01 3.268e+01, threshold=4.948e+01, percent-clipped=0.0 2024-08-19 08:10:59,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.43 vs. limit=10.0 2024-08-19 08:11:09,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4370390.0, ans=0.1 2024-08-19 08:11:11,192 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
23 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-19 08:11:12,281 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 6950, loss[loss=0.1063, beats_loss=0.009816, ecapa_loss=0.0001422, whisper_loss=0.09505, over 17883.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001404, whisper_loss=0.09019, over 3885319.33 frames. ], batch size: 71, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:11:19,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4370490.0, ans=0.0 2024-08-19 08:11:38,870 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-19 08:11:39,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4370590.0, ans=0.125 2024-08-19 08:12:08,541 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.96 vs. limit=10.0 2024-08-19 08:12:24,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4370890.0, ans=0.125 2024-08-19 08:12:28,950 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 7000, loss[loss=0.09994, beats_loss=0.01017, ecapa_loss=0.0001493, whisper_loss=0.08828, over 18090.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01047, ecapa_loss=0.0001413, whisper_loss=0.09031, over 3859425.47 frames. 
], batch size: 73, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:12:33,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4370990.0, ans=0.0 2024-08-19 08:12:49,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4371090.0, ans=0.125 2024-08-19 08:13:02,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4371190.0, ans=0.0 2024-08-19 08:13:11,935 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.284e+01 2.585e+01 2.897e+01 5.224e+01, threshold=5.169e+01, percent-clipped=1.0 2024-08-19 08:13:20,615 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-19 08:13:42,975 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 7050, loss[loss=0.09586, beats_loss=0.0131, ecapa_loss=0.0001432, whisper_loss=0.08133, over 21143.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01051, ecapa_loss=0.0001412, whisper_loss=0.08984, over 3872924.90 frames. ], batch size: 90, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:13:44,528 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-19 08:13:55,050 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.31 vs. limit=10.0 2024-08-19 08:13:55,918 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-19 08:14:23,475 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-19 08:15:00,156 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.77 vs. 
limit=22.5 2024-08-19 08:15:02,413 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 7100, loss[loss=0.09873, beats_loss=0.009328, ecapa_loss=0.0001307, whisper_loss=0.08809, over 19462.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01053, ecapa_loss=0.0001409, whisper_loss=0.08953, over 3882484.58 frames. ], batch size: 74, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:15:10,555 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-19 08:15:11,896 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-19 08:15:28,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4372090.0, ans=0.0 2024-08-19 08:15:46,161 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.36 vs. limit=22.5 2024-08-19 08:15:48,446 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.334e+01 2.574e+01 2.776e+01 4.254e+01, threshold=5.149e+01, percent-clipped=0.0 2024-08-19 08:15:51,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4372290.0, ans=0.125 2024-08-19 08:16:18,489 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 7150, loss[loss=0.09552, beats_loss=0.01066, ecapa_loss=0.0001738, whisper_loss=0.08312, over 21152.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001404, whisper_loss=0.09069, over 3897380.61 frames. ], batch size: 91, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:16:26,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4372490.0, ans=0.0 2024-08-19 08:16:27,423 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
32 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-19 08:16:29,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4372490.0, ans=0.125 2024-08-19 08:16:30,019 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-19 08:16:37,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4372590.0, ans=0.1 2024-08-19 08:16:48,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4372690.0, ans=0.0 2024-08-19 08:16:53,398 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 36 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-19 08:17:19,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4372890.0, ans=0.1 2024-08-19 08:17:21,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4372890.0, ans=0.1 2024-08-19 08:17:37,070 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 7200, loss[loss=0.08907, beats_loss=0.01181, ecapa_loss=0.0001392, whisper_loss=0.07587, over 22279.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0104, ecapa_loss=0.0001408, whisper_loss=0.09085, over 3892474.66 frames. ], batch size: 91, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:17:37,431 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-19 08:17:44,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4372990.0, ans=0.0 2024-08-19 08:17:46,109 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
20 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-19 08:17:48,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4372990.0, ans=0.125 2024-08-19 08:18:05,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4373090.0, ans=0.1 2024-08-19 08:18:07,220 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 08:18:09,852 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-19 08:18:12,188 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 18 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 08:18:14,185 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-19 08:18:23,257 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.339e+01 2.591e+01 2.933e+01 7.006e+01, threshold=5.182e+01, percent-clipped=1.0 2024-08-19 08:18:25,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4373290.0, ans=0.0 2024-08-19 08:18:32,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4373290.0, ans=0.0 2024-08-19 08:18:39,188 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-19 08:18:41,022 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. 
limit=6.0 2024-08-19 08:18:49,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4373390.0, ans=0.2 2024-08-19 08:18:52,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=4373390.0, ans=0.025 2024-08-19 08:18:56,173 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 7250, loss[loss=0.08712, beats_loss=0.01091, ecapa_loss=0.0001568, whisper_loss=0.07464, over 21217.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01045, ecapa_loss=0.0001408, whisper_loss=0.09041, over 3886615.69 frames. ], batch size: 91, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:19:09,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4373490.0, ans=0.125 2024-08-19 08:19:09,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4373490.0, ans=0.125 2024-08-19 08:19:09,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4373490.0, ans=0.1 2024-08-19 08:19:09,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4373490.0, ans=0.125 2024-08-19 08:19:10,541 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-19 08:19:23,716 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 14 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-19 08:19:50,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4373790.0, ans=0.125 2024-08-19 08:19:53,077 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-19 08:20:01,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4373890.0, ans=0.1 2024-08-19 08:20:16,099 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 7300, loss[loss=0.08592, beats_loss=0.01172, ecapa_loss=0.0001465, whisper_loss=0.07273, over 21981.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01043, ecapa_loss=0.000142, whisper_loss=0.08982, over 3857710.51 frames. ], batch size: 91, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:20:29,468 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-19 08:20:37,642 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 08:20:45,323 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 22 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-19 08:20:47,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4374190.0, ans=0.125 2024-08-19 08:21:04,707 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.340e+01 2.529e+01 2.737e+01 3.250e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-19 08:21:04,957 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 35 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-19 08:21:07,417 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. 
limit=6.0 2024-08-19 08:21:10,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4374290.0, ans=0.125 2024-08-19 08:21:22,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4374390.0, ans=0.1 2024-08-19 08:21:38,100 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 7350, loss[loss=0.1059, beats_loss=0.01131, ecapa_loss=0.0001527, whisper_loss=0.09305, over 18373.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01044, ecapa_loss=0.0001432, whisper_loss=0.0896, over 3846979.63 frames. ], batch size: 73, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:21:56,358 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-19 08:22:00,398 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 27 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-19 08:22:05,401 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-19 08:22:08,400 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.76 vs. limit=15.0 2024-08-19 08:22:09,082 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-19 08:22:17,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4374690.0, ans=0.2 2024-08-19 08:22:21,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4374690.0, ans=0.07 2024-08-19 08:22:24,996 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-19 08:22:44,856 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
27 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-19 08:22:54,676 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 7400, loss[loss=0.09641, beats_loss=0.01141, ecapa_loss=0.0001619, whisper_loss=0.08339, over 20911.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0105, ecapa_loss=0.0001419, whisper_loss=0.08937, over 3864853.60 frames. ], batch size: 87, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:22:57,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4374990.0, ans=0.125 2024-08-19 08:23:14,959 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-19 08:23:33,020 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-19 08:23:43,114 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.350e+01 2.542e+01 2.861e+01 4.984e+02, threshold=5.085e+01, percent-clipped=1.0 2024-08-19 08:23:46,905 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.68 vs. limit=15.0 2024-08-19 08:24:02,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4375390.0, ans=0.125 2024-08-19 08:24:15,403 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 7450, loss[loss=0.09924, beats_loss=0.01157, ecapa_loss=0.0001562, whisper_loss=0.08611, over 19156.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01057, ecapa_loss=0.0001417, whisper_loss=0.08877, over 3870527.22 frames. ], batch size: 81, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:24:17,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4375490.0, ans=0.125 2024-08-19 08:24:18,473 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
26 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-19 08:24:28,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4375490.0, ans=0.2 2024-08-19 08:24:38,672 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 33 from Vox, 36 fro AS 2024-08-19 08:24:42,006 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0 2024-08-19 08:24:49,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4375690.0, ans=0.0 2024-08-19 08:24:49,621 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.60 vs. limit=15.0 2024-08-19 08:24:55,448 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.03 vs. limit=15.0 2024-08-19 08:24:58,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4375690.0, ans=0.125 2024-08-19 08:24:58,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4375690.0, ans=0.125 2024-08-19 08:25:07,095 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-19 08:25:14,783 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.38 vs. limit=22.5 2024-08-19 08:25:30,184 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 7500, loss[loss=0.1191, beats_loss=0.00844, ecapa_loss=0.0001648, whisper_loss=0.109, over 21309.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01049, ecapa_loss=0.0001426, whisper_loss=0.0895, over 3883978.55 frames. 
], batch size: 89, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:25:34,625 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-19 08:25:37,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4375990.0, ans=0.125 2024-08-19 08:25:38,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4375990.0, ans=0.125 2024-08-19 08:25:46,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4376090.0, ans=0.0 2024-08-19 08:26:12,076 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.516e+01 2.304e+01 2.520e+01 2.744e+01 4.658e+01, threshold=5.040e+01, percent-clipped=0.0 2024-08-19 08:26:12,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4376290.0, ans=0.125 2024-08-19 08:26:15,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4376290.0, ans=0.125 2024-08-19 08:26:19,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4376290.0, ans=0.125 2024-08-19 08:26:19,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.64 vs. 
limit=22.5 2024-08-19 08:26:25,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4376290.0, ans=0.2 2024-08-19 08:26:33,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4376390.0, ans=0.0 2024-08-19 08:26:44,128 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 7550, loss[loss=0.1183, beats_loss=0.009138, ecapa_loss=0.0001479, whisper_loss=0.1077, over 16110.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01052, ecapa_loss=0.0001432, whisper_loss=0.08844, over 3832875.78 frames. ], batch size: 62, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:26:54,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4376490.0, ans=0.1 2024-08-19 08:26:56,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=4376490.0, ans=15.0 2024-08-19 08:26:59,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4376590.0, ans=0.0 2024-08-19 08:27:03,536 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.146e+01 2024-08-19 08:27:21,536 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-19 08:27:29,526 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 08:27:55,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4376890.0, ans=0.2 2024-08-19 08:28:00,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4376990.0, ans=0.1 2024-08-19 08:28:01,320 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 7600, loss[loss=0.09478, beats_loss=0.01053, ecapa_loss=0.0001215, whisper_loss=0.08303, over 19734.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01051, ecapa_loss=0.0001424, whisper_loss=0.08854, over 3832087.77 frames. ], batch size: 75, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:28:03,217 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-19 08:28:06,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4376990.0, ans=0.2 2024-08-19 08:28:18,172 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.18 vs. limit=10.0 2024-08-19 08:28:36,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4377190.0, ans=0.0 2024-08-19 08:28:40,853 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-19 08:28:45,026 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.278e+01 2.433e+01 2.694e+01 5.084e+01, threshold=4.867e+01, percent-clipped=1.0 2024-08-19 08:28:53,493 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-19 08:29:02,367 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.53 vs. 
limit=15.0 2024-08-19 08:29:03,019 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 42 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-19 08:29:06,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4377390.0, ans=0.125 2024-08-19 08:29:13,736 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-19 08:29:15,029 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 7650, loss[loss=0.1096, beats_loss=0.009007, ecapa_loss=0.0001542, whisper_loss=0.09908, over 16268.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01043, ecapa_loss=0.0001427, whisper_loss=0.08906, over 3833790.15 frames. ], batch size: 62, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:29:38,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4377590.0, ans=0.09899494936611666 2024-08-19 08:29:59,253 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=15.0 2024-08-19 08:30:04,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4377790.0, ans=0.1 2024-08-19 08:30:24,516 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 7700, loss[loss=0.1079, beats_loss=0.01077, ecapa_loss=0.0001437, whisper_loss=0.09567, over 22525.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01049, ecapa_loss=0.0001425, whisper_loss=0.08868, over 3837482.96 frames. 
], batch size: 90, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:30:38,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4378090.0, ans=0.0 2024-08-19 08:30:45,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4378090.0, ans=0.0 2024-08-19 08:30:49,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4378090.0, ans=0.1 2024-08-19 08:30:58,051 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 20 from LS+wenet, 37 from Vox, 35 fro AS 2024-08-19 08:31:03,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4378190.0, ans=0.125 2024-08-19 08:31:05,964 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.389e+01 2.553e+01 2.809e+01 4.632e+01, threshold=5.107e+01, percent-clipped=0.0 2024-08-19 08:31:07,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4378290.0, ans=0.0 2024-08-19 08:31:10,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4378290.0, ans=0.125 2024-08-19 08:31:15,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4378290.0, ans=0.2 2024-08-19 08:31:26,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4378390.0, ans=0.125 2024-08-19 08:31:27,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4378390.0, ans=0.125 2024-08-19 08:31:30,551 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
30 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-19 08:31:34,546 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 7750, loss[loss=0.09437, beats_loss=0.01038, ecapa_loss=0.000158, whisper_loss=0.08241, over 21034.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01046, ecapa_loss=0.0001427, whisper_loss=0.0892, over 3855464.90 frames. ], batch size: 89, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:31:38,777 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 08:31:44,190 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 20 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-19 08:32:06,183 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.60 vs. limit=15.0 2024-08-19 08:32:12,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4378690.0, ans=10.0 2024-08-19 08:32:14,636 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.75 vs. limit=10.0 2024-08-19 08:32:23,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4378790.0, ans=0.125 2024-08-19 08:32:27,280 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 08:32:33,433 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-19 08:32:35,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4378890.0, ans=0.125 2024-08-19 08:32:41,238 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.11 vs. 
limit=10.0 2024-08-19 08:32:41,815 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 7800, loss[loss=0.1128, beats_loss=0.00972, ecapa_loss=0.0001377, whisper_loss=0.1017, over 21537.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01043, ecapa_loss=0.000141, whisper_loss=0.08949, over 3843103.27 frames. ], batch size: 86, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:32:47,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=4378990.0, ans=15.0 2024-08-19 08:33:20,977 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.315e+01 2.563e+01 2.893e+01 6.411e+01, threshold=5.126e+01, percent-clipped=2.0 2024-08-19 08:33:34,696 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-19 08:33:40,456 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=29.80 vs. limit=22.5 2024-08-19 08:33:48,983 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 7850, loss[loss=0.1026, beats_loss=0.01079, ecapa_loss=0.0001837, whisper_loss=0.08999, over 14365.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01042, ecapa_loss=0.0001413, whisper_loss=0.08952, over 3828903.02 frames. ], batch size: 60, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:33:51,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4379490.0, ans=0.125 2024-08-19 08:33:57,053 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 21 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 08:34:05,618 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.17 vs. limit=12.0 2024-08-19 08:34:07,540 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
23 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-19 08:34:08,339 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.93 vs. limit=10.0 2024-08-19 08:34:11,256 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-19 08:34:16,689 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 32 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-19 08:34:20,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4379690.0, ans=0.125 2024-08-19 08:34:29,577 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-19 08:34:39,301 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 08:34:40,187 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-19 08:34:48,389 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-19 08:34:53,628 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-19 08:34:54,852 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 7900, loss[loss=0.1054, beats_loss=0.01064, ecapa_loss=0.0001339, whisper_loss=0.09346, over 21551.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01048, ecapa_loss=0.00014, whisper_loss=0.08993, over 3825963.20 frames. ], batch size: 83, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:34:58,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4379990.0, ans=0.0 2024-08-19 08:35:15,277 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.79 vs. 
limit=22.5 2024-08-19 08:35:17,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4380090.0, ans=0.125 2024-08-19 08:35:34,025 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.323e+01 2.601e+01 2.970e+01 4.832e+01, threshold=5.202e+01, percent-clipped=0.0 2024-08-19 08:35:49,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4380390.0, ans=0.125 2024-08-19 08:35:55,964 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 23 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-19 08:36:01,057 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 7950, loss[loss=0.0989, beats_loss=0.01068, ecapa_loss=0.0001436, whisper_loss=0.08679, over 15106.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01053, ecapa_loss=0.0001408, whisper_loss=0.08975, over 3807971.97 frames. ], batch size: 61, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:36:09,448 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-19 08:36:35,230 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 08:36:39,291 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 08:36:39,980 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.05 vs. limit=10.0 2024-08-19 08:36:47,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4380790.0, ans=0.05 2024-08-19 08:36:58,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4380890.0, ans=0.0 2024-08-19 08:36:59,508 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
26 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 08:37:08,549 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 8000, loss[loss=0.0907, beats_loss=0.01064, ecapa_loss=0.0001424, whisper_loss=0.07864, over 22010.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001405, whisper_loss=0.09001, over 3830890.55 frames. ], batch size: 93, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:37:09,207 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.66 vs. limit=10.0 2024-08-19 08:37:27,327 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-19 08:37:29,064 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2024-08-19 08:37:30,241 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-19 08:37:37,491 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.34 vs. 
limit=15.0 2024-08-19 08:37:45,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4381190.0, ans=0.125 2024-08-19 08:37:48,864 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.335e+01 2.541e+01 2.792e+01 1.974e+02, threshold=5.082e+01, percent-clipped=1.0 2024-08-19 08:38:01,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4381390.0, ans=0.125 2024-08-19 08:38:06,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4381390.0, ans=0.5 2024-08-19 08:38:11,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4381390.0, ans=0.015 2024-08-19 08:38:16,351 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 8050, loss[loss=0.09431, beats_loss=0.01165, ecapa_loss=0.0001056, whisper_loss=0.0816, over 20938.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001405, whisper_loss=0.09039, over 3843228.19 frames. ], batch size: 80, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:38:16,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4381490.0, ans=0.125 2024-08-19 08:38:25,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4381490.0, ans=0.125 2024-08-19 08:38:40,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4381590.0, ans=0.125 2024-08-19 08:38:44,772 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
26 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-19 08:38:49,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-08-19 08:38:50,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4381690.0, ans=0.125 2024-08-19 08:38:52,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4381690.0, ans=0.125 2024-08-19 08:38:54,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4381690.0, ans=0.125 2024-08-19 08:39:05,133 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. limit=15.0 2024-08-19 08:39:10,286 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 31 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 08:39:29,875 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 8100, loss[loss=0.114, beats_loss=0.007298, ecapa_loss=0.0001105, whisper_loss=0.1056, over 16527.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01051, ecapa_loss=0.0001403, whisper_loss=0.09017, over 3841532.61 frames. ], batch size: 58, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:39:40,654 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.73 vs. limit=22.5 2024-08-19 08:39:43,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4382090.0, ans=0.0 2024-08-19 08:40:02,637 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.17 vs. 
limit=15.0 2024-08-19 08:40:12,651 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.219e+01 2.443e+01 2.808e+01 4.973e+01, threshold=4.885e+01, percent-clipped=0.0 2024-08-19 08:40:12,860 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-19 08:40:14,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4382290.0, ans=0.0 2024-08-19 08:40:15,657 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 24 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 08:40:34,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4382390.0, ans=0.125 2024-08-19 08:40:38,602 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.02 vs. limit=12.0 2024-08-19 08:40:41,174 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 8150, loss[loss=0.08831, beats_loss=0.01184, ecapa_loss=0.0001448, whisper_loss=0.07503, over 19884.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001415, whisper_loss=0.09045, over 3867272.04 frames. ], batch size: 85, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:40:54,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4382590.0, ans=0.0 2024-08-19 08:41:02,191 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-19 08:41:10,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4382690.0, ans=0.05 2024-08-19 08:41:28,711 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.52 vs. 
limit=15.0 2024-08-19 08:41:33,874 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 21 from Vox, 17 fro AS 2024-08-19 08:41:36,793 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 08:41:52,721 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 8200, loss[loss=0.1112, beats_loss=0.01185, ecapa_loss=0.0001483, whisper_loss=0.09786, over 21531.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001423, whisper_loss=0.0907, over 3870743.71 frames. ], batch size: 86, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:42:05,632 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-19 08:42:11,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4383090.0, ans=0.0 2024-08-19 08:42:30,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4383190.0, ans=0.125 2024-08-19 08:42:35,682 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.301e+01 2.611e+01 2.872e+01 3.807e+01, threshold=5.223e+01, percent-clipped=0.0 2024-08-19 08:42:35,928 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-19 08:42:43,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.00 vs. limit=15.0 2024-08-19 08:42:50,016 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-19 08:42:53,300 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-19 08:42:59,698 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.79 vs. 
limit=5.0 2024-08-19 08:43:03,986 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 8250, loss[loss=0.08477, beats_loss=0.01222, ecapa_loss=0.0001494, whisper_loss=0.07105, over 21004.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0105, ecapa_loss=0.000143, whisper_loss=0.09072, over 3897506.58 frames. ], batch size: 91, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:43:04,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4383490.0, ans=0.125 2024-08-19 08:43:13,814 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 20 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-19 08:43:16,875 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 08:43:17,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4383490.0, ans=0.05 2024-08-19 08:43:29,240 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-19 08:43:34,521 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.25 vs. limit=6.0 2024-08-19 08:43:38,482 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 16 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 08:43:51,021 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.65 vs. limit=22.5 2024-08-19 08:43:54,539 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 36 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-19 08:43:55,816 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-19 08:44:22,757 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 8300, loss[loss=0.1011, beats_loss=0.01175, ecapa_loss=0.0001453, whisper_loss=0.08794, over 13661.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001418, whisper_loss=0.09043, over 3925514.52 frames. ], batch size: 54, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:44:28,517 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.34 vs. limit=15.0 2024-08-19 08:44:36,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4383990.0, ans=0.1 2024-08-19 08:44:48,647 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 22 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-19 08:44:49,899 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 08:44:54,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4384190.0, ans=0.125 2024-08-19 08:45:07,633 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.704e+01 2.452e+01 2.719e+01 3.109e+01 1.763e+02, threshold=5.438e+01, percent-clipped=2.0 2024-08-19 08:45:09,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.02 vs. 
limit=15.0 2024-08-19 08:45:12,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4384290.0, ans=0.1 2024-08-19 08:45:13,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4384290.0, ans=0.125 2024-08-19 08:45:15,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4384290.0, ans=0.2 2024-08-19 08:45:20,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4384390.0, ans=0.125 2024-08-19 08:45:27,012 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 28 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-19 08:45:28,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4384390.0, ans=0.015 2024-08-19 08:45:36,742 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 8350, loss[loss=0.1027, beats_loss=0.01296, ecapa_loss=0.0001568, whisper_loss=0.08812, over 21036.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01056, ecapa_loss=0.000141, whisper_loss=0.08982, over 3931839.56 frames. ], batch size: 88, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:45:42,407 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 08:45:59,674 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-19 08:46:11,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4384690.0, ans=0.125 2024-08-19 08:46:35,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.10 vs. 
limit=15.0 2024-08-19 08:46:45,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4384890.0, ans=0.2 2024-08-19 08:46:48,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4384890.0, ans=0.0 2024-08-19 08:46:50,718 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 8400, loss[loss=0.1251, beats_loss=0.00706, ecapa_loss=0.0001604, whisper_loss=0.1165, over 15518.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01053, ecapa_loss=0.0001418, whisper_loss=0.08997, over 3887897.89 frames. ], batch size: 61, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:46:51,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4384990.0, ans=0.5 2024-08-19 08:46:52,814 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 33 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-19 08:47:00,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4384990.0, ans=0.0 2024-08-19 08:47:38,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4385290.0, ans=0.1 2024-08-19 08:47:39,325 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.283e+01 2.484e+01 2.806e+01 4.179e+01, threshold=4.968e+01, percent-clipped=0.0 2024-08-19 08:47:42,850 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
18 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-19 08:47:47,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4385290.0, ans=0.125 2024-08-19 08:47:47,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4385290.0, ans=0.125 2024-08-19 08:48:10,981 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 8450, loss[loss=0.11, beats_loss=0.009572, ecapa_loss=0.000133, whisper_loss=0.09913, over 21617.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01048, ecapa_loss=0.0001405, whisper_loss=0.09012, over 3872565.91 frames. ], batch size: 84, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:48:20,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4385490.0, ans=0.125 2024-08-19 08:48:26,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4385590.0, ans=0.0 2024-08-19 08:48:29,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4385590.0, ans=0.0 2024-08-19 08:48:35,331 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
30 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-19 08:48:53,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4385790.0, ans=0.125 2024-08-19 08:48:59,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4385790.0, ans=0.0 2024-08-19 08:49:01,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4385790.0, ans=0.0 2024-08-19 08:49:10,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4385890.0, ans=0.5 2024-08-19 08:49:11,093 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-19 08:49:19,550 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-19 08:49:22,878 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 8500, loss[loss=0.1014, beats_loss=0.01167, ecapa_loss=0.0001534, whisper_loss=0.0882, over 21685.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.0001414, whisper_loss=0.09033, over 3851618.31 frames. ], batch size: 90, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:49:31,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4385990.0, ans=0.125 2024-08-19 08:49:40,270 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
32 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-19 08:50:06,339 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.300e+01 2.586e+01 2.886e+01 4.322e+01, threshold=5.173e+01, percent-clipped=0.0 2024-08-19 08:50:20,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4386390.0, ans=0.2 2024-08-19 08:50:21,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4386390.0, ans=0.05 2024-08-19 08:50:27,635 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 08:50:36,439 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 8550, loss[loss=0.1114, beats_loss=0.009692, ecapa_loss=0.0001728, whisper_loss=0.1, over 22133.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01036, ecapa_loss=0.0001408, whisper_loss=0.09088, over 3878311.89 frames. ], batch size: 91, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:50:41,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4386490.0, ans=0.09899494936611666 2024-08-19 08:50:43,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4386490.0, ans=0.0 2024-08-19 08:50:45,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4386490.0, ans=0.1 2024-08-19 08:50:52,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4386590.0, ans=0.125 2024-08-19 08:51:06,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4386590.0, ans=0.125 2024-08-19 08:51:14,861 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
13 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 08:51:30,632 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 33 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-19 08:51:53,696 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 8600, loss[loss=0.1103, beats_loss=0.009838, ecapa_loss=0.0001229, whisper_loss=0.09922, over 22972.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01037, ecapa_loss=0.0001413, whisper_loss=0.09094, over 3894330.68 frames. ], batch size: 88, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:51:57,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4386990.0, ans=0.1 2024-08-19 08:51:59,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4386990.0, ans=0.2 2024-08-19 08:52:18,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4387090.0, ans=0.05 2024-08-19 08:52:39,001 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.282e+01 2.546e+01 2.881e+01 4.529e+01, threshold=5.091e+01, percent-clipped=0.0 2024-08-19 08:52:39,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4387290.0, ans=0.1 2024-08-19 08:52:57,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4387390.0, ans=0.0 2024-08-19 08:53:04,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4387390.0, ans=0.0 2024-08-19 08:53:06,419 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 8650, loss[loss=0.07111, beats_loss=0.01221, ecapa_loss=0.0001602, whisper_loss=0.05729, over 17145.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01036, ecapa_loss=0.0001414, whisper_loss=0.09081, over 3901626.47 frames. ], batch size: 70, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:53:08,405 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-19 08:53:13,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=15.33 vs. limit=15.0 2024-08-19 08:53:16,063 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0 2024-08-19 08:53:18,680 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-19 08:53:20,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4387590.0, ans=0.0 2024-08-19 08:53:40,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4387690.0, ans=0.125 2024-08-19 08:54:03,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4387890.0, ans=0.125 2024-08-19 08:54:09,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4387890.0, ans=0.0 2024-08-19 08:54:18,209 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 8700, loss[loss=0.09188, beats_loss=0.01137, ecapa_loss=0.0001388, whisper_loss=0.07912, over 22717.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01033, ecapa_loss=0.0001417, whisper_loss=0.09044, over 3903524.66 frames. 
], batch size: 93, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:54:21,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4387990.0, ans=0.0 2024-08-19 08:54:27,680 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 08:54:31,743 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-19 08:54:37,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4388090.0, ans=0.2 2024-08-19 08:54:47,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4388190.0, ans=0.0 2024-08-19 08:54:57,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4388190.0, ans=0.125 2024-08-19 08:54:58,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4388290.0, ans=0.0 2024-08-19 08:54:59,157 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.796e+01 2.272e+01 2.455e+01 2.713e+01 3.409e+01, threshold=4.910e+01, percent-clipped=0.0 2024-08-19 08:55:00,923 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
16 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 08:55:04,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4388290.0, ans=0.035 2024-08-19 08:55:22,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4388390.0, ans=0.1 2024-08-19 08:55:26,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4388390.0, ans=0.04949747468305833 2024-08-19 08:55:29,316 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 8750, loss[loss=0.09128, beats_loss=0.01171, ecapa_loss=0.0001208, whisper_loss=0.07836, over 15127.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01036, ecapa_loss=0.0001408, whisper_loss=0.09048, over 3863725.75 frames. ], batch size: 60, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:55:30,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.49 vs. limit=12.0 2024-08-19 08:55:51,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4388590.0, ans=0.125 2024-08-19 08:55:55,932 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 15 from LS+wenet, 22 from Vox, 16 fro AS 2024-08-19 08:55:58,213 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=12.0 2024-08-19 08:56:44,314 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 8800, loss[loss=0.1252, beats_loss=0.01056, ecapa_loss=0.000154, whisper_loss=0.1131, over 23224.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.000139, whisper_loss=0.09044, over 3893517.82 frames. ], batch size: 91, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:57:06,467 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
23 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-19 08:57:07,395 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08632224053144455, model_norm_threshold=49.099090576171875 2024-08-19 08:57:07,576 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.037e+04, grad_sumsq=6.733e+06, orig_rms_sq=1.045e-02 2024-08-19 08:57:07,904 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 08:57:10,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4389090.0, ans=0.125 2024-08-19 08:57:10,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4389090.0, ans=0.2 2024-08-19 08:57:11,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4389190.0, ans=0.0 2024-08-19 08:57:21,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4389190.0, ans=0.125 2024-08-19 08:57:27,449 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.642e+01 2.295e+01 2.616e+01 2.854e+01 5.688e+02, threshold=5.231e+01, percent-clipped=2.0 2024-08-19 08:57:57,261 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 8850, loss[loss=0.101, beats_loss=0.01045, ecapa_loss=0.000123, whisper_loss=0.08935, over 22068.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01049, ecapa_loss=0.0001403, whisper_loss=0.09021, over 3897579.09 frames. 
], batch size: 86, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:57:59,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4389490.0, ans=0.0 2024-08-19 08:58:03,605 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.04 vs. limit=15.0 2024-08-19 08:58:19,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4389590.0, ans=0.125 2024-08-19 08:58:19,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4389590.0, ans=0.125 2024-08-19 08:58:30,104 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.76 vs. limit=15.0 2024-08-19 08:58:33,128 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-19 08:58:42,414 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.89 vs. limit=15.0 2024-08-19 08:59:02,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4389890.0, ans=0.125 2024-08-19 08:59:06,429 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-19 08:59:10,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4389890.0, ans=0.125 2024-08-19 08:59:12,915 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 8900, loss[loss=0.08797, beats_loss=0.01155, ecapa_loss=0.0001245, whisper_loss=0.07517, over 18014.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01051, ecapa_loss=0.0001409, whisper_loss=0.0898, over 3866853.85 frames. ], batch size: 71, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:59:27,441 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-19 08:59:28,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4390090.0, ans=0.05 2024-08-19 08:59:32,083 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.13 vs. limit=12.0 2024-08-19 08:59:37,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4390090.0, ans=0.125 2024-08-19 08:59:38,675 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 15 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 08:59:42,322 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 08:59:42,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4390190.0, ans=0.0 2024-08-19 08:59:47,709 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 08:59:55,729 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.89 vs. limit=12.0 2024-08-19 08:59:56,108 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.234e+01 2.571e+01 2.897e+01 3.544e+02, threshold=5.141e+01, percent-clipped=1.0 2024-08-19 08:59:56,247 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
30 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-19 09:00:14,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4390390.0, ans=0.125 2024-08-19 09:00:24,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4390490.0, ans=0.2 2024-08-19 09:00:25,796 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 8950, loss[loss=0.0778, beats_loss=0.01436, ecapa_loss=9.632e-05, whisper_loss=0.06249, over 17067.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01053, ecapa_loss=0.0001404, whisper_loss=0.08975, over 3846197.22 frames. ], batch size: 67, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:00:29,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.94 vs. limit=5.0 2024-08-19 09:00:44,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4390590.0, ans=0.1 2024-08-19 09:01:20,555 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-19 09:01:24,632 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-19 09:01:29,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4390890.0, ans=0.125 2024-08-19 09:01:36,618 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 9000, loss[loss=0.11, beats_loss=0.01207, ecapa_loss=0.0001025, whisper_loss=0.09687, over 22740.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01054, ecapa_loss=0.0001407, whisper_loss=0.08916, over 3838636.59 frames. 
], batch size: 87, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:01:36,619 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-19 09:02:14,343 INFO [train_multi_KD3.py:1149] (0/4) Epoch 30, validation on ASR_libri: loss=0.2532, beats_loss=0, ecapa_loss=0.0005125, whisper_loss=0.2481, over 922467.00 frames. 2024-08-19 09:02:33,160 INFO [train_multi_KD3.py:1149] (0/4) Epoch 30, validation on SV_voxceleb1: loss=0.003997, beats_loss=0, ecapa_loss=0.0003997, whisper_loss=0, over 939242.00 frames. 2024-08-19 09:04:17,132 INFO [train_multi_KD3.py:1149] (0/4) Epoch 30, validation on AT_audioset: loss=0.02307, beats_loss=0.02307, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 09:04:17,137 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-19 09:04:28,659 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 21 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-19 09:04:34,324 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 09:04:44,411 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.91 vs. limit=12.0 2024-08-19 09:05:01,278 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.350e+01 2.627e+01 2.965e+01 6.113e+01, threshold=5.254e+01, percent-clipped=2.0 2024-08-19 09:05:29,170 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.725e+05 2024-08-19 09:05:33,573 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 9050, loss[loss=0.08943, beats_loss=0.009944, ecapa_loss=0.0001711, whisper_loss=0.07777, over 18911.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01049, ecapa_loss=0.000141, whisper_loss=0.09023, over 3856284.89 frames. 
], batch size: 80, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:05:50,720 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-19 09:05:50,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4391590.0, ans=0.2 2024-08-19 09:06:15,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4391690.0, ans=0.125 2024-08-19 09:06:15,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4391690.0, ans=0.0 2024-08-19 09:06:29,113 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.28 vs. limit=10.0 2024-08-19 09:06:30,291 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 17 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-19 09:06:54,197 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 9100, loss[loss=0.09736, beats_loss=0.009885, ecapa_loss=0.0001379, whisper_loss=0.0861, over 19637.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001415, whisper_loss=0.08995, over 3861437.51 frames. 
], batch size: 76, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:06:56,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4391990.0, ans=0.0 2024-08-19 09:07:18,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4392090.0, ans=0.95 2024-08-19 09:07:42,487 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.265e+01 2.525e+01 2.711e+01 1.200e+02, threshold=5.050e+01, percent-clipped=1.0 2024-08-19 09:07:43,174 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.05 vs. limit=15.0 2024-08-19 09:07:58,736 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-19 09:08:08,199 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-19 09:08:13,355 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 9150, loss[loss=0.1307, beats_loss=0.006829, ecapa_loss=0.0001597, whisper_loss=0.1223, over 18803.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0104, ecapa_loss=0.0001411, whisper_loss=0.09062, over 3882461.97 frames. ], batch size: 73, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:08:30,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4392590.0, ans=0.125 2024-08-19 09:08:35,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4392590.0, ans=0.1 2024-08-19 09:09:01,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4392790.0, ans=0.2 2024-08-19 09:09:07,409 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
26 from LS+wenet, 34 from Vox, 24 fro AS 2024-08-19 09:09:14,836 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 39 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-19 09:09:17,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4392890.0, ans=0.0 2024-08-19 09:09:25,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4392890.0, ans=0.1 2024-08-19 09:09:28,368 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 9200, loss[loss=0.1058, beats_loss=0.01217, ecapa_loss=0.000117, whisper_loss=0.09249, over 16935.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01048, ecapa_loss=0.0001408, whisper_loss=0.09038, over 3911269.50 frames. ], batch size: 64, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:09:30,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4392990.0, ans=0.0 2024-08-19 09:10:02,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4393190.0, ans=0.09899494936611666 2024-08-19 09:10:07,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4393190.0, ans=0.0 2024-08-19 09:10:11,935 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.633e+01 2.355e+01 2.575e+01 2.865e+01 1.533e+02, threshold=5.149e+01, percent-clipped=1.0 2024-08-19 09:10:31,084 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-19 09:10:32,264 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
20 from LS+wenet, 32 from Vox, 35 fro AS 2024-08-19 09:10:36,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4393390.0, ans=0.0 2024-08-19 09:10:39,004 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.167e-02 2024-08-19 09:10:41,567 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 9250, loss[loss=0.08186, beats_loss=0.0125, ecapa_loss=0.0001695, whisper_loss=0.06766, over 20714.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01055, ecapa_loss=0.0001409, whisper_loss=0.08957, over 3897673.58 frames. ], batch size: 88, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:10:41,756 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-19 09:10:42,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4393490.0, ans=0.0 2024-08-19 09:10:48,461 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.630e+01 2024-08-19 09:10:53,926 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.08 vs. limit=12.0 2024-08-19 09:10:54,772 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-19 09:10:57,825 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-19 09:11:17,969 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-19 09:11:18,838 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.32 vs. limit=15.0 2024-08-19 09:11:30,181 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
27 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-19 09:11:34,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4393790.0, ans=0.1 2024-08-19 09:11:41,587 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2024-08-19 09:11:44,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4393890.0, ans=0.2 2024-08-19 09:11:45,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4393890.0, ans=0.125 2024-08-19 09:11:46,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4393890.0, ans=0.125 2024-08-19 09:11:52,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4393890.0, ans=0.125 2024-08-19 09:11:55,439 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 9300, loss[loss=0.1145, beats_loss=0.009097, ecapa_loss=0.00014, whisper_loss=0.104, over 20627.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01057, ecapa_loss=0.0001395, whisper_loss=0.08977, over 3937527.88 frames. ], batch size: 78, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:12:17,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4394090.0, ans=0.125 2024-08-19 09:12:18,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.38 vs. limit=15.0 2024-08-19 09:12:30,838 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
20 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-19 09:12:31,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4394190.0, ans=0.0 2024-08-19 09:12:35,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4394190.0, ans=0.04949747468305833 2024-08-19 09:12:35,697 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.78 vs. limit=12.0 2024-08-19 09:12:37,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4394290.0, ans=0.125 2024-08-19 09:12:38,105 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.432e+01 2.581e+01 2.836e+01 1.723e+02, threshold=5.163e+01, percent-clipped=2.0 2024-08-19 09:12:39,038 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.77 vs. limit=15.0 2024-08-19 09:12:51,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4394390.0, ans=0.125 2024-08-19 09:12:52,298 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
27 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-19 09:12:53,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4394390.0, ans=0.125 2024-08-19 09:12:53,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4394390.0, ans=0.1 2024-08-19 09:12:58,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4394390.0, ans=0.0 2024-08-19 09:12:59,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4394390.0, ans=0.125 2024-08-19 09:13:05,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4394490.0, ans=0.2 2024-08-19 09:13:06,077 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 9350, loss[loss=0.09838, beats_loss=0.01238, ecapa_loss=0.0001138, whisper_loss=0.08485, over 20151.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01057, ecapa_loss=0.0001396, whisper_loss=0.08975, over 3905818.36 frames. ], batch size: 77, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:13:17,293 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.202e-01 2024-08-19 09:13:27,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4394590.0, ans=0.2 2024-08-19 09:13:28,644 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
24 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-19 09:13:28,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4394590.0, ans=0.125 2024-08-19 09:13:36,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4394690.0, ans=0.0 2024-08-19 09:13:39,289 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 09:14:01,613 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.17 vs. limit=15.0 2024-08-19 09:14:03,971 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.098e+00 2024-08-19 09:14:06,228 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 17 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-19 09:14:11,519 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 14 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 09:14:12,792 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-19 09:14:13,915 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 9400, loss[loss=0.112, beats_loss=0.009542, ecapa_loss=0.000142, whisper_loss=0.1011, over 18632.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0106, ecapa_loss=0.0001407, whisper_loss=0.08897, over 3893564.95 frames. ], batch size: 73, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:14:26,504 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
19 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-19 09:14:32,660 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 09:14:44,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4395190.0, ans=0.125 2024-08-19 09:14:47,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4395190.0, ans=0.125 2024-08-19 09:14:48,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4395190.0, ans=0.125 2024-08-19 09:14:51,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4395190.0, ans=0.125 2024-08-19 09:14:54,365 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.333e+01 2.565e+01 2.846e+01 4.265e+02, threshold=5.130e+01, percent-clipped=2.0 2024-08-19 09:14:58,454 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-19 09:15:02,375 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-19 09:15:02,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4395290.0, ans=0.1 2024-08-19 09:15:09,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4395390.0, ans=0.125 2024-08-19 09:15:12,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4395390.0, ans=0.125 2024-08-19 09:15:19,871 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 9450, loss[loss=0.09402, beats_loss=0.01223, ecapa_loss=8.605e-05, whisper_loss=0.08093, over 21948.00 frames. 
], tot_loss[loss=0.1008, beats_loss=0.01065, ecapa_loss=0.0001404, whisper_loss=0.08878, over 3893899.10 frames. ], batch size: 81, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:15:21,239 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 16 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-19 09:15:21,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4395490.0, ans=0.125 2024-08-19 09:15:38,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4395590.0, ans=0.0 2024-08-19 09:15:45,122 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-19 09:16:03,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4395790.0, ans=0.125 2024-08-19 09:16:07,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4395790.0, ans=0.125 2024-08-19 09:16:11,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4395890.0, ans=0.125 2024-08-19 09:16:13,947 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 18 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 09:16:15,613 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.49 vs. limit=15.0 2024-08-19 09:16:26,351 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 9500, loss[loss=0.1162, beats_loss=0.009693, ecapa_loss=0.0001357, whisper_loss=0.1051, over 17873.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01055, ecapa_loss=0.0001403, whisper_loss=0.08936, over 3892874.05 frames. 
], batch size: 68, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:16:34,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4395990.0, ans=0.125 2024-08-19 09:16:59,529 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 17 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-19 09:17:06,175 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.270e+01 2.577e+01 2.905e+01 4.057e+01, threshold=5.153e+01, percent-clipped=0.0 2024-08-19 09:17:11,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4396290.0, ans=0.125 2024-08-19 09:17:24,635 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=22.5 2024-08-19 09:17:32,746 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 9550, loss[loss=0.08924, beats_loss=0.0112, ecapa_loss=0.0001282, whisper_loss=0.07676, over 16882.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01047, ecapa_loss=0.0001404, whisper_loss=0.08972, over 3880481.47 frames. ], batch size: 71, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:17:34,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4396490.0, ans=0.0 2024-08-19 09:17:54,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4396590.0, ans=0.1 2024-08-19 09:18:22,964 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
24 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-19 09:18:24,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4396890.0, ans=0.125 2024-08-19 09:18:35,591 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 09:18:37,673 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 9600, loss[loss=0.08356, beats_loss=0.01116, ecapa_loss=0.0001595, whisper_loss=0.0708, over 14840.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01044, ecapa_loss=0.0001415, whisper_loss=0.08998, over 3882635.73 frames. ], batch size: 57, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:18:47,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4396990.0, ans=0.0 2024-08-19 09:18:59,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4397090.0, ans=0.09899494936611666 2024-08-19 09:19:07,861 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.23 vs. limit=15.0 2024-08-19 09:19:12,741 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-19 09:19:17,723 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.332e+01 2.537e+01 2.796e+01 5.515e+01, threshold=5.073e+01, percent-clipped=1.0 2024-08-19 09:19:26,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4397290.0, ans=0.125 2024-08-19 09:19:27,889 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.81 vs. 
limit=10.0 2024-08-19 09:19:47,043 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 9650, loss[loss=0.1045, beats_loss=0.01063, ecapa_loss=0.0001249, whisper_loss=0.09259, over 20333.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01045, ecapa_loss=0.000143, whisper_loss=0.08928, over 3877309.24 frames. ], batch size: 80, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:19:54,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4397490.0, ans=0.0 2024-08-19 09:19:55,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4397490.0, ans=0.125 2024-08-19 09:20:00,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4397590.0, ans=0.125 2024-08-19 09:20:06,913 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-19 09:20:28,107 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-19 09:20:33,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4397790.0, ans=0.125 2024-08-19 09:20:49,120 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.88 vs. limit=15.0 2024-08-19 09:20:57,297 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 9700, loss[loss=0.1037, beats_loss=0.01126, ecapa_loss=0.0001205, whisper_loss=0.09126, over 20245.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01044, ecapa_loss=0.0001436, whisper_loss=0.08898, over 3831504.83 frames. 
], batch size: 78, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:21:37,819 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.414e+01 2.657e+01 3.096e+01 1.946e+02, threshold=5.314e+01, percent-clipped=1.0 2024-08-19 09:21:47,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4398290.0, ans=0.125 2024-08-19 09:21:48,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4398290.0, ans=0.125 2024-08-19 09:21:50,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4398390.0, ans=0.0 2024-08-19 09:22:04,340 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 9750, loss[loss=0.09523, beats_loss=0.01075, ecapa_loss=0.000174, whisper_loss=0.08275, over 22327.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01058, ecapa_loss=0.0001431, whisper_loss=0.08807, over 3830428.80 frames. ], batch size: 95, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:22:04,454 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-19 09:22:15,911 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 16 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-19 09:22:22,550 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-19 09:22:44,573 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 09:22:58,572 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-19 09:23:01,076 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
28 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-19 09:23:08,528 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 9800, loss[loss=0.09798, beats_loss=0.01129, ecapa_loss=0.0001232, whisper_loss=0.08545, over 22133.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01056, ecapa_loss=0.0001428, whisper_loss=0.08858, over 3838077.96 frames. ], batch size: 89, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:23:15,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4398990.0, ans=0.125 2024-08-19 09:23:19,692 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.76 vs. limit=12.0 2024-08-19 09:23:47,998 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.301e+01 2.526e+01 2.757e+01 3.952e+01, threshold=5.052e+01, percent-clipped=0.0 2024-08-19 09:23:50,653 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 18 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-19 09:23:55,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4399290.0, ans=0.0 2024-08-19 09:23:57,140 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 25 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-19 09:23:58,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4399290.0, ans=0.0 2024-08-19 09:24:12,256 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 30 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-19 09:24:13,328 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 9850, loss[loss=0.1196, beats_loss=0.008651, ecapa_loss=0.0001317, whisper_loss=0.1096, over 20527.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01051, ecapa_loss=0.0001427, whisper_loss=0.08976, over 3851884.09 frames. 
], batch size: 79, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:24:16,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4399490.0, ans=0.0 2024-08-19 09:24:29,006 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.78 vs. limit=15.0 2024-08-19 09:24:46,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4399690.0, ans=0.125 2024-08-19 09:25:03,581 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-19 09:25:06,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4399890.0, ans=0.0 2024-08-19 09:25:11,723 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-19 09:25:17,814 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-440000.pt 2024-08-19 09:25:20,291 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 9900, loss[loss=0.08264, beats_loss=0.01206, ecapa_loss=0.0001812, whisper_loss=0.06877, over 20459.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01059, ecapa_loss=0.0001411, whisper_loss=0.09001, over 3858939.85 frames. ], batch size: 90, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:25:27,307 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.01 vs. 
limit=15.0 2024-08-19 09:25:41,657 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-19 09:25:43,076 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 09:25:44,313 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 30 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-19 09:25:47,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4400190.0, ans=0.2 2024-08-19 09:25:52,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4400190.0, ans=0.0 2024-08-19 09:25:56,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4400190.0, ans=0.0 2024-08-19 09:25:58,861 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-19 09:25:59,919 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.262e+01 2.560e+01 2.824e+01 4.177e+01, threshold=5.120e+01, percent-clipped=0.0 2024-08-19 09:26:15,484 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-19 09:26:18,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4400390.0, ans=0.0 2024-08-19 09:26:24,505 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-19 09:26:25,645 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 9950, loss[loss=0.1101, beats_loss=0.01028, ecapa_loss=0.0001478, whisper_loss=0.0983, over 22019.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01062, ecapa_loss=0.0001419, whisper_loss=0.0899, over 3867644.85 frames. 
], batch size: 88, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:26:36,221 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2024-08-19 09:26:43,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4400590.0, ans=0.125 2024-08-19 09:27:11,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4400790.0, ans=0.07 2024-08-19 09:27:17,428 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 17 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-19 09:27:21,115 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-19 09:27:32,800 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 10000, loss[loss=0.08662, beats_loss=0.01073, ecapa_loss=0.0001511, whisper_loss=0.07438, over 21591.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01061, ecapa_loss=0.000141, whisper_loss=0.08965, over 3870163.21 frames. ], batch size: 90, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:27:35,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4400990.0, ans=0.1 2024-08-19 09:28:07,274 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 24 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-19 09:28:09,618 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 7 from Vox, 32 fro AS 2024-08-19 09:28:13,420 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.203e+01 2.418e+01 2.701e+01 3.828e+01, threshold=4.836e+01, percent-clipped=0.0 2024-08-19 09:28:21,127 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-19 09:28:21,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4401290.0, ans=0.2 2024-08-19 09:28:28,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4401390.0, ans=0.2 2024-08-19 09:28:40,076 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 10050, loss[loss=0.1064, beats_loss=0.00749, ecapa_loss=0.0001672, whisper_loss=0.0972, over 16987.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01056, ecapa_loss=0.0001397, whisper_loss=0.08967, over 3889112.90 frames. ], batch size: 69, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:29:31,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.70 vs. limit=15.0 2024-08-19 09:29:31,794 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-19 09:29:36,675 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 09:29:40,580 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-19 09:29:43,096 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 09:29:45,727 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 10100, loss[loss=0.1064, beats_loss=0.009464, ecapa_loss=0.000137, whisper_loss=0.09552, over 16764.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001403, whisper_loss=0.08998, over 3872169.66 frames. ], batch size: 66, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:29:52,952 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.55 vs. 
limit=22.5 2024-08-19 09:29:59,271 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.10 vs. limit=6.0 2024-08-19 09:30:00,894 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=22.5 2024-08-19 09:30:14,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4402190.0, ans=0.125 2024-08-19 09:30:26,348 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=15.0 2024-08-19 09:30:27,053 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.294e+01 2.554e+01 2.790e+01 3.607e+01, threshold=5.108e+01, percent-clipped=0.0 2024-08-19 09:30:31,013 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-19 09:30:41,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4402290.0, ans=0.125 2024-08-19 09:30:58,394 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 10150, loss[loss=0.09501, beats_loss=0.01249, ecapa_loss=9.928e-05, whisper_loss=0.08153, over 18691.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01044, ecapa_loss=0.0001408, whisper_loss=0.08995, over 3859684.02 frames. ], batch size: 71, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:30:58,595 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-19 09:31:03,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4402490.0, ans=0.1 2024-08-19 09:31:09,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=12.0 2024-08-19 09:31:15,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4402590.0, ans=0.125 2024-08-19 09:31:21,678 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 19 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-19 09:31:26,073 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.42 vs. limit=15.0 2024-08-19 09:31:36,615 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 09:31:58,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4402890.0, ans=0.125 2024-08-19 09:32:05,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4402890.0, ans=0.125 2024-08-19 09:32:10,066 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.23 vs. limit=10.0 2024-08-19 09:32:12,461 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 10200, loss[loss=0.101, beats_loss=0.009989, ecapa_loss=0.0001471, whisper_loss=0.08958, over 16566.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01041, ecapa_loss=0.0001415, whisper_loss=0.08933, over 3862631.19 frames. 
], batch size: 64, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:32:17,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4402990.0, ans=0.0 2024-08-19 09:32:17,611 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.51 vs. limit=22.5 2024-08-19 09:32:26,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4403090.0, ans=0.125 2024-08-19 09:32:44,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4403190.0, ans=0.125 2024-08-19 09:32:56,924 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.315e+01 2.558e+01 2.832e+01 4.132e+01, threshold=5.117e+01, percent-clipped=0.0 2024-08-19 09:33:01,471 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 09:33:25,364 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 10250, loss[loss=0.09832, beats_loss=0.01016, ecapa_loss=0.0001711, whisper_loss=0.08645, over 17791.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01043, ecapa_loss=0.0001413, whisper_loss=0.08971, over 3893624.31 frames. ], batch size: 70, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:33:28,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=4403490.0, ans=22.5 2024-08-19 09:33:32,618 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-19 09:33:43,235 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.51 vs. 
limit=15.0 2024-08-19 09:33:47,868 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-19 09:33:52,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4403590.0, ans=0.2 2024-08-19 09:34:06,052 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 09:34:31,259 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-19 09:34:50,454 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 10300, loss[loss=0.09045, beats_loss=0.01037, ecapa_loss=0.0001408, whisper_loss=0.07867, over 14556.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01035, ecapa_loss=0.0001413, whisper_loss=0.08998, over 3874293.89 frames. ], batch size: 59, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:34:52,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4403990.0, ans=0.1 2024-08-19 09:35:10,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4404090.0, ans=0.125 2024-08-19 09:35:17,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4404090.0, ans=0.125 2024-08-19 09:35:25,777 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 09:35:39,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4404190.0, ans=0.07 2024-08-19 09:35:41,625 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.30 vs. 
limit=22.5 2024-08-19 09:35:42,353 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.424e+01 2.703e+01 3.011e+01 5.965e+01, threshold=5.405e+01, percent-clipped=1.0 2024-08-19 09:36:09,193 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 21 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-19 09:36:20,097 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 10350, loss[loss=0.09792, beats_loss=0.009701, ecapa_loss=0.0001521, whisper_loss=0.0867, over 22045.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01035, ecapa_loss=0.0001419, whisper_loss=0.09073, over 3896966.99 frames. ], batch size: 89, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:36:20,226 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 09:36:35,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4404590.0, ans=0.125 2024-08-19 09:36:35,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4404590.0, ans=0.125 2024-08-19 09:36:37,532 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 23 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-19 09:37:19,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4404790.0, ans=0.1 2024-08-19 09:37:41,866 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 39 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-19 09:37:51,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4404990.0, ans=0.2 2024-08-19 09:37:52,731 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 10400, loss[loss=0.09323, beats_loss=0.012, ecapa_loss=0.0001459, whisper_loss=0.07977, over 19019.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01033, ecapa_loss=0.0001415, whisper_loss=0.09114, over 3901115.55 frames. ], batch size: 80, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:37:53,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4404990.0, ans=0.5 2024-08-19 09:37:54,852 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 26 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-19 09:37:59,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4404990.0, ans=0.1 2024-08-19 09:38:13,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.98 vs. limit=15.0 2024-08-19 09:38:15,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4405090.0, ans=0.0 2024-08-19 09:38:18,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4405090.0, ans=0.125 2024-08-19 09:38:41,984 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.65 vs. limit=10.0 2024-08-19 09:38:47,001 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.316e+01 2.550e+01 2.840e+01 5.090e+01, threshold=5.101e+01, percent-clipped=0.0 2024-08-19 09:38:47,210 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-19 09:38:50,157 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.12 vs. 
limit=15.0 2024-08-19 09:39:00,981 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.02 vs. limit=15.0 2024-08-19 09:39:13,772 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 10450, loss[loss=0.1299, beats_loss=0.007198, ecapa_loss=0.0001592, whisper_loss=0.1211, over 19220.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01029, ecapa_loss=0.0001407, whisper_loss=0.09156, over 3906565.78 frames. ], batch size: 74, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:39:18,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4405490.0, ans=0.2 2024-08-19 09:39:27,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4405590.0, ans=0.125 2024-08-19 09:39:31,559 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-19 09:39:51,925 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 19 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 09:39:53,256 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-19 09:40:04,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4405790.0, ans=0.0 2024-08-19 09:40:07,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4405790.0, ans=0.125 2024-08-19 09:40:08,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4405890.0, ans=0.2 2024-08-19 09:40:09,503 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
14 from LS+wenet, 19 from Vox, 35 from AS
2024-08-19 09:40:23,612 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 10500, loss[loss=0.1125, beats_loss=0.009478, ecapa_loss=0.0001312, whisper_loss=0.1017, over 20488.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01032, ecapa_loss=0.0001403, whisper_loss=0.09134, over 3901425.46 frames. ], batch size: 80, lr: 2.03e-03, grad_scale: 5.764607523034235e+17
2024-08-19 09:40:34,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4405990.0, ans=0.125
2024-08-19 09:40:35,842 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0
2024-08-19 09:41:02,927 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.686e+01 2.206e+01 2.360e+01 2.626e+01 1.632e+02, threshold=4.720e+01, percent-clipped=1.0
2024-08-19 09:41:03,125 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 18 from Vox, 41 from AS
2024-08-19 09:41:20,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4406390.0, ans=0.1
2024-08-19 09:41:26,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4406390.0, ans=0.125
2024-08-19 09:41:29,355 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 10550, loss[loss=0.09263, beats_loss=0.0107, ecapa_loss=0.0001618, whisper_loss=0.08031, over 17986.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01035, ecapa_loss=0.000141, whisper_loss=0.09039, over 3855965.15 frames. ], batch size: 73, lr: 2.03e-03, grad_scale: 5.764607523034235e+17
2024-08-19 09:41:29,598 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 26 from Vox, 36 from AS
2024-08-19 09:41:35,135 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 12 from Vox, 38 from AS
2024-08-19 09:41:47,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4406590.0, ans=0.1
2024-08-19 09:42:15,633 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 20 from Vox, 27 from AS
2024-08-19 09:42:20,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4406790.0, ans=0.125
2024-08-19 09:42:20,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4406790.0, ans=0.2
2024-08-19 09:42:39,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4406990.0, ans=0.2
2024-08-19 09:42:40,603 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 10600, loss[loss=0.09877, beats_loss=0.01135, ecapa_loss=0.0001413, whisper_loss=0.086, over 20364.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01037, ecapa_loss=0.0001406, whisper_loss=0.09052, over 3856088.75 frames. ], batch size: 82, lr: 2.03e-03, grad_scale: 5.764607523034235e+17
2024-08-19 09:42:53,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4407090.0, ans=0.1
2024-08-19 09:43:09,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4407190.0, ans=0.0
2024-08-19 09:43:13,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4407190.0, ans=0.0
2024-08-19 09:43:14,868 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts.
27 from LS+wenet, 21 from Vox, 41 from AS
2024-08-19 09:43:22,875 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.325e+01 2.532e+01 2.911e+01 7.295e+01, threshold=5.064e+01, percent-clipped=1.0
2024-08-19 09:43:29,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4407290.0, ans=0.0
2024-08-19 09:43:31,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4407290.0, ans=0.2
2024-08-19 09:43:48,860 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.88 vs. limit=10.0
2024-08-19 09:43:50,723 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 10650, loss[loss=0.1281, beats_loss=0.008753, ecapa_loss=0.0001327, whisper_loss=0.118, over 17288.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01034, ecapa_loss=0.0001412, whisper_loss=0.09047, over 3852515.91 frames. ], batch size: 66, lr: 2.03e-03, grad_scale: 5.764607523034235e+17
2024-08-19 09:44:04,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4407590.0, ans=0.125
2024-08-19 09:44:21,707 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 18 from Vox, 29 from AS
2024-08-19 09:44:22,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=4407690.0, ans=15.0
2024-08-19 09:44:38,840 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 19 from LS+wenet, 19 from Vox, 51 from AS
2024-08-19 09:44:51,470 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 from AS
2024-08-19 09:45:02,816 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 10700, loss[loss=0.09534, beats_loss=0.008251, ecapa_loss=0.0001264, whisper_loss=0.08583, over 17387.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01031, ecapa_loss=0.000141, whisper_loss=0.09148, over 3864467.75 frames. ], batch size: 65, lr: 2.03e-03, grad_scale: 5.764607523034235e+17
2024-08-19 09:45:04,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4407990.0, ans=0.0
2024-08-19 09:45:14,569 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.043e-01
2024-08-19 09:45:26,146 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 17 from Vox, 39 from AS
2024-08-19 09:45:43,429 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.323e+01 2.559e+01 2.783e+01 4.084e+01, threshold=5.118e+01, percent-clipped=0.0
2024-08-19 09:45:46,203 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 21 from Vox, 21 from AS
2024-08-19 09:45:51,327 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 19 from LS+wenet, 18 from Vox, 51 from AS
2024-08-19 09:45:53,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4408290.0, ans=0.0
2024-08-19 09:46:05,503 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.66 vs. limit=10.0
2024-08-19 09:46:09,982 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 10750, loss[loss=0.09019, beats_loss=0.01118, ecapa_loss=0.0001249, whisper_loss=0.07776, over 18494.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01037, ecapa_loss=0.00014, whisper_loss=0.09187, over 3871972.57 frames.
], batch size: 72, lr: 2.03e-03, grad_scale: 5.764607523034235e+17
2024-08-19 09:46:18,040 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 from AS
2024-08-19 09:46:23,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4408590.0, ans=0.1
2024-08-19 09:46:39,696 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.19 vs. limit=15.0
2024-08-19 09:47:00,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4408890.0, ans=0.125
2024-08-19 09:47:05,299 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 25 from Vox, 36 from AS
2024-08-19 09:47:14,127 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 10800, loss[loss=0.1057, beats_loss=0.009756, ecapa_loss=0.000156, whisper_loss=0.09442, over 20184.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01039, ecapa_loss=0.0001399, whisper_loss=0.09214, over 3872206.07 frames. ], batch size: 84, lr: 2.03e-03, grad_scale: 5.764607523034235e+17
2024-08-19 09:47:14,308 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 26 from LS+wenet, 18 from Vox, 20 from AS
2024-08-19 09:47:36,441 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0
2024-08-19 09:47:44,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4409190.0, ans=0.125
2024-08-19 09:47:51,576 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.386e+01 2.632e+01 3.001e+01 4.725e+01, threshold=5.265e+01, percent-clipped=0.0
2024-08-19 09:48:08,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=4409390.0, ans=0.5
2024-08-19 09:48:17,660 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 10850, loss[loss=0.1062, beats_loss=0.01077, ecapa_loss=0.0001184, whisper_loss=0.09424, over 23724.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0104, ecapa_loss=0.0001407, whisper_loss=0.09188, over 3874364.79 frames. ], batch size: 91, lr: 2.03e-03, grad_scale: 5.764607523034235e+17
2024-08-19 09:48:30,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4409590.0, ans=0.0
2024-08-19 09:48:31,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4409590.0, ans=0.125
2024-08-19 09:48:31,439 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=15.0
2024-08-19 09:48:53,702 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 12 from LS+wenet, 17 from Vox, 30 from AS
2024-08-19 09:49:21,479 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 10900, loss[loss=0.1419, beats_loss=0.009458, ecapa_loss=0.0001093, whisper_loss=0.1313, over 21629.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0104, ecapa_loss=0.0001413, whisper_loss=0.09147, over 3893993.15 frames. ], batch size: 77, lr: 2.03e-03, grad_scale: 5.764607523034235e+17
2024-08-19 09:49:24,134 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts.
27 from LS+wenet, 24 from Vox, 34 from AS
2024-08-19 09:49:26,760 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 27 from LS+wenet, 19 from Vox, 39 from AS
2024-08-19 09:49:32,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4409990.0, ans=0.125
2024-08-19 09:49:34,127 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 19 from Vox, 25 from AS
2024-08-19 09:49:53,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4410190.0, ans=0.125
2024-08-19 09:49:57,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=4410190.0, ans=6.0
2024-08-19 09:49:59,523 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.400e+01 2.615e+01 2.977e+01 1.064e+02, threshold=5.230e+01, percent-clipped=2.0
2024-08-19 09:49:59,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=4410290.0, ans=0.1
2024-08-19 09:50:03,573 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-19 09:50:12,482 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 16 from Vox, 37 from AS
2024-08-19 09:50:14,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4410390.0, ans=0.0
2024-08-19 09:50:20,642 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.41 vs. limit=10.0
2024-08-19 09:50:21,455 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 28 from Vox, 31 from AS
2024-08-19 09:50:25,141 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 10950, loss[loss=0.0797, beats_loss=0.01069, ecapa_loss=0.0001311, whisper_loss=0.0677, over 22532.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01045, ecapa_loss=0.0001408, whisper_loss=0.09058, over 3906224.22 frames. ], batch size: 92, lr: 2.03e-03, grad_scale: 5.764607523034235e+17
2024-08-19 09:50:25,986 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0
2024-08-19 09:50:36,261 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 18 from Vox, 33 from AS
2024-08-19 09:50:40,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4410590.0, ans=0.1
2024-08-19 09:50:49,398 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-19 09:51:03,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4410790.0, ans=0.125
2024-08-19 09:51:12,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4410790.0, ans=0.95
2024-08-19 09:51:30,080 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 11000, loss[loss=0.1087, beats_loss=0.008309, ecapa_loss=0.0001492, whisper_loss=0.09889, over 16170.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01043, ecapa_loss=0.0001412, whisper_loss=0.09092, over 3904329.56 frames.
], batch size: 64, lr: 2.03e-03, grad_scale: 5.764607523034235e+17
2024-08-19 09:51:36,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4410990.0, ans=0.0
2024-08-19 09:51:38,719 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.92 vs. limit=8.0
2024-08-19 09:51:45,561 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 from AS
2024-08-19 09:51:46,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4411090.0, ans=0.0
2024-08-19 09:51:52,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4411090.0, ans=0.0
2024-08-19 09:51:55,838 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 32 from Vox, 37 from AS
2024-08-19 09:52:00,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4411190.0, ans=0.1
2024-08-19 09:52:12,346 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.368e+01 2.497e+01 2.768e+01 3.279e+02, threshold=4.993e+01, percent-clipped=1.0
2024-08-19 09:52:12,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4411290.0, ans=0.0
2024-08-19 09:52:16,496 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 24 from LS+wenet, 17 from Vox, 24 from AS
2024-08-19 09:52:36,437 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 15 from Vox, 29 from AS
2024-08-19 09:52:38,977 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 11050, loss[loss=0.1088, beats_loss=0.0113, ecapa_loss=0.0001332, whisper_loss=0.0962, over 22525.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001417, whisper_loss=0.0906, over 3891617.61 frames. ], batch size: 93, lr: 2.03e-03, grad_scale: 5.764607523034235e+17
2024-08-19 09:53:10,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4411690.0, ans=0.2
2024-08-19 09:53:26,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4411790.0, ans=0.125
2024-08-19 09:53:29,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4411790.0, ans=0.0
2024-08-19 09:53:30,673 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.936e+01
2024-08-19 09:53:38,232 WARNING [optim.py:496] (0/4) Scaling gradients by 0.06586580723524094, model_norm_threshold=49.93263626098633
2024-08-19 09:53:38,408 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.337e+05, grad_sumsq=1.337e+05, orig_rms_sq=1.000e+00
2024-08-19 09:53:41,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4411890.0, ans=0.125
2024-08-19 09:53:45,452 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 11100, loss[loss=0.1002, beats_loss=0.01049, ecapa_loss=0.000153, whisper_loss=0.08814, over 21989.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01043, ecapa_loss=0.0001422, whisper_loss=0.09071, over 3917707.43 frames.
], batch size: 88, lr: 2.03e-03, grad_scale: 5.764607523034235e+17
2024-08-19 09:53:56,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4411990.0, ans=0.0
2024-08-19 09:54:22,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4412190.0, ans=0.1
2024-08-19 09:54:24,686 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 14 from LS+wenet, 14 from Vox, 34 from AS
2024-08-19 09:54:28,092 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.38 vs. limit=15.0
2024-08-19 09:54:28,472 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.398e+01 2.719e+01 3.101e+01 7.581e+02, threshold=5.438e+01, percent-clipped=4.0
2024-08-19 09:54:34,987 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 19 from Vox, 20 from AS
2024-08-19 09:54:48,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4412390.0, ans=0.2
2024-08-19 09:54:58,146 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 11150, loss[loss=0.08531, beats_loss=0.01215, ecapa_loss=0.0001191, whisper_loss=0.07197, over 14695.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01046, ecapa_loss=0.0001414, whisper_loss=0.09116, over 3912867.68 frames. ], batch size: 59, lr: 2.03e-03, grad_scale: 5.764607523034235e+17
2024-08-19 09:55:14,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4412590.0, ans=0.07
2024-08-19 09:55:52,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4412790.0, ans=0.125
2024-08-19 09:56:06,027 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 21 from Vox, 24 from AS
2024-08-19 09:56:09,788 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 11200, loss[loss=0.1027, beats_loss=0.009417, ecapa_loss=0.0001267, whisper_loss=0.09205, over 23494.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01048, ecapa_loss=0.0001416, whisper_loss=0.09066, over 3893832.34 frames. ], batch size: 91, lr: 2.03e-03, grad_scale: 5.764607523034235e+17
2024-08-19 09:56:12,456 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 14 from Vox, 28 from AS
2024-08-19 09:56:20,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.86 vs. limit=22.5
2024-08-19 09:56:21,755 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 17 from Vox, 45 from AS
2024-08-19 09:56:23,365 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 22 from Vox, 46 from AS
2024-08-19 09:56:26,110 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 16 from Vox, 31 from AS
2024-08-19 09:56:26,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4413090.0, ans=0.125
2024-08-19 09:56:50,879 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.289e+01 2.521e+01 2.778e+01 3.744e+01, threshold=5.043e+01, percent-clipped=0.0
2024-08-19 09:56:58,906 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-19 09:57:20,486 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 11250, loss[loss=0.1018, beats_loss=0.009813, ecapa_loss=0.0001125, whisper_loss=0.0909, over 21531.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01053, ecapa_loss=0.0001409, whisper_loss=0.09021, over 3882710.87 frames.
], batch size: 80, lr: 2.03e-03, grad_scale: 5.764607523034235e+17
2024-08-19 09:57:29,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4413490.0, ans=0.1
2024-08-19 09:57:35,826 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0
2024-08-19 09:57:46,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4413590.0, ans=0.125
2024-08-19 09:58:01,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4413790.0, ans=0.5
2024-08-19 09:58:14,422 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 17 from Vox, 32 from AS
2024-08-19 09:58:28,808 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 11300, loss[loss=0.09654, beats_loss=0.01112, ecapa_loss=0.0001389, whisper_loss=0.08403, over 22696.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01039, ecapa_loss=0.0001406, whisper_loss=0.09117, over 3925090.55 frames. ], batch size: 90, lr: 2.03e-03, grad_scale: 5.764607523034235e+17
2024-08-19 09:58:45,186 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 24 from Vox, 35 from AS
2024-08-19 09:58:46,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4414090.0, ans=0.125
2024-08-19 09:58:49,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4414090.0, ans=0.0
2024-08-19 09:58:55,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4414190.0, ans=0.05
2024-08-19 09:58:56,330 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 20 from Vox, 26 from AS
2024-08-19 09:59:02,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4414190.0, ans=0.0
2024-08-19 09:59:03,337 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=19.20 vs. limit=22.5
2024-08-19 09:59:08,501 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.325e+01 2.531e+01 2.748e+01 4.040e+01, threshold=5.062e+01, percent-clipped=0.0
2024-08-19 09:59:15,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4414290.0, ans=0.125
2024-08-19 09:59:24,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4414390.0, ans=0.125
2024-08-19 09:59:33,990 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=15.0
2024-08-19 09:59:34,836 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 11350, loss[loss=0.1139, beats_loss=0.008773, ecapa_loss=0.0001389, whisper_loss=0.1038, over 19499.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01033, ecapa_loss=0.0001409, whisper_loss=0.09177, over 3917137.98 frames. ], batch size: 74, lr: 2.03e-03, grad_scale: 5.764607523034235e+17
2024-08-19 09:59:35,017 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 19 from Vox, 21 from AS
2024-08-19 09:59:36,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4414490.0, ans=0.0
2024-08-19 09:59:46,388 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 22 from LS+wenet, 17 from Vox, 17 from AS
2024-08-19 09:59:48,847 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts.
25 from LS+wenet, 19 from Vox, 21 from AS
2024-08-19 10:00:01,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=4414690.0, ans=0.5
2024-08-19 10:00:19,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4414790.0, ans=0.2
2024-08-19 10:00:19,757 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.94 vs. limit=15.0
2024-08-19 10:00:21,776 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 20 from Vox, 37 from AS
2024-08-19 10:00:29,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4414890.0, ans=0.09899494936611666
2024-08-19 10:00:37,604 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 11400, loss[loss=0.1074, beats_loss=0.01093, ecapa_loss=0.0001319, whisper_loss=0.09513, over 19558.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01031, ecapa_loss=0.0001414, whisper_loss=0.09195, over 3879491.00 frames. ], batch size: 76, lr: 2.03e-03, grad_scale: 5.764607523034235e+17
2024-08-19 10:00:38,930 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 18 from LS+wenet, 21 from Vox, 33 from AS
2024-08-19 10:00:45,365 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 24 from Vox, 29 from AS
2024-08-19 10:00:46,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4414990.0, ans=0.0
2024-08-19 10:00:55,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4415090.0, ans=0.1
2024-08-19 10:01:04,198 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 22 from Vox, 25 from AS
2024-08-19 10:01:07,680 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 26 from Vox, 29 from AS
2024-08-19 10:01:15,086 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.359e+01 2.595e+01 2.973e+01 3.861e+01, threshold=5.190e+01, percent-clipped=0.0
2024-08-19 10:01:20,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4415290.0, ans=0.125
2024-08-19 10:01:20,814 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.05 vs. limit=10.0
2024-08-19 10:01:34,109 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.77 vs. limit=12.0
2024-08-19 10:01:36,675 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.36 vs. limit=15.0
2024-08-19 10:01:39,603 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 11450, loss[loss=0.1029, beats_loss=0.01025, ecapa_loss=0.0001451, whisper_loss=0.0912, over 17931.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01035, ecapa_loss=0.0001409, whisper_loss=0.09126, over 3894101.67 frames. ], batch size: 71, lr: 2.03e-03, grad_scale: 5.764607523034235e+17
2024-08-19 10:01:42,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4415490.0, ans=0.125
2024-08-19 10:01:44,057 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.53 vs. limit=15.0
2024-08-19 10:01:47,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4415490.0, ans=0.125
2024-08-19 10:01:49,928 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts.
23 from LS+wenet, 24 from Vox, 43 from AS
2024-08-19 10:01:51,801 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.77 vs. limit=15.0
2024-08-19 10:01:52,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4415590.0, ans=0.04949747468305833
2024-08-19 10:02:06,092 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 33 from Vox, 35 from AS
2024-08-19 10:02:07,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4415690.0, ans=0.5
2024-08-19 10:02:14,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4415690.0, ans=0.0
2024-08-19 10:02:25,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4415790.0, ans=0.2
2024-08-19 10:02:26,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4415790.0, ans=0.0
2024-08-19 10:02:27,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4415790.0, ans=0.2
2024-08-19 10:02:33,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4415890.0, ans=0.1
2024-08-19 10:02:38,196 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 20 from Vox, 19 from AS
2024-08-19 10:02:42,142 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 11500, loss[loss=0.1093, beats_loss=0.007095, ecapa_loss=0.0001339, whisper_loss=0.1009, over 15600.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01037, ecapa_loss=0.0001414, whisper_loss=0.09117, over 3880616.01 frames. ], batch size: 57, lr: 2.03e-03, grad_scale: 5.764607523034235e+17
2024-08-19 10:02:58,116 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 18 from Vox, 39 from AS
2024-08-19 10:03:00,567 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 20 from LS+wenet, 23 from Vox, 43 from AS
2024-08-19 10:03:07,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4416190.0, ans=0.125
2024-08-19 10:03:07,671 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.82 vs. limit=15.0
2024-08-19 10:03:18,813 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.451e+01 2.614e+01 2.882e+01 3.813e+01, threshold=5.229e+01, percent-clipped=0.0
2024-08-19 10:03:22,508 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=15.0
2024-08-19 10:03:23,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4416290.0, ans=0.1
2024-08-19 10:03:31,696 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 40 from LS+wenet, 26 from Vox, 26 from AS
2024-08-19 10:03:31,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4416390.0, ans=0.1
2024-08-19 10:03:35,156 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 19 from LS+wenet, 20 from Vox, 36 from AS
2024-08-19 10:03:40,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4416390.0, ans=0.125
2024-08-19 10:03:41,552 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts.
33 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 10:03:43,773 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 11550, loss[loss=0.08923, beats_loss=0.009937, ecapa_loss=0.0001312, whisper_loss=0.07798, over 20895.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0104, ecapa_loss=0.000141, whisper_loss=0.09057, over 3895521.05 frames. ], batch size: 83, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 10:03:50,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4416490.0, ans=0.0 2024-08-19 10:03:51,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4416490.0, ans=0.125 2024-08-19 10:04:03,570 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 40 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-19 10:04:12,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4416690.0, ans=0.125 2024-08-19 10:04:21,721 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.84 vs. 
limit=15.0 2024-08-19 10:04:24,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4416790.0, ans=0.125 2024-08-19 10:04:31,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4416790.0, ans=0.0 2024-08-19 10:04:34,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4416890.0, ans=0.0 2024-08-19 10:04:38,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4416890.0, ans=0.125 2024-08-19 10:04:41,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4416890.0, ans=0.0 2024-08-19 10:04:42,657 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.51 vs. limit=10.0 2024-08-19 10:04:45,843 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 11600, loss[loss=0.0918, beats_loss=0.01166, ecapa_loss=0.0001528, whisper_loss=0.07861, over 13939.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001403, whisper_loss=0.09051, over 3915395.43 frames. ], batch size: 59, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 10:04:46,023 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-19 10:04:50,884 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-19 10:04:54,279 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-19 10:04:56,255 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.69 vs. 
limit=22.5 2024-08-19 10:05:04,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4417090.0, ans=0.125 2024-08-19 10:05:05,249 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 24 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-19 10:05:12,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4417190.0, ans=0.1 2024-08-19 10:05:12,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4417190.0, ans=0.1 2024-08-19 10:05:22,513 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.308e+01 2.624e+01 2.875e+01 7.013e+01, threshold=5.249e+01, percent-clipped=1.0 2024-08-19 10:05:24,301 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 22 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-19 10:05:25,375 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-19 10:05:32,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4417290.0, ans=0.125 2024-08-19 10:05:39,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4417390.0, ans=0.09899494936611666 2024-08-19 10:05:39,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4417390.0, ans=0.125 2024-08-19 10:05:42,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4417390.0, ans=0.2 2024-08-19 10:05:48,088 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 11650, loss[loss=0.09161, beats_loss=0.01134, ecapa_loss=0.000153, whisper_loss=0.07874, over 21806.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01043, ecapa_loss=0.0001407, whisper_loss=0.09049, over 3924475.49 frames. ], batch size: 92, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:05:50,740 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-19 10:05:58,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4417490.0, ans=0.125 2024-08-19 10:06:16,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4417690.0, ans=0.0 2024-08-19 10:06:22,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4417690.0, ans=0.0 2024-08-19 10:06:25,713 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-19 10:06:30,695 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 37 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-19 10:06:42,039 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-19 10:06:50,769 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 11700, loss[loss=0.1212, beats_loss=0.01129, ecapa_loss=0.0001488, whisper_loss=0.1084, over 21780.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01038, ecapa_loss=0.0001411, whisper_loss=0.09164, over 3932297.53 frames. ], batch size: 85, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:07:11,449 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 19 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-19 10:07:29,319 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.361e+01 2.643e+01 2.921e+01 4.842e+01, threshold=5.287e+01, percent-clipped=0.0 2024-08-19 10:07:36,938 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
22 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-19 10:07:52,564 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 11750, loss[loss=0.08706, beats_loss=0.01413, ecapa_loss=0.0001127, whisper_loss=0.0718, over 22272.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01049, ecapa_loss=0.000141, whisper_loss=0.09082, over 3944606.00 frames. ], batch size: 89, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:07:55,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4418490.0, ans=0.0 2024-08-19 10:07:58,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4418490.0, ans=0.0 2024-08-19 10:08:05,097 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-19 10:08:05,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4418590.0, ans=0.0 2024-08-19 10:08:07,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4418590.0, ans=0.125 2024-08-19 10:08:08,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4418590.0, ans=0.04949747468305833 2024-08-19 10:08:20,985 INFO [train_multi_KD3.py:844] (0/4) A total of 98 cuts. 28 from LS+wenet, 36 from Vox, 34 fro AS 2024-08-19 10:08:24,265 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.11 vs. 
limit=15.0 2024-08-19 10:08:27,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4418690.0, ans=0.2 2024-08-19 10:08:41,450 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0 2024-08-19 10:08:51,731 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 10:08:53,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2024-08-19 10:08:53,914 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 11800, loss[loss=0.08839, beats_loss=0.01317, ecapa_loss=0.000118, whisper_loss=0.07403, over 22836.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001402, whisper_loss=0.09053, over 3958757.84 frames. ], batch size: 93, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:09:06,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4419090.0, ans=10.0 2024-08-19 10:09:07,787 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 20 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 10:09:08,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4419090.0, ans=0.2 2024-08-19 10:09:20,292 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 10:09:22,737 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-19 10:09:23,996 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
17 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-19 10:09:29,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4419190.0, ans=0.125 2024-08-19 10:09:32,070 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.601e+01 2.354e+01 2.552e+01 2.696e+01 6.400e+01, threshold=5.104e+01, percent-clipped=1.0 2024-08-19 10:09:47,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4419390.0, ans=0.1 2024-08-19 10:09:49,573 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-19 10:09:55,732 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 11850, loss[loss=0.1032, beats_loss=0.01031, ecapa_loss=0.000156, whisper_loss=0.09134, over 21375.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01062, ecapa_loss=0.0001401, whisper_loss=0.08943, over 3957631.86 frames. ], batch size: 90, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:10:12,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4419590.0, ans=0.125 2024-08-19 10:10:13,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4419590.0, ans=0.1 2024-08-19 10:10:24,535 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 25 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-19 10:10:24,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4419690.0, ans=0.1 2024-08-19 10:10:28,265 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 15 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 10:10:43,436 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-19 10:10:45,834 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
25 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-19 10:10:58,553 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 11900, loss[loss=0.1093, beats_loss=0.01022, ecapa_loss=0.0001511, whisper_loss=0.09755, over 20895.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01058, ecapa_loss=0.0001405, whisper_loss=0.0901, over 3929931.67 frames. ], batch size: 84, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:10:59,756 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-19 10:11:01,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4419990.0, ans=0.0 2024-08-19 10:11:02,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4419990.0, ans=0.125 2024-08-19 10:11:02,964 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.73 vs. limit=6.0 2024-08-19 10:11:07,000 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 13 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-19 10:11:17,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4420090.0, ans=0.125 2024-08-19 10:11:23,321 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-19 10:11:36,738 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.653e+01 2.301e+01 2.576e+01 2.913e+01 4.968e+01, threshold=5.153e+01, percent-clipped=0.0 2024-08-19 10:11:38,677 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.68 vs. limit=12.0 2024-08-19 10:11:39,613 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
25 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-19 10:11:47,979 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 26 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-19 10:11:51,944 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 10:12:00,703 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 11950, loss[loss=0.08418, beats_loss=0.01174, ecapa_loss=0.0001232, whisper_loss=0.07122, over 20839.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01048, ecapa_loss=0.000142, whisper_loss=0.09048, over 3896748.65 frames. ], batch size: 84, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:12:03,360 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 33 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-19 10:12:03,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4420490.0, ans=0.1 2024-08-19 10:12:09,037 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09744685143232346, model_norm_threshold=51.527191162109375 2024-08-19 10:12:09,210 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.481e+04, grad_sumsq=5.241e+06, orig_rms_sq=1.046e-02 2024-08-19 10:12:10,789 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-19 10:12:13,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4420590.0, ans=0.125 2024-08-19 10:12:15,014 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.87 vs. limit=15.0 2024-08-19 10:12:15,906 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-19 10:12:18,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4420590.0, ans=0.09899494936611666 2024-08-19 10:12:23,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4420590.0, ans=0.125 2024-08-19 10:12:32,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4420690.0, ans=0.1 2024-08-19 10:12:34,678 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-19 10:12:54,145 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08515588939189911, model_norm_threshold=51.527191162109375 2024-08-19 10:12:54,317 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.362e+04, grad_sumsq=4.362e+04, orig_rms_sq=1.000e+00 2024-08-19 10:12:54,445 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-19 10:13:02,911 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 12000, loss[loss=0.0746, beats_loss=0.01425, ecapa_loss=0.0001379, whisper_loss=0.05897, over 14176.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01048, ecapa_loss=0.000142, whisper_loss=0.09033, over 3867561.39 frames. ], batch size: 61, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:13:02,913 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-19 10:13:40,100 INFO [train_multi_KD3.py:1149] (0/4) Epoch 30, validation on ASR_libri: loss=0.2551, beats_loss=0, ecapa_loss=0.0005098, whisper_loss=0.25, over 922467.00 frames. 
2024-08-19 10:13:57,455 INFO [train_multi_KD3.py:1149] (0/4) Epoch 30, validation on SV_voxceleb1: loss=0.003986, beats_loss=0, ecapa_loss=0.0003986, whisper_loss=0, over 939242.00 frames. 2024-08-19 10:15:43,654 INFO [train_multi_KD3.py:1149] (0/4) Epoch 30, validation on AT_audioset: loss=0.02302, beats_loss=0.02302, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 10:15:43,658 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-19 10:15:49,726 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 24 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-19 10:15:52,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4420990.0, ans=0.125 2024-08-19 10:15:52,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4420990.0, ans=0.1 2024-08-19 10:15:56,240 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-19 10:16:14,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4421190.0, ans=0.125 2024-08-19 10:16:22,831 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.555e+01 2.285e+01 2.523e+01 2.857e+01 6.051e+02, threshold=5.046e+01, percent-clipped=3.0 2024-08-19 10:16:30,378 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 18 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-19 10:16:46,479 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 12050, loss[loss=0.1083, beats_loss=0.009874, ecapa_loss=0.0001421, whisper_loss=0.09704, over 20146.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0105, ecapa_loss=0.0001427, whisper_loss=0.09012, over 3863170.99 frames. 
], batch size: 76, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:16:50,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4421490.0, ans=0.0 2024-08-19 10:16:58,366 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 29 from Vox, 22 fro AS 2024-08-19 10:17:13,528 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-19 10:17:14,773 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 10:17:15,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4421690.0, ans=0.125 2024-08-19 10:17:16,191 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-19 10:17:20,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.80 vs. limit=10.0 2024-08-19 10:17:39,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4421890.0, ans=0.125 2024-08-19 10:17:49,874 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 12100, loss[loss=0.1151, beats_loss=0.01051, ecapa_loss=0.0001466, whisper_loss=0.1031, over 18814.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001425, whisper_loss=0.0901, over 3856905.40 frames. ], batch size: 76, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:17:50,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4421990.0, ans=0.125 2024-08-19 10:18:03,507 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
18 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-19 10:18:04,262 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.67 vs. limit=10.0 2024-08-19 10:18:12,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4422090.0, ans=0.2 2024-08-19 10:18:19,152 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2024-08-19 10:18:28,091 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.327e+01 2.569e+01 2.945e+01 4.765e+01, threshold=5.137e+01, percent-clipped=0.0 2024-08-19 10:18:31,961 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-19 10:18:39,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.67 vs. limit=22.5 2024-08-19 10:18:41,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4422390.0, ans=0.125 2024-08-19 10:18:51,767 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 12150, loss[loss=0.1083, beats_loss=0.008465, ecapa_loss=0.0001674, whisper_loss=0.0982, over 22509.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01049, ecapa_loss=0.0001417, whisper_loss=0.08982, over 3862961.99 frames. ], batch size: 93, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:18:54,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4422490.0, ans=0.125 2024-08-19 10:18:56,221 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
34 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-19 10:19:01,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0 2024-08-19 10:19:12,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4422590.0, ans=0.125 2024-08-19 10:19:29,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4422790.0, ans=0.0 2024-08-19 10:19:35,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4422790.0, ans=0.125 2024-08-19 10:19:41,253 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.85 vs. limit=15.0 2024-08-19 10:19:42,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4422890.0, ans=0.0 2024-08-19 10:19:47,551 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.94 vs. limit=22.5 2024-08-19 10:19:54,295 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 12200, loss[loss=0.1326, beats_loss=0.008035, ecapa_loss=0.0001487, whisper_loss=0.1231, over 22470.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01055, ecapa_loss=0.0001412, whisper_loss=0.08965, over 3858299.84 frames. ], batch size: 84, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:20:09,147 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-19 10:20:26,608 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
30 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-19 10:20:32,465 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.316e+01 2.610e+01 2.989e+01 7.361e+01, threshold=5.220e+01, percent-clipped=1.0 2024-08-19 10:20:53,516 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-19 10:20:56,002 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 12250, loss[loss=0.08342, beats_loss=0.01125, ecapa_loss=0.0001923, whisper_loss=0.07025, over 20977.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001425, whisper_loss=0.09047, over 3894312.74 frames. ], batch size: 92, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:20:57,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.37 vs. limit=12.0 2024-08-19 10:21:12,019 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 27 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-19 10:21:21,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4423690.0, ans=0.125 2024-08-19 10:21:33,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4423790.0, ans=0.125 2024-08-19 10:21:33,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4423790.0, ans=0.125 2024-08-19 10:21:38,305 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 20 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-19 10:21:39,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4423790.0, ans=0.125 2024-08-19 10:21:45,281 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.07 vs. 
limit=15.0 2024-08-19 10:21:58,707 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 12300, loss[loss=0.09047, beats_loss=0.01394, ecapa_loss=0.0001471, whisper_loss=0.07506, over 21378.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01052, ecapa_loss=0.0001417, whisper_loss=0.09, over 3893819.70 frames. ], batch size: 87, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:22:02,518 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 35 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-19 10:22:15,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4424090.0, ans=0.125 2024-08-19 10:22:17,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4424090.0, ans=0.0 2024-08-19 10:22:20,054 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 30 from LS+wenet, 32 from Vox, 23 fro AS 2024-08-19 10:22:23,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4424190.0, ans=0.0 2024-08-19 10:22:38,493 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.350e+01 2.584e+01 3.104e+01 8.227e+01, threshold=5.169e+01, percent-clipped=2.0 2024-08-19 10:22:45,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4424290.0, ans=0.0 2024-08-19 10:22:58,852 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-19 10:23:03,570 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 12350, loss[loss=0.09933, beats_loss=0.009729, ecapa_loss=0.0001349, whisper_loss=0.08826, over 22796.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.0001419, whisper_loss=0.09049, over 3893294.87 frames. ], batch size: 90, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:23:10,928 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
18 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-19 10:23:17,447 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 29 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-19 10:23:19,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4424590.0, ans=0.125 2024-08-19 10:23:26,152 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-19 10:23:26,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4424590.0, ans=0.125 2024-08-19 10:23:41,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4424690.0, ans=0.1 2024-08-19 10:23:41,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.61 vs. limit=15.0 2024-08-19 10:23:46,525 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-19 10:23:57,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4424890.0, ans=0.125 2024-08-19 10:24:10,898 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 10:24:12,980 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 12400, loss[loss=0.09503, beats_loss=0.01131, ecapa_loss=0.0001633, whisper_loss=0.08209, over 14637.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01045, ecapa_loss=0.0001415, whisper_loss=0.09069, over 3869529.31 frames. 
], batch size: 63, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:24:26,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4425090.0, ans=0.0 2024-08-19 10:24:30,919 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-19 10:24:42,699 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.50 vs. limit=15.0 2024-08-19 10:24:43,838 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.17 vs. limit=22.5 2024-08-19 10:24:55,404 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-19 10:24:56,421 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.365e+01 2.557e+01 2.842e+01 4.073e+01, threshold=5.113e+01, percent-clipped=0.0 2024-08-19 10:25:04,898 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 10:25:06,925 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.518e-03 2024-08-19 10:25:24,584 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 12450, loss[loss=0.08323, beats_loss=0.01094, ecapa_loss=0.0001504, whisper_loss=0.07078, over 19624.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01046, ecapa_loss=0.0001396, whisper_loss=0.09101, over 3868813.65 frames. ], batch size: 82, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:25:27,396 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 10:25:41,935 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
23 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-19 10:25:42,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4425590.0, ans=0.125 2024-08-19 10:25:46,271 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-19 10:25:47,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4425590.0, ans=0.125 2024-08-19 10:25:56,216 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-19 10:26:02,827 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.21 vs. limit=22.5 2024-08-19 10:26:23,857 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-19 10:26:32,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4425890.0, ans=0.1 2024-08-19 10:26:34,759 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 12500, loss[loss=0.09422, beats_loss=0.009133, ecapa_loss=0.0001357, whisper_loss=0.08373, over 16615.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01037, ecapa_loss=0.0001404, whisper_loss=0.09103, over 3844005.32 frames. ], batch size: 65, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:26:36,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4425990.0, ans=0.125 2024-08-19 10:26:40,219 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
16 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-19 10:27:04,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4426190.0, ans=0.0 2024-08-19 10:27:17,645 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.209e+01 2.418e+01 2.656e+01 4.014e+01, threshold=4.837e+01, percent-clipped=0.0 2024-08-19 10:27:19,308 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-19 10:27:19,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4426290.0, ans=0.0 2024-08-19 10:27:31,460 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 30 from Vox, 20 fro AS 2024-08-19 10:27:37,439 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.54 vs. limit=15.0 2024-08-19 10:27:43,676 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 12550, loss[loss=0.09922, beats_loss=0.01101, ecapa_loss=0.0001335, whisper_loss=0.08687, over 16454.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01032, ecapa_loss=0.0001412, whisper_loss=0.09165, over 3878235.22 frames. ], batch size: 66, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:28:05,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4426590.0, ans=0.0 2024-08-19 10:28:19,461 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-19 10:28:19,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4426690.0, ans=0.125 2024-08-19 10:28:26,917 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.82 vs. 
limit=15.0 2024-08-19 10:28:30,846 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.70 vs. limit=22.5 2024-08-19 10:28:52,397 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 12600, loss[loss=0.1177, beats_loss=0.009251, ecapa_loss=0.0001587, whisper_loss=0.1068, over 15906.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01036, ecapa_loss=0.0001418, whisper_loss=0.09139, over 3870291.23 frames. ], batch size: 60, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:28:53,926 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-19 10:28:54,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4426990.0, ans=0.1 2024-08-19 10:29:01,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4426990.0, ans=0.125 2024-08-19 10:29:10,163 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-19 10:29:14,099 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 10:29:14,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4427090.0, ans=0.035 2024-08-19 10:29:21,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4427190.0, ans=0.2 2024-08-19 10:29:27,372 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-19 10:29:33,323 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.274e+01 2.527e+01 2.762e+01 5.696e+01, threshold=5.054e+01, percent-clipped=1.0 2024-08-19 10:29:33,570 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
25 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-19 10:29:40,454 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0 2024-08-19 10:29:41,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4427290.0, ans=0.0 2024-08-19 10:29:47,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4427390.0, ans=0.0 2024-08-19 10:29:50,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4427390.0, ans=0.09899494936611666 2024-08-19 10:29:55,672 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-19 10:29:55,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=4427390.0, ans=10.0 2024-08-19 10:29:58,670 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 12650, loss[loss=0.1058, beats_loss=0.009909, ecapa_loss=0.0001363, whisper_loss=0.0945, over 23481.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01038, ecapa_loss=0.000143, whisper_loss=0.091, over 3863761.33 frames. ], batch size: 92, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:30:03,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4427490.0, ans=0.2 2024-08-19 10:30:05,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4427490.0, ans=0.0 2024-08-19 10:30:36,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4427790.0, ans=0.0 2024-08-19 10:30:53,492 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
14 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-19 10:30:59,152 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.02 vs. limit=6.0 2024-08-19 10:31:03,752 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 12700, loss[loss=0.09539, beats_loss=0.01167, ecapa_loss=0.0001617, whisper_loss=0.0821, over 20849.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001426, whisper_loss=0.09064, over 3869129.46 frames. ], batch size: 90, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:31:11,651 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-19 10:31:21,132 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0 2024-08-19 10:31:29,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4428190.0, ans=0.1 2024-08-19 10:31:45,158 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.368e+01 2.557e+01 2.871e+01 4.652e+01, threshold=5.113e+01, percent-clipped=0.0 2024-08-19 10:31:45,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4428290.0, ans=0.0 2024-08-19 10:31:45,736 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=15.0 2024-08-19 10:31:50,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4428290.0, ans=0.0 2024-08-19 10:31:55,777 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
17 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-19 10:32:04,897 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.96 vs. limit=12.0 2024-08-19 10:32:08,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4428390.0, ans=0.125 2024-08-19 10:32:10,643 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 12750, loss[loss=0.09576, beats_loss=0.01123, ecapa_loss=0.0001273, whisper_loss=0.08326, over 18045.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01055, ecapa_loss=0.0001425, whisper_loss=0.09036, over 3862282.89 frames. ], batch size: 72, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:32:17,381 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-19 10:32:21,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4428490.0, ans=0.0 2024-08-19 10:32:26,538 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.69 vs. limit=15.0 2024-08-19 10:32:28,517 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 23 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-19 10:32:34,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=4428590.0, ans=0.025 2024-08-19 10:32:44,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4428690.0, ans=0.125 2024-08-19 10:32:53,509 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
22 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-19 10:33:05,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4428890.0, ans=0.05 2024-08-19 10:33:08,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4428890.0, ans=0.125 2024-08-19 10:33:13,811 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0 2024-08-19 10:33:15,413 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 12800, loss[loss=0.1038, beats_loss=0.008788, ecapa_loss=0.0001315, whisper_loss=0.09369, over 15387.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0105, ecapa_loss=0.0001432, whisper_loss=0.09024, over 3863306.89 frames. ], batch size: 58, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:33:28,578 WARNING [optim.py:496] (0/4) Scaling gradients by 0.025292346253991127, model_norm_threshold=51.13230514526367 2024-08-19 10:33:28,748 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.119e+05, grad_sumsq=7.119e+05, orig_rms_sq=1.000e+00 2024-08-19 10:33:33,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4429090.0, ans=0.125 2024-08-19 10:33:38,806 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
25 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-19 10:33:46,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4429190.0, ans=0.125 2024-08-19 10:33:52,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4429190.0, ans=0.05 2024-08-19 10:33:52,690 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.23 vs. limit=22.5 2024-08-19 10:33:53,477 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-19 10:33:57,297 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.329e+01 2.594e+01 2.877e+01 2.022e+03, threshold=5.187e+01, percent-clipped=2.0 2024-08-19 10:34:01,951 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.08 vs. limit=10.0 2024-08-19 10:34:09,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4429390.0, ans=0.95 2024-08-19 10:34:21,571 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 12850, loss[loss=0.06711, beats_loss=0.009234, ecapa_loss=9.278e-05, whisper_loss=0.05695, over 15549.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01052, ecapa_loss=0.0001433, whisper_loss=0.08997, over 3866838.81 frames. 
], batch size: 53, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:34:21,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4429490.0, ans=0.1 2024-08-19 10:34:33,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4429490.0, ans=0.0 2024-08-19 10:34:34,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4429590.0, ans=0.125 2024-08-19 10:34:49,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4429690.0, ans=0.0 2024-08-19 10:34:51,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4429690.0, ans=0.125 2024-08-19 10:35:00,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4429690.0, ans=0.1 2024-08-19 10:35:16,801 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.08 vs. limit=10.0 2024-08-19 10:35:24,642 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.98 vs. limit=8.0 2024-08-19 10:35:26,664 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-19 10:35:29,212 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 12900, loss[loss=0.09058, beats_loss=0.01085, ecapa_loss=0.0001188, whisper_loss=0.07854, over 15034.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01061, ecapa_loss=0.0001423, whisper_loss=0.08945, over 3847813.13 frames. 
], batch size: 59, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:35:33,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4429990.0, ans=0.125 2024-08-19 10:35:51,023 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-19 10:36:10,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4430290.0, ans=0.0 2024-08-19 10:36:11,889 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.655e+01 2.251e+01 2.498e+01 2.810e+01 4.118e+01, threshold=4.997e+01, percent-clipped=0.0 2024-08-19 10:36:23,924 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.35 vs. limit=12.0 2024-08-19 10:36:24,693 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-19 10:36:28,243 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.43 vs. limit=15.0 2024-08-19 10:36:36,120 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-19 10:36:37,367 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 12950, loss[loss=0.1153, beats_loss=0.007582, ecapa_loss=0.0001633, whisper_loss=0.1061, over 19595.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01054, ecapa_loss=0.0001418, whisper_loss=0.0895, over 3852822.61 frames. 
], batch size: 75, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:36:37,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4430490.0, ans=0.125 2024-08-19 10:36:40,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4430490.0, ans=0.09899494936611666 2024-08-19 10:36:45,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4430490.0, ans=0.125 2024-08-19 10:36:54,298 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.22 vs. limit=15.0 2024-08-19 10:36:59,600 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.48 vs. limit=22.5 2024-08-19 10:37:15,666 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-19 10:37:19,954 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 25 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-19 10:37:22,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4430790.0, ans=0.125 2024-08-19 10:37:28,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4430790.0, ans=0.125 2024-08-19 10:37:28,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4430790.0, ans=0.125 2024-08-19 10:37:40,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4430890.0, ans=0.125 2024-08-19 10:37:40,932 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
23 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-19 10:37:45,778 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 13000, loss[loss=0.09603, beats_loss=0.01212, ecapa_loss=0.0001328, whisper_loss=0.08258, over 22294.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01049, ecapa_loss=0.0001424, whisper_loss=0.08989, over 3864282.49 frames. ], batch size: 93, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:37:57,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4430990.0, ans=0.125 2024-08-19 10:37:59,227 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 31 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-19 10:38:01,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4431090.0, ans=0.0 2024-08-19 10:38:07,787 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.44 vs. limit=15.0 2024-08-19 10:38:22,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4431190.0, ans=0.125 2024-08-19 10:38:26,671 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.37 vs. limit=10.0 2024-08-19 10:38:29,948 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.291e+01 2.501e+01 2.762e+01 5.240e+01, threshold=5.001e+01, percent-clipped=1.0 2024-08-19 10:38:32,576 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 14 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-19 10:38:54,512 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 13050, loss[loss=0.1089, beats_loss=0.009842, ecapa_loss=0.0001272, whisper_loss=0.0978, over 24197.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01052, ecapa_loss=0.0001421, whisper_loss=0.08954, over 3842412.06 frames. 
], batch size: 94, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:39:06,205 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 23 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-19 10:39:14,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4431590.0, ans=0.2 2024-08-19 10:39:18,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.85 vs. limit=6.0 2024-08-19 10:39:24,962 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-19 10:39:33,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4431690.0, ans=0.0 2024-08-19 10:39:37,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4431790.0, ans=0.1 2024-08-19 10:39:38,791 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.476e+00 2024-08-19 10:39:50,084 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 36 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 10:39:59,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4431890.0, ans=0.1 2024-08-19 10:39:59,773 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.17 vs. limit=15.0 2024-08-19 10:40:02,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4431890.0, ans=0.125 2024-08-19 10:40:06,982 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 13100, loss[loss=0.09491, beats_loss=0.009256, ecapa_loss=0.0001513, whisper_loss=0.08415, over 22612.00 frames. 
], tot_loss[loss=0.101, beats_loss=0.0105, ecapa_loss=0.0001418, whisper_loss=0.08912, over 3862921.75 frames. ], batch size: 92, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:40:09,484 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 11 from Vox, 37 fro AS 2024-08-19 10:40:15,181 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-19 10:40:20,727 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-19 10:40:22,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4432090.0, ans=0.125 2024-08-19 10:40:28,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4432090.0, ans=0.0 2024-08-19 10:40:40,763 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-19 10:40:51,955 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.322e+01 2.530e+01 2.794e+01 4.175e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-19 10:41:03,420 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-19 10:41:13,648 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-19 10:41:17,799 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 13150, loss[loss=0.08875, beats_loss=0.01258, ecapa_loss=0.0001535, whisper_loss=0.07464, over 21667.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01051, ecapa_loss=0.000141, whisper_loss=0.089, over 3892920.08 frames. 
], batch size: 94, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:41:38,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4432590.0, ans=0.125 2024-08-19 10:41:52,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4432690.0, ans=0.0 2024-08-19 10:42:08,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0 2024-08-19 10:42:24,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4432890.0, ans=0.0 2024-08-19 10:42:27,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4432990.0, ans=0.09899494936611666 2024-08-19 10:42:28,870 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 13200, loss[loss=0.09839, beats_loss=0.01135, ecapa_loss=0.0001272, whisper_loss=0.08577, over 14582.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01049, ecapa_loss=0.0001412, whisper_loss=0.08937, over 3881363.33 frames. ], batch size: 57, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:42:41,405 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 29 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-19 10:42:45,446 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-19 10:42:55,203 WARNING [optim.py:496] (0/4) Scaling gradients by 0.07873938977718353, model_norm_threshold=50.591163635253906 2024-08-19 10:42:55,376 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.905e+04, grad_sumsq=5.905e+04, orig_rms_sq=1.000e+00 2024-08-19 10:43:02,578 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
29 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-19 10:43:10,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4433290.0, ans=0.125 2024-08-19 10:43:11,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4433290.0, ans=0.125 2024-08-19 10:43:13,601 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.266e+01 2.510e+01 2.814e+01 6.425e+02, threshold=5.020e+01, percent-clipped=2.0 2024-08-19 10:43:29,056 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 15 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-19 10:43:30,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=15.0 2024-08-19 10:43:40,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4433490.0, ans=0.0 2024-08-19 10:43:41,527 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 13250, loss[loss=0.09223, beats_loss=0.01221, ecapa_loss=0.0001607, whisper_loss=0.07841, over 20351.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01052, ecapa_loss=0.0001412, whisper_loss=0.08919, over 3862693.03 frames. 
], batch size: 85, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:43:46,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4433490.0, ans=0.125 2024-08-19 10:43:59,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4433590.0, ans=0.2 2024-08-19 10:44:00,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4433590.0, ans=0.09899494936611666 2024-08-19 10:44:12,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4433690.0, ans=0.1 2024-08-19 10:44:13,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4433690.0, ans=0.1 2024-08-19 10:44:28,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4433790.0, ans=0.1 2024-08-19 10:44:31,594 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-19 10:44:37,791 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=15.0 2024-08-19 10:44:54,632 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 13300, loss[loss=0.09259, beats_loss=0.009957, ecapa_loss=0.0001693, whisper_loss=0.08094, over 21617.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01045, ecapa_loss=0.0001412, whisper_loss=0.0897, over 3863055.42 frames. ], batch size: 89, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:44:57,168 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
20 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-19 10:45:36,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4434190.0, ans=0.0 2024-08-19 10:45:39,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4434290.0, ans=0.0 2024-08-19 10:45:42,572 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.304e+01 2.512e+01 2.762e+01 6.116e+01, threshold=5.024e+01, percent-clipped=2.0 2024-08-19 10:45:43,012 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 10:45:49,295 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.87 vs. limit=15.0 2024-08-19 10:45:51,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4434290.0, ans=0.1 2024-08-19 10:46:09,131 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 13350, loss[loss=0.1005, beats_loss=0.009471, ecapa_loss=0.0001321, whisper_loss=0.0897, over 16991.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001402, whisper_loss=0.09004, over 3848147.61 frames. ], batch size: 66, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:46:37,741 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.74 vs. limit=15.0 2024-08-19 10:46:50,670 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.06 vs. 
limit=22.5 2024-08-19 10:46:56,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4434790.0, ans=0.0 2024-08-19 10:47:11,261 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-19 10:47:21,304 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 13400, loss[loss=0.1219, beats_loss=0.01113, ecapa_loss=0.0001202, whisper_loss=0.1096, over 17084.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01045, ecapa_loss=0.0001405, whisper_loss=0.09025, over 3832471.66 frames. ], batch size: 66, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:47:22,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4434990.0, ans=0.125 2024-08-19 10:47:25,622 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.35 vs. limit=15.0 2024-08-19 10:47:30,305 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-19 10:47:33,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4435090.0, ans=0.125 2024-08-19 10:47:45,566 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
26 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-19 10:47:48,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4435190.0, ans=0.0 2024-08-19 10:47:49,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4435190.0, ans=0.125 2024-08-19 10:47:54,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4435190.0, ans=0.0 2024-08-19 10:47:54,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4435190.0, ans=0.125 2024-08-19 10:48:01,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4435290.0, ans=0.0 2024-08-19 10:48:05,888 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.287e+01 2.514e+01 2.765e+01 5.474e+01, threshold=5.028e+01, percent-clipped=1.0 2024-08-19 10:48:07,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4435290.0, ans=0.0 2024-08-19 10:48:10,339 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0 2024-08-19 10:48:17,255 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 29 from Vox, 25 fro AS 2024-08-19 10:48:21,660 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-19 10:48:31,449 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 13450, loss[loss=0.09927, beats_loss=0.01137, ecapa_loss=0.0001252, whisper_loss=0.08665, over 14407.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0104, ecapa_loss=0.0001416, whisper_loss=0.08966, over 3829249.77 frames. 
], batch size: 54, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:48:40,132 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 10:48:40,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4435490.0, ans=0.125 2024-08-19 10:49:12,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4435790.0, ans=0.2 2024-08-19 10:49:20,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4435790.0, ans=0.125 2024-08-19 10:49:31,606 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.90 vs. limit=22.5 2024-08-19 10:49:41,408 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 13500, loss[loss=0.1203, beats_loss=0.009698, ecapa_loss=0.0001409, whisper_loss=0.1092, over 23591.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01027, ecapa_loss=0.0001434, whisper_loss=0.09046, over 3834917.02 frames. ], batch size: 93, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:49:47,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4435990.0, ans=0.125 2024-08-19 10:49:52,284 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
24 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 10:49:56,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4436090.0, ans=0.0 2024-08-19 10:50:19,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4436190.0, ans=0.125 2024-08-19 10:50:24,357 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.393e+01 2.618e+01 2.905e+01 4.660e+01, threshold=5.237e+01, percent-clipped=0.0 2024-08-19 10:50:24,568 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-19 10:50:32,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4436290.0, ans=0.125 2024-08-19 10:50:47,741 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 13550, loss[loss=0.1028, beats_loss=0.009703, ecapa_loss=0.0001347, whisper_loss=0.09172, over 14846.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01037, ecapa_loss=0.0001421, whisper_loss=0.09033, over 3836890.29 frames. ], batch size: 58, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:50:49,407 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 22 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 10:50:50,416 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0421869195997715, model_norm_threshold=52.36887741088867 2024-08-19 10:50:50,587 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.302e+05, grad_sumsq=2.302e+05, orig_rms_sq=1.000e+00 2024-08-19 10:51:07,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4436590.0, ans=0.125 2024-08-19 10:51:12,191 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
21 from LS+wenet, 10 from Vox, 38 fro AS 2024-08-19 10:51:13,718 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 16 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-19 10:51:16,169 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-19 10:51:20,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4436690.0, ans=0.0 2024-08-19 10:51:23,488 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 10:51:31,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4436790.0, ans=0.0 2024-08-19 10:51:32,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4436790.0, ans=0.125 2024-08-19 10:51:43,636 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-19 10:51:49,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4436890.0, ans=0.2 2024-08-19 10:51:54,330 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.64 vs. limit=15.0 2024-08-19 10:51:55,154 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 13600, loss[loss=0.1038, beats_loss=0.01089, ecapa_loss=0.0001363, whisper_loss=0.09153, over 22437.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01048, ecapa_loss=0.0001413, whisper_loss=0.09061, over 3838578.42 frames. 
], batch size: 90, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:52:10,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4437090.0, ans=0.0 2024-08-19 10:52:11,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4437090.0, ans=0.125 2024-08-19 10:52:12,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4437090.0, ans=0.0 2024-08-19 10:52:17,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=4437090.0, ans=0.1 2024-08-19 10:52:19,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4437090.0, ans=0.125 2024-08-19 10:52:22,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4437190.0, ans=0.1 2024-08-19 10:52:23,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4437190.0, ans=0.0 2024-08-19 10:52:27,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4437190.0, ans=0.125 2024-08-19 10:52:38,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4437290.0, ans=0.125 2024-08-19 10:52:39,019 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.324e+01 2.607e+01 2.904e+01 1.241e+03, threshold=5.213e+01, percent-clipped=1.0 2024-08-19 10:52:40,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4437290.0, ans=0.0 2024-08-19 10:52:49,593 INFO [scaling.py:214] (0/4) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4437390.0, ans=0.125 2024-08-19 10:53:03,397 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 13650, loss[loss=0.08352, beats_loss=0.01233, ecapa_loss=0.0001239, whisper_loss=0.06995, over 19798.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001408, whisper_loss=0.0904, over 3865764.42 frames. ], batch size: 81, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:53:10,959 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-19 10:53:12,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4437490.0, ans=0.0 2024-08-19 10:53:12,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4437490.0, ans=0.125 2024-08-19 10:53:31,910 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 19 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-19 10:53:36,370 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 19 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 10:53:53,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4437790.0, ans=0.125 2024-08-19 10:53:58,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4437790.0, ans=0.07 2024-08-19 10:53:59,319 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 10:54:06,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4437890.0, ans=0.0 2024-08-19 10:54:11,787 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
21 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-19 10:54:12,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4437890.0, ans=0.07 2024-08-19 10:54:14,315 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 13700, loss[loss=0.1026, beats_loss=0.01173, ecapa_loss=7.738e-05, whisper_loss=0.09008, over 17219.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0106, ecapa_loss=0.0001412, whisper_loss=0.08978, over 3869200.68 frames. ], batch size: 62, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:54:16,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4437990.0, ans=0.95 2024-08-19 10:54:24,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=4437990.0, ans=15.0 2024-08-19 10:55:04,005 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.299e+01 2.529e+01 2.834e+01 4.971e+01, threshold=5.058e+01, percent-clipped=0.0 2024-08-19 10:55:06,202 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.650e+00 2024-08-19 10:55:36,426 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 13750, loss[loss=0.1127, beats_loss=0.01203, ecapa_loss=0.000112, whisper_loss=0.09955, over 21981.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01053, ecapa_loss=0.0001411, whisper_loss=0.09049, over 3869171.33 frames. ], batch size: 87, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:55:46,810 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-19 10:55:50,439 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-19 10:55:58,946 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
29 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-19 10:56:00,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4438590.0, ans=10.0 2024-08-19 10:56:33,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4438790.0, ans=0.0 2024-08-19 10:56:39,610 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 25 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-19 10:57:00,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4438890.0, ans=0.0 2024-08-19 10:57:04,787 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-19 10:57:13,442 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 13800, loss[loss=0.1191, beats_loss=0.007988, ecapa_loss=0.0001306, whisper_loss=0.1098, over 17248.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01043, ecapa_loss=0.0001415, whisper_loss=0.09083, over 3867109.27 frames. ], batch size: 65, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:57:22,679 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
24 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-19 10:57:24,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4438990.0, ans=0.1 2024-08-19 10:57:36,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4439090.0, ans=0.0 2024-08-19 10:57:40,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4439090.0, ans=0.0 2024-08-19 10:57:45,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4439090.0, ans=0.125 2024-08-19 10:58:06,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4439290.0, ans=0.07 2024-08-19 10:58:06,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4439290.0, ans=0.2 2024-08-19 10:58:10,791 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.280e+01 2.516e+01 2.824e+01 6.653e+01, threshold=5.033e+01, percent-clipped=2.0 2024-08-19 10:58:14,731 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-19 10:58:15,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4439290.0, ans=0.125 2024-08-19 10:58:25,692 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-19 10:58:27,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4439390.0, ans=0.1 2024-08-19 10:58:30,565 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
25 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-19 10:58:40,810 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 13850, loss[loss=0.1052, beats_loss=0.009548, ecapa_loss=7.819e-05, whisper_loss=0.09491, over 16166.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01052, ecapa_loss=0.0001401, whisper_loss=0.09018, over 3881504.86 frames. ], batch size: 55, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:58:46,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4439490.0, ans=0.125 2024-08-19 10:59:08,109 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 27 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-19 10:59:11,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4439590.0, ans=0.125 2024-08-19 10:59:12,548 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-19 10:59:15,569 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 24 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-19 10:59:22,531 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.07 vs. 
limit=15.0 2024-08-19 10:59:25,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4439690.0, ans=0.125 2024-08-19 10:59:30,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4439790.0, ans=0.1 2024-08-19 10:59:38,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4439790.0, ans=0.0 2024-08-19 10:59:38,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4439790.0, ans=0.1 2024-08-19 10:59:42,503 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-19 10:59:50,383 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.88 vs. limit=15.0 2024-08-19 11:00:06,035 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-444000.pt 2024-08-19 11:00:08,753 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 13900, loss[loss=0.09694, beats_loss=0.01083, ecapa_loss=0.0001642, whisper_loss=0.08446, over 16815.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01049, ecapa_loss=0.0001392, whisper_loss=0.09107, over 3897434.11 frames. 
], batch size: 72, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:00:21,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4439990.0, ans=0.1 2024-08-19 11:00:52,242 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.82 vs. limit=15.0 2024-08-19 11:01:02,378 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 11:01:03,248 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.454e+01 2.690e+01 3.083e+01 4.559e+01, threshold=5.380e+01, percent-clipped=0.0 2024-08-19 11:01:25,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4440390.0, ans=0.125 2024-08-19 11:01:26,519 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 29 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 11:01:28,527 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-19 11:01:31,140 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 13950, loss[loss=0.116, beats_loss=0.009314, ecapa_loss=0.0001816, whisper_loss=0.1049, over 22255.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01047, ecapa_loss=0.0001398, whisper_loss=0.09089, over 3870745.45 frames. ], batch size: 90, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:01:44,352 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.67 vs. limit=12.0 2024-08-19 11:02:00,235 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
33 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-19 11:02:35,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4440890.0, ans=0.125 2024-08-19 11:02:35,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4440890.0, ans=0.0 2024-08-19 11:02:44,494 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 11:02:46,343 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 11:02:50,969 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 14000, loss[loss=0.095, beats_loss=0.01288, ecapa_loss=0.0001273, whisper_loss=0.08085, over 20908.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01046, ecapa_loss=0.0001403, whisper_loss=0.09005, over 3886575.72 frames. ], batch size: 86, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:02:52,843 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-19 11:02:53,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=4440990.0, ans=0.05 2024-08-19 11:02:54,551 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-19 11:02:57,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4440990.0, ans=0.125 2024-08-19 11:03:06,375 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-19 11:03:24,178 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
12 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-19 11:03:26,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4441190.0, ans=0.125 2024-08-19 11:03:47,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4441290.0, ans=0.2 2024-08-19 11:03:49,632 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.282e+01 2.482e+01 2.793e+01 5.797e+01, threshold=4.965e+01, percent-clipped=1.0 2024-08-19 11:03:57,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4441290.0, ans=0.0 2024-08-19 11:04:00,027 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-19 11:04:02,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4441390.0, ans=0.125 2024-08-19 11:04:07,048 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.32 vs. limit=15.0 2024-08-19 11:04:24,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4441490.0, ans=0.0 2024-08-19 11:04:25,775 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 14050, loss[loss=0.06678, beats_loss=0.0111, ecapa_loss=0.0001387, whisper_loss=0.05429, over 21179.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0104, ecapa_loss=0.0001404, whisper_loss=0.09052, over 3931812.03 frames. 
], batch size: 90, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:04:26,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4441490.0, ans=0.125 2024-08-19 11:04:37,458 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.62 vs. limit=12.0 2024-08-19 11:04:42,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4441590.0, ans=0.1 2024-08-19 11:04:56,498 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-19 11:04:59,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4441690.0, ans=0.1 2024-08-19 11:05:23,522 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=12.0 2024-08-19 11:05:50,076 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 14100, loss[loss=0.1062, beats_loss=0.01088, ecapa_loss=0.0001443, whisper_loss=0.09384, over 17369.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01046, ecapa_loss=0.0001405, whisper_loss=0.09052, over 3929759.71 frames. ], batch size: 68, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:05:50,221 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-19 11:06:15,786 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.57 vs. 
limit=10.0 2024-08-19 11:06:47,665 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.310e+01 2.577e+01 2.983e+01 3.825e+01, threshold=5.154e+01, percent-clipped=0.0 2024-08-19 11:07:02,388 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 21 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-19 11:07:06,838 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 14 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-19 11:07:25,254 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 14150, loss[loss=0.1052, beats_loss=0.0108, ecapa_loss=0.0001147, whisper_loss=0.09323, over 18045.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.000141, whisper_loss=0.09018, over 3918788.53 frames. ], batch size: 69, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:07:32,434 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-19 11:07:41,817 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 28 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-19 11:08:10,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4442690.0, ans=0.0 2024-08-19 11:08:39,103 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 32 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-19 11:08:52,559 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 14200, loss[loss=0.09886, beats_loss=0.01175, ecapa_loss=0.000139, whisper_loss=0.08572, over 22736.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01055, ecapa_loss=0.0001404, whisper_loss=0.09032, over 3936481.05 frames. 
], batch size: 90, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:08:53,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4442990.0, ans=0.125 2024-08-19 11:08:53,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4442990.0, ans=0.125 2024-08-19 11:09:01,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4442990.0, ans=0.2 2024-08-19 11:09:32,137 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.79 vs. limit=15.0 2024-08-19 11:09:37,886 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 11:09:47,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4443290.0, ans=0.05 2024-08-19 11:09:50,401 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.637e+01 2.398e+01 2.635e+01 3.009e+01 4.372e+01, threshold=5.270e+01, percent-clipped=0.0 2024-08-19 11:09:52,088 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-19 11:10:02,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4443290.0, ans=0.125 2024-08-19 11:10:08,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4443390.0, ans=0.0 2024-08-19 11:10:27,934 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 14250, loss[loss=0.09759, beats_loss=0.01103, ecapa_loss=0.0001327, whisper_loss=0.08524, over 18163.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.0001406, whisper_loss=0.09097, over 3941051.51 frames. 
], batch size: 74, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:10:36,049 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.325e+00 2024-08-19 11:10:37,519 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-19 11:10:46,253 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.89 vs. limit=22.5 2024-08-19 11:10:57,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.48 vs. limit=15.0 2024-08-19 11:10:58,933 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=15.0 2024-08-19 11:11:09,388 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 19 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 11:11:15,491 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 23 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-19 11:11:40,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4443790.0, ans=0.1 2024-08-19 11:11:51,253 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 11:11:58,769 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 14300, loss[loss=0.1056, beats_loss=0.01258, ecapa_loss=0.0001111, whisper_loss=0.09189, over 23432.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01048, ecapa_loss=0.0001398, whisper_loss=0.09116, over 3951173.84 frames. ], batch size: 92, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:11:58,873 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
23 from LS+wenet, 31 from Vox, 29 from AS 2024-08-19 11:12:20,564 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 19 from LS+wenet, 22 from Vox, 49 from AS 2024-08-19 11:12:22,791 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 26 from LS+wenet, 14 from Vox, 29 from AS 2024-08-19 11:12:43,317 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.625e-02 2024-08-19 11:12:50,002 WARNING [optim.py:496] (0/4) Scaling gradients by 0.05855522304773331, model_norm_threshold=52.698848724365234 2024-08-19 11:12:50,284 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.25, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.023e+05, grad_sumsq=2.023e+05, orig_rms_sq=1.000e+00 2024-08-19 11:12:50,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4444190.0, ans=0.0 2024-08-19 11:12:56,611 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.271e+01 2.635e+01 2.896e+01 9.000e+02, threshold=5.270e+01, percent-clipped=1.0 2024-08-19 11:13:17,759 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 13 from Vox, 23 from AS 2024-08-19 11:13:29,809 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 14350, loss[loss=0.08281, beats_loss=0.01355, ecapa_loss=0.00011, whisper_loss=0.06816, over 21989.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01057, ecapa_loss=0.0001393, whisper_loss=0.09022, over 3943136.21 frames. ], batch size: 89, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:13:38,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4444490.0, ans=0.1 2024-08-19 11:13:44,195 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts.
17 from LS+wenet, 15 from Vox, 34 from AS 2024-08-19 11:13:49,798 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.82 vs. limit=15.0 2024-08-19 11:14:21,785 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.94 vs. limit=22.5 2024-08-19 11:14:48,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2024-08-19 11:15:04,225 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 14400, loss[loss=0.1097, beats_loss=0.00947, ecapa_loss=0.0001492, whisper_loss=0.09876, over 20584.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01059, ecapa_loss=0.0001401, whisper_loss=0.09048, over 3940993.50 frames. ], batch size: 84, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:15:06,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=4444990.0, ans=0.05 2024-08-19 11:15:56,854 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 22 from Vox, 45 from AS 2024-08-19 11:15:57,583 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.99 vs. limit=12.0 2024-08-19 11:15:58,283 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 23 from LS+wenet, 16 from Vox, 26 from AS 2024-08-19 11:16:01,340 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.334e+01 2.545e+01 2.906e+01 1.418e+02, threshold=5.090e+01, percent-clipped=1.0 2024-08-19 11:16:10,106 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.40 vs.
limit=15.0 2024-08-19 11:16:11,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4445290.0, ans=0.0 2024-08-19 11:16:22,904 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 31 from LS+wenet, 18 from Vox, 29 from AS 2024-08-19 11:16:29,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4445390.0, ans=0.125 2024-08-19 11:16:35,731 INFO [train_multi_KD3.py:1116] (0/4) Epoch 30, batch 14450, loss[loss=0.07331, beats_loss=0.0128, ecapa_loss=0.0001512, whisper_loss=0.059, over 20811.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01051, ecapa_loss=0.00014, whisper_loss=0.09065, over 3917867.87 frames. ], batch size: 86, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:16:49,087 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 26 from LS+wenet, 19 from Vox, 23 from AS 2024-08-19 11:16:54,640 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.32 vs. limit=22.5 2024-08-19 11:17:34,876 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.38 vs. limit=15.0 2024-08-19 11:17:45,638 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.34 vs. limit=10.0 2024-08-19 11:17:50,764 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-30.pt 2024-08-19 11:18:30,866 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 0, loss[loss=0.09, beats_loss=0.01076, ecapa_loss=0.0001103, whisper_loss=0.07814, over 16581.00 frames.
], tot_loss[loss=0.09, beats_loss=0.01076, ecapa_loss=0.0001103, whisper_loss=0.07814, over 16581.00 frames. ], batch size: 66, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:18:30,867 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-19 11:19:11,987 INFO [train_multi_KD3.py:1149] (0/4) Epoch 31, validation on ASR_libri: loss=0.2529, beats_loss=0, ecapa_loss=0.0005129, whisper_loss=0.2478, over 922467.00 frames. 2024-08-19 11:19:31,614 INFO [train_multi_KD3.py:1149] (0/4) Epoch 31, validation on SV_voxceleb1: loss=0.003975, beats_loss=0, ecapa_loss=0.0003975, whisper_loss=0, over 939242.00 frames. 2024-08-19 11:20:58,087 INFO [train_multi_KD3.py:1149] (0/4) Epoch 31, validation on AT_audioset: loss=0.02297, beats_loss=0.02297, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 11:20:58,091 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-19 11:20:58,368 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 from AS 2024-08-19 11:21:24,985 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 19 from Vox, 46 from AS 2024-08-19 11:23:12,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4446090.0, ans=0.125 2024-08-19 11:23:39,430 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 21 from Vox, 17 from AS 2024-08-19 11:24:18,265 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.403e+01 2.773e+01 3.093e+01 8.282e+01, threshold=5.547e+01, percent-clipped=1.0 2024-08-19 11:24:55,635 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 50, loss[loss=0.1023, beats_loss=0.005997, ecapa_loss=0.0001527, whisper_loss=0.09473, over 17501.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.009232, ecapa_loss=0.0001406, whisper_loss=0.08996, over 864190.88 frames.
], batch size: 67, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:25:06,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4446390.0, ans=0.0 2024-08-19 11:25:45,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4446490.0, ans=0.125 2024-08-19 11:25:59,218 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 from AS 2024-08-19 11:26:13,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4446490.0, ans=0.0 2024-08-19 11:26:44,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4446590.0, ans=0.0 2024-08-19 11:27:01,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4446590.0, ans=0.0 2024-08-19 11:27:58,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4446790.0, ans=0.04949747468305833 2024-08-19 11:28:19,925 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 from AS 2024-08-19 11:28:34,186 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 100, loss[loss=0.1068, beats_loss=0.01004, ecapa_loss=0.0001089, whisper_loss=0.09567, over 16466.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.00928, ecapa_loss=0.0001407, whisper_loss=0.09005, over 1515406.06 frames. ], batch size: 62, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:29:20,242 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 21 from LS+wenet, 25 from Vox, 37 from AS 2024-08-19 11:29:23,935 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.88 vs.
limit=22.5 2024-08-19 11:29:36,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4447090.0, ans=0.2 2024-08-19 11:29:43,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4447090.0, ans=0.125 2024-08-19 11:30:03,306 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 22 from Vox, 36 from AS 2024-08-19 11:30:03,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4447190.0, ans=0.1 2024-08-19 11:30:08,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4447190.0, ans=0.0 2024-08-19 11:30:10,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4447190.0, ans=0.07 2024-08-19 11:30:18,412 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 19 from Vox, 33 from AS 2024-08-19 11:30:21,165 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.553e+01 2.874e+01 3.336e+01 1.667e+02, threshold=5.748e+01, percent-clipped=2.0 2024-08-19 11:30:33,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4447290.0, ans=0.125 2024-08-19 11:30:34,218 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0 2024-08-19 11:30:39,354 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 150, loss[loss=0.1055, beats_loss=0.01088, ecapa_loss=0.0001484, whisper_loss=0.09312, over 23756.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.009479, ecapa_loss=0.0001395, whisper_loss=0.08955, over 2006286.29 frames.
], batch size: 92, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:30:50,181 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 14 from LS+wenet, 18 from Vox, 43 from AS 2024-08-19 11:30:54,031 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 20 from Vox, 39 from AS 2024-08-19 11:31:24,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4447590.0, ans=0.2 2024-08-19 11:31:40,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4447690.0, ans=0.0 2024-08-19 11:31:44,920 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.45 vs. limit=22.5 2024-08-19 11:32:11,165 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0 2024-08-19 11:32:17,631 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 200, loss[loss=0.09331, beats_loss=0.01074, ecapa_loss=0.0001215, whisper_loss=0.08136, over 20486.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.009731, ecapa_loss=0.0001386, whisper_loss=0.08914, over 2413936.93 frames. ], batch size: 78, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:32:20,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4447890.0, ans=0.0 2024-08-19 11:32:24,246 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 17 from LS+wenet, 23 from Vox, 33 from AS 2024-08-19 11:32:37,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4447990.0, ans=0.1 2024-08-19 11:32:40,172 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts.
28 from LS+wenet, 13 from Vox, 31 from AS 2024-08-19 11:32:41,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4447990.0, ans=0.125 2024-08-19 11:33:08,357 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.67 vs. limit=22.5 2024-08-19 11:33:09,501 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 11:33:22,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4448190.0, ans=0.0 2024-08-19 11:33:32,455 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.387e+01 2.633e+01 3.004e+01 1.700e+02, threshold=5.266e+01, percent-clipped=1.0 2024-08-19 11:33:36,363 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 18 from Vox, 33 from AS 2024-08-19 11:33:38,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4448290.0, ans=0.2 2024-08-19 11:33:48,414 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 250, loss[loss=0.1103, beats_loss=0.008699, ecapa_loss=0.0001643, whisper_loss=0.09999, over 15278.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.009833, ecapa_loss=0.0001406, whisper_loss=0.0901, over 2691347.74 frames. ], batch size: 61, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:33:58,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4448390.0, ans=0.025 2024-08-19 11:33:58,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4448390.0, ans=0.0 2024-08-19 11:34:08,018 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts.
25 from LS+wenet, 21 from Vox, 37 from AS 2024-08-19 11:34:20,887 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.26 vs. limit=22.5 2024-08-19 11:34:25,549 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 29 from Vox, 38 from AS 2024-08-19 11:34:36,271 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-08-19 11:34:51,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4448690.0, ans=0.0 2024-08-19 11:34:55,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4448690.0, ans=0.125 2024-08-19 11:34:56,586 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 20 from LS+wenet, 24 from Vox, 45 from AS 2024-08-19 11:35:17,475 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 300, loss[loss=0.08144, beats_loss=0.01141, ecapa_loss=0.0001339, whisper_loss=0.06869, over 18152.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01009, ecapa_loss=0.0001406, whisper_loss=0.08894, over 2933990.43 frames. ], batch size: 71, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:35:20,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4448890.0, ans=0.0 2024-08-19 11:35:24,577 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 17 from Vox, 26 from AS 2024-08-19 11:35:24,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4448890.0, ans=0.1 2024-08-19 11:35:32,612 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts.
24 from LS+wenet, 24 from Vox, 38 from AS 2024-08-19 11:36:23,023 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 14 from LS+wenet, 13 from Vox, 37 from AS 2024-08-19 11:36:23,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4449190.0, ans=0.07 2024-08-19 11:36:27,074 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 23 from LS+wenet, 28 from Vox, 34 from AS 2024-08-19 11:36:29,520 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 31 from LS+wenet, 17 from Vox, 37 from AS 2024-08-19 11:36:32,053 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.162e+01 2.418e+01 2.637e+01 1.048e+02, threshold=4.837e+01, percent-clipped=1.0 2024-08-19 11:36:35,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4449290.0, ans=0.0 2024-08-19 11:36:37,855 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 26 from Vox, 40 from AS 2024-08-19 11:36:40,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4449290.0, ans=0.2 2024-08-19 11:36:42,813 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 14 from Vox, 29 from AS 2024-08-19 11:36:43,452 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0 2024-08-19 11:36:45,211 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 350, loss[loss=0.1004, beats_loss=0.01176, ecapa_loss=0.0001211, whisper_loss=0.0874, over 15627.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01015, ecapa_loss=0.0001413, whisper_loss=0.08866, over 3123022.49 frames. ], batch size: 62, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:36:51,598 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts.
12 from LS+wenet, 22 from Vox, 29 from AS 2024-08-19 11:36:56,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4449390.0, ans=0.0 2024-08-19 11:37:04,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4449490.0, ans=0.2 2024-08-19 11:37:08,432 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 23 from LS+wenet, 21 from Vox, 21 from AS 2024-08-19 11:37:10,069 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 20 from Vox, 33 from AS 2024-08-19 11:37:20,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4449590.0, ans=0.125 2024-08-19 11:37:26,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4449590.0, ans=0.125 2024-08-19 11:37:29,440 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 19 from Vox, 46 from AS 2024-08-19 11:38:15,415 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 400, loss[loss=0.09899, beats_loss=0.01201, ecapa_loss=0.0001417, whisper_loss=0.08557, over 22270.00 frames. ], tot_loss[loss=0.09998, beats_loss=0.01022, ecapa_loss=0.0001412, whisper_loss=0.08834, over 3254762.59 frames.
], batch size: 88, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:38:26,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4449890.0, ans=0.125 2024-08-19 11:38:49,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4450090.0, ans=0.0 2024-08-19 11:38:57,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4450090.0, ans=0.0 2024-08-19 11:39:12,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4450190.0, ans=0.125 2024-08-19 11:39:16,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4450190.0, ans=0.1 2024-08-19 11:39:18,237 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 26 from Vox, 37 from AS 2024-08-19 11:39:24,429 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 9 from LS+wenet, 19 from Vox, 30 from AS 2024-08-19 11:39:32,667 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.597e+01 2.340e+01 2.668e+01 2.900e+01 1.466e+02, threshold=5.335e+01, percent-clipped=2.0 2024-08-19 11:39:40,787 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 16 from LS+wenet, 25 from Vox, 33 from AS 2024-08-19 11:39:42,461 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 23 from Vox, 20 from AS 2024-08-19 11:39:44,189 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 15 from Vox, 36 from AS 2024-08-19 11:39:46,343 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 450, loss[loss=0.1104, beats_loss=0.0103, ecapa_loss=0.0001169, whisper_loss=0.09895, over 20447.00 frames. ], tot_loss[loss=0.09945, beats_loss=0.01031, ecapa_loss=0.00014, whisper_loss=0.08774, over 3376330.65 frames.
], batch size: 78, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:39:48,105 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 from AS 2024-08-19 11:39:54,438 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-08-19 11:40:18,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4450490.0, ans=0.05 2024-08-19 11:40:29,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4450590.0, ans=0.0 2024-08-19 11:40:35,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4450590.0, ans=0.125 2024-08-19 11:40:42,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4450690.0, ans=0.2 2024-08-19 11:41:00,207 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.19 vs. limit=22.5 2024-08-19 11:41:14,699 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 500, loss[loss=0.1109, beats_loss=0.009247, ecapa_loss=0.0001421, whisper_loss=0.1002, over 22347.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01024, ecapa_loss=0.0001394, whisper_loss=0.08838, over 3478426.01 frames. ], batch size: 88, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:41:33,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4450990.0, ans=0.1 2024-08-19 11:41:36,725 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts.
35 from LS+wenet, 22 from Vox, 37 from AS 2024-08-19 11:41:52,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4451090.0, ans=0.0 2024-08-19 11:41:58,495 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 from AS 2024-08-19 11:42:31,925 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.295e+01 2.524e+01 2.813e+01 3.428e+01, threshold=5.047e+01, percent-clipped=0.0 2024-08-19 11:42:35,724 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 11:42:45,232 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 550, loss[loss=0.1062, beats_loss=0.009362, ecapa_loss=0.0001565, whisper_loss=0.0953, over 22492.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01034, ecapa_loss=0.0001382, whisper_loss=0.08896, over 3591568.65 frames. ], batch size: 91, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:42:52,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4451390.0, ans=0.125 2024-08-19 11:42:56,805 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 22 from LS+wenet, 17 from Vox, 17 from AS 2024-08-19 11:43:00,408 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts.
27 from LS+wenet, 22 from Vox, 34 from AS 2024-08-19 11:43:00,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4451390.0, ans=0.0 2024-08-19 11:43:03,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=4451490.0, ans=0.1 2024-08-19 11:43:16,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4451490.0, ans=0.125 2024-08-19 11:43:24,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4451590.0, ans=0.0 2024-08-19 11:43:30,765 WARNING [optim.py:496] (0/4) Scaling gradients by 0.03087170422077179, model_norm_threshold=50.47096252441406 2024-08-19 11:43:30,965 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.400e+05, grad_sumsq=1.036e+05, orig_rms_sq=3.283e+00 2024-08-19 11:43:35,745 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 13 from LS+wenet, 20 from Vox, 24 from AS 2024-08-19 11:43:48,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4451690.0, ans=0.125 2024-08-19 11:43:50,659 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 18 from Vox, 42 from AS 2024-08-19 11:43:59,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4451790.0, ans=0.0 2024-08-19 11:44:00,799 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 from AS 2024-08-19 11:44:15,150 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 600, loss[loss=0.1104, beats_loss=0.01066, ecapa_loss=0.0001498, whisper_loss=0.09828, over 19093.00 frames.
], tot_loss[loss=0.1016, beats_loss=0.01027, ecapa_loss=0.0001394, whisper_loss=0.08989, over 3631214.18 frames. ], batch size: 77, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:44:19,334 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 17 from Vox, 29 from AS 2024-08-19 11:44:41,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4451990.0, ans=0.125 2024-08-19 11:44:49,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4452090.0, ans=0.05 2024-08-19 11:45:03,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4452190.0, ans=0.2 2024-08-19 11:45:06,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4452190.0, ans=0.125 2024-08-19 11:45:11,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4452190.0, ans=0.2 2024-08-19 11:45:15,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4452190.0, ans=0.125 2024-08-19 11:45:17,858 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.08 vs. limit=6.0 2024-08-19 11:45:26,794 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.353e+01 2.580e+01 2.899e+01 1.635e+03, threshold=5.160e+01, percent-clipped=1.0 2024-08-19 11:45:30,914 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 22 from Vox, 35 from AS 2024-08-19 11:45:37,609 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts.
25 from LS+wenet, 20 from Vox, 36 from AS 2024-08-19 11:45:40,115 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 650, loss[loss=0.117, beats_loss=0.007632, ecapa_loss=0.0002063, whisper_loss=0.1073, over 19291.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01021, ecapa_loss=0.0001401, whisper_loss=0.08944, over 3655786.67 frames. ], batch size: 80, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:45:40,712 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 22 from Vox, 37 from AS 2024-08-19 11:45:41,979 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 13 from LS+wenet, 13 from Vox, 29 from AS 2024-08-19 11:45:43,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4452390.0, ans=0.0 2024-08-19 11:46:06,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4452490.0, ans=0.0 2024-08-19 11:46:12,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4452490.0, ans=0.2 2024-08-19 11:46:28,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4452590.0, ans=0.125 2024-08-19 11:47:06,643 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 36 from LS+wenet, 22 from Vox, 34 from AS 2024-08-19 11:47:08,488 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 700, loss[loss=0.118, beats_loss=0.009463, ecapa_loss=0.0001427, whisper_loss=0.1071, over 23766.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01026, ecapa_loss=0.00014, whisper_loss=0.08972, over 3752787.98 frames.
], batch size: 92, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:47:08,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4452890.0, ans=0.125 2024-08-19 11:47:13,720 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.09 vs. limit=15.0 2024-08-19 11:47:48,867 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.56 vs. limit=12.0 2024-08-19 11:48:06,710 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 15 from Vox, 41 from AS 2024-08-19 11:48:06,992 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=15.0 2024-08-19 11:48:15,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4453190.0, ans=0.125 2024-08-19 11:48:19,951 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 38 from LS+wenet, 15 from Vox, 40 from AS 2024-08-19 11:48:28,046 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.237e+01 2.442e+01 2.764e+01 3.734e+01, threshold=4.884e+01, percent-clipped=0.0 2024-08-19 11:48:42,449 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 750, loss[loss=0.09827, beats_loss=0.01066, ecapa_loss=0.0001177, whisper_loss=0.08643, over 16323.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01032, ecapa_loss=0.00014, whisper_loss=0.08952, over 3730796.91 frames. ], batch size: 63, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:48:52,541 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts.
35 from LS+wenet, 21 from Vox, 31 from AS 2024-08-19 11:49:03,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4453490.0, ans=0.0 2024-08-19 11:49:11,675 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.63 vs. limit=15.0 2024-08-19 11:49:23,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4453590.0, ans=0.125 2024-08-19 11:49:25,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4453590.0, ans=0.035 2024-08-19 11:49:29,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4453590.0, ans=0.0 2024-08-19 11:49:32,855 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 27 from Vox, 32 from AS 2024-08-19 11:49:42,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4453690.0, ans=0.1 2024-08-19 11:50:09,297 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 16 from Vox, 43 from AS 2024-08-19 11:50:10,548 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 800, loss[loss=0.1135, beats_loss=0.01169, ecapa_loss=0.0001222, whisper_loss=0.1006, over 22933.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01026, ecapa_loss=0.0001409, whisper_loss=0.09019, over 3756133.67 frames. ], batch size: 89, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:50:24,968 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 14 from LS+wenet, 17 from Vox, 36 from AS 2024-08-19 11:50:31,341 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
21 from LS+wenet, 24 from Vox, 31 from AS 2024-08-19 11:50:51,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4454090.0, ans=0.125 2024-08-19 11:50:58,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4454090.0, ans=0.1 2024-08-19 11:51:03,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4454090.0, ans=0.1 2024-08-19 11:51:06,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4454190.0, ans=0.125 2024-08-19 11:51:11,132 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.12 vs. limit=15.0 2024-08-19 11:51:23,578 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 18 from Vox, 44 from AS 2024-08-19 11:51:27,714 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.10 vs. limit=15.0 2024-08-19 11:51:28,476 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.213e+01 2.446e+01 2.635e+01 4.034e+01, threshold=4.891e+01, percent-clipped=0.0 2024-08-19 11:51:28,668 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 12 from LS+wenet, 17 from Vox, 25 from AS 2024-08-19 11:51:45,075 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 850, loss[loss=0.1016, beats_loss=0.009086, ecapa_loss=0.0001706, whisper_loss=0.09076, over 22639.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01022, ecapa_loss=0.0001399, whisper_loss=0.09026, over 3775961.32 frames. 
], batch size: 94, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:51:49,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4454390.0, ans=0.0 2024-08-19 11:51:56,088 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 28 from LS+wenet, 14 from Vox, 35 from AS 2024-08-19 11:52:16,991 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 20 from Vox, 37 from AS 2024-08-19 11:52:18,497 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 11:52:20,238 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 29 from Vox, 38 from AS 2024-08-19 11:52:49,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4454690.0, ans=0.125 2024-08-19 11:52:53,646 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.745e+01 2024-08-19 11:53:15,961 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 900, loss[loss=0.07502, beats_loss=0.0103, ecapa_loss=0.0001363, whisper_loss=0.06335, over 14892.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01031, ecapa_loss=0.000139, whisper_loss=0.08951, over 3797084.75 frames. ], batch size: 58, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:53:29,463 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 20 from Vox, 39 from AS 2024-08-19 11:53:31,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4454890.0, ans=0.0 2024-08-19 11:53:43,359 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 35 from LS+wenet, 12 from Vox, 39 from AS 2024-08-19 11:53:52,957 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
16 from LS+wenet, 26 from Vox, 26 from AS 2024-08-19 11:54:11,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4455190.0, ans=0.0 2024-08-19 11:54:13,060 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 18 from Vox, 36 from AS 2024-08-19 11:54:21,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4455190.0, ans=0.125 2024-08-19 11:54:28,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4455290.0, ans=0.125 2024-08-19 11:54:33,154 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.411e+01 2.627e+01 3.077e+01 2.229e+02, threshold=5.255e+01, percent-clipped=1.0 2024-08-19 11:54:33,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4455290.0, ans=0.125 2024-08-19 11:54:49,810 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 950, loss[loss=0.1027, beats_loss=0.01379, ecapa_loss=0.0001013, whisper_loss=0.08787, over 20908.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01035, ecapa_loss=0.0001382, whisper_loss=0.08878, over 3777141.24 frames. ], batch size: 80, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:54:57,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4455390.0, ans=0.2 2024-08-19 11:55:02,207 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.75 vs. limit=15.0 2024-08-19 11:55:10,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.51 vs. 
limit=12.0 2024-08-19 11:55:11,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4455490.0, ans=0.05 2024-08-19 11:55:27,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4455590.0, ans=0.0 2024-08-19 11:56:06,793 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 15 from Vox, 27 from AS 2024-08-19 11:56:08,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4455790.0, ans=0.07 2024-08-19 11:56:16,795 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 23 from LS+wenet, 12 from Vox, 26 from AS 2024-08-19 11:56:22,912 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 1000, loss[loss=0.09235, beats_loss=0.01215, ecapa_loss=0.0001452, whisper_loss=0.07875, over 14357.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01038, ecapa_loss=0.0001379, whisper_loss=0.089, over 3768364.01 frames. ], batch size: 60, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:56:26,313 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 from AS 2024-08-19 11:56:28,128 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 21 from Vox, 38 from AS 2024-08-19 11:56:36,781 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
27 from LS+wenet, 24 from Vox, 41 from AS 2024-08-19 11:56:43,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4455990.0, ans=0.0 2024-08-19 11:56:53,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4455990.0, ans=0.125 2024-08-19 11:56:57,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4455990.0, ans=0.125 2024-08-19 11:56:57,606 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.61 vs. limit=15.0 2024-08-19 11:57:05,426 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 from AS 2024-08-19 11:57:05,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4456090.0, ans=0.125 2024-08-19 11:57:07,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4456090.0, ans=0.09899494936611666 2024-08-19 11:57:36,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4456290.0, ans=0.0 2024-08-19 11:57:40,062 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.280e+01 2.488e+01 2.674e+01 4.382e+01, threshold=4.976e+01, percent-clipped=0.0 2024-08-19 11:57:49,318 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
28 from LS+wenet, 21 from Vox, 34 from AS 2024-08-19 11:57:51,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=4456290.0, ans=0.025 2024-08-19 11:57:57,846 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 1050, loss[loss=0.114, beats_loss=0.01063, ecapa_loss=0.0001147, whisper_loss=0.1023, over 19693.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01039, ecapa_loss=0.0001371, whisper_loss=0.08869, over 3813617.26 frames. ], batch size: 75, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:58:04,714 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.66 vs. limit=22.5 2024-08-19 11:58:15,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4456490.0, ans=0.125 2024-08-19 11:58:26,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4456490.0, ans=0.2 2024-08-19 11:58:37,119 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 15 from LS+wenet, 15 from Vox, 23 from AS 2024-08-19 11:58:50,355 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 from AS 2024-08-19 11:58:58,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4456690.0, ans=0.0 2024-08-19 11:59:19,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4456790.0, ans=0.1 2024-08-19 11:59:29,755 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 1100, loss[loss=0.1053, beats_loss=0.01206, ecapa_loss=0.0001297, whisper_loss=0.09195, over 22313.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01035, ecapa_loss=0.000137, whisper_loss=0.08868, over 3784270.44 frames. 
], batch size: 91, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:59:54,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4456990.0, ans=0.125 2024-08-19 12:00:13,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4457090.0, ans=0.0 2024-08-19 12:00:38,229 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.95 vs. limit=22.5 2024-08-19 12:00:44,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=4457190.0, ans=0.05 2024-08-19 12:00:50,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4457190.0, ans=0.125 2024-08-19 12:00:59,220 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.392e+01 2.653e+01 2.944e+01 8.213e+01, threshold=5.307e+01, percent-clipped=1.0 2024-08-19 12:01:01,245 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 18 from LS+wenet, 23 from Vox, 37 from AS 2024-08-19 12:01:01,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4457290.0, ans=0.125 2024-08-19 12:01:10,442 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 1150, loss[loss=0.08773, beats_loss=0.007777, ecapa_loss=0.0001373, whisper_loss=0.07858, over 14756.00 frames. ], tot_loss[loss=0.09973, beats_loss=0.01043, ecapa_loss=0.0001369, whisper_loss=0.08793, over 3800121.54 frames. ], batch size: 56, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:01:16,298 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
19 from LS+wenet, 10 from Vox, 31 from AS 2024-08-19 12:01:21,568 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.35 vs. limit=15.0 2024-08-19 12:01:23,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4457390.0, ans=0.125 2024-08-19 12:01:31,376 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 27 from LS+wenet, 21 from Vox, 23 from AS 2024-08-19 12:01:34,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4457490.0, ans=0.125 2024-08-19 12:02:15,546 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 12:02:25,440 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 from AS 2024-08-19 12:02:43,284 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 1200, loss[loss=0.1051, beats_loss=0.009281, ecapa_loss=0.0001664, whisper_loss=0.09417, over 16753.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01042, ecapa_loss=0.0001372, whisper_loss=0.08821, over 3781932.34 frames. ], batch size: 65, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:02:47,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4457890.0, ans=0.2 2024-08-19 12:03:37,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4458190.0, ans=0.125 2024-08-19 12:03:51,248 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
23 from LS+wenet, 23 from Vox, 24 from AS 2024-08-19 12:04:04,454 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.704e+01 2.388e+01 2.700e+01 3.012e+01 6.792e+01, threshold=5.400e+01, percent-clipped=1.0 2024-08-19 12:04:08,033 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 17 from Vox, 38 from AS 2024-08-19 12:04:12,301 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.75 vs. limit=15.0 2024-08-19 12:04:18,807 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 1250, loss[loss=0.1041, beats_loss=0.01085, ecapa_loss=0.0001396, whisper_loss=0.09182, over 23033.00 frames. ], tot_loss[loss=0.09982, beats_loss=0.01046, ecapa_loss=0.0001376, whisper_loss=0.08798, over 3784525.28 frames. ], batch size: 87, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:04:24,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4458390.0, ans=0.0 2024-08-19 12:04:28,383 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 15 from Vox, 41 from AS 2024-08-19 12:04:32,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4458390.0, ans=0.125 2024-08-19 12:04:34,273 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 14 from Vox, 29 from AS 2024-08-19 12:04:44,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4458490.0, ans=0.1 2024-08-19 12:05:16,190 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
31 from LS+wenet, 19 from Vox, 44 from AS 2024-08-19 12:05:49,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=4458790.0, ans=0.02 2024-08-19 12:05:56,313 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 1300, loss[loss=0.1058, beats_loss=0.01004, ecapa_loss=0.0001565, whisper_loss=0.09417, over 21716.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01052, ecapa_loss=0.0001373, whisper_loss=0.08818, over 3813846.08 frames. ], batch size: 88, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:05:59,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4458890.0, ans=0.125 2024-08-19 12:06:01,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4458890.0, ans=0.1 2024-08-19 12:06:09,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4458890.0, ans=0.0 2024-08-19 12:06:14,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4458990.0, ans=0.2 2024-08-19 12:06:25,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4458990.0, ans=0.125 2024-08-19 12:06:57,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4459190.0, ans=0.125 2024-08-19 12:06:59,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4459190.0, ans=0.1 2024-08-19 12:07:08,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4459190.0, ans=0.2 2024-08-19 12:07:11,562 INFO [scaling.py:1024] (0/4) Whitening: 
name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.06 vs. limit=10.0 2024-08-19 12:07:17,548 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.691e+01 2.249e+01 2.463e+01 2.703e+01 4.319e+01, threshold=4.927e+01, percent-clipped=0.0 2024-08-19 12:07:28,870 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 16 from LS+wenet, 14 from Vox, 37 from AS 2024-08-19 12:07:32,295 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 1350, loss[loss=0.08434, beats_loss=0.0132, ecapa_loss=0.0001032, whisper_loss=0.0701, over 18909.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01048, ecapa_loss=0.0001376, whisper_loss=0.08865, over 3840765.33 frames. ], batch size: 72, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:08:04,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4459490.0, ans=0.0 2024-08-19 12:08:06,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4459490.0, ans=0.125 2024-08-19 12:08:08,281 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=15.0 2024-08-19 12:08:13,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4459590.0, ans=0.125 2024-08-19 12:08:32,800 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 26 from Vox, 28 from AS 2024-08-19 12:08:53,123 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 12:08:59,897 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.04 vs. 
limit=15.0 2024-08-19 12:09:01,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4459790.0, ans=0.125 2024-08-19 12:09:04,126 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 1400, loss[loss=0.1119, beats_loss=0.009044, ecapa_loss=0.0001811, whisper_loss=0.101, over 22530.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01036, ecapa_loss=0.0001386, whisper_loss=0.08941, over 3846649.52 frames. ], batch size: 94, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:09:19,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4459990.0, ans=0.0 2024-08-19 12:09:32,067 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.82 vs. limit=22.5 2024-08-19 12:09:51,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4460190.0, ans=0.1 2024-08-19 12:10:03,042 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.66 vs. limit=15.0 2024-08-19 12:10:11,811 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.232e+01 2.433e+01 2.753e+01 5.485e+01, threshold=4.866e+01, percent-clipped=1.0 2024-08-19 12:10:19,986 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 28 from LS+wenet, 15 from Vox, 42 from AS 2024-08-19 12:10:23,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4460290.0, ans=0.0 2024-08-19 12:10:26,669 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 1450, loss[loss=0.1093, beats_loss=0.01019, ecapa_loss=0.0001149, whisper_loss=0.09799, over 24069.00 frames. ], tot_loss[loss=0.09985, beats_loss=0.01041, ecapa_loss=0.0001379, whisper_loss=0.08806, over 3837089.32 frames. 
], batch size: 92, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:11:30,332 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 16 from Vox, 30 from AS 2024-08-19 12:11:34,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4460590.0, ans=0.1 2024-08-19 12:11:36,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4460590.0, ans=0.125 2024-08-19 12:11:54,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4460690.0, ans=0.125 2024-08-19 12:12:02,452 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 21 from LS+wenet, 22 from Vox, 31 from AS 2024-08-19 12:12:13,991 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 18 from Vox, 22 from AS 2024-08-19 12:12:17,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4460790.0, ans=0.1 2024-08-19 12:12:26,462 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 1500, loss[loss=0.1266, beats_loss=0.009941, ecapa_loss=0.0001584, whisper_loss=0.1151, over 15677.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01042, ecapa_loss=0.0001374, whisper_loss=0.08833, over 3813209.24 frames. ], batch size: 62, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:12:42,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4460890.0, ans=0.0 2024-08-19 12:12:56,087 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 19 from Vox, 41 from AS 2024-08-19 12:13:01,242 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
22 from LS+wenet, 13 from Vox, 19 from AS 2024-08-19 12:13:04,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4461090.0, ans=0.125 2024-08-19 12:13:14,248 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=15.0 2024-08-19 12:13:15,812 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 16 from Vox, 30 from AS 2024-08-19 12:13:17,450 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 20 from Vox, 20 from AS 2024-08-19 12:13:17,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4461190.0, ans=0.0 2024-08-19 12:13:19,153 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 from AS 2024-08-19 12:13:41,563 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.290e+01 2.508e+01 2.829e+01 3.950e+01, threshold=5.015e+01, percent-clipped=0.0 2024-08-19 12:13:56,717 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 1550, loss[loss=0.08502, beats_loss=0.0134, ecapa_loss=0.0001128, whisper_loss=0.07049, over 19815.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01042, ecapa_loss=0.0001375, whisper_loss=0.08863, over 3827176.83 frames. ], batch size: 82, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:14:04,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4461390.0, ans=0.0 2024-08-19 12:14:12,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4461390.0, ans=0.125 2024-08-19 12:14:14,171 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 22 from LS+wenet, 19 from Vox, 19 from AS 2024-08-19 12:14:21,185 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
27 from LS+wenet, 22 from Vox, 43 from AS 2024-08-19 12:15:29,691 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 1600, loss[loss=0.1224, beats_loss=0.008031, ecapa_loss=0.0001501, whisper_loss=0.1129, over 20828.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01043, ecapa_loss=0.0001371, whisper_loss=0.0893, over 3854735.73 frames. ], batch size: 80, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:15:30,102 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 22 from Vox, 29 from AS 2024-08-19 12:15:36,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4461890.0, ans=0.125 2024-08-19 12:15:37,891 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 36 from LS+wenet, 26 from Vox, 27 from AS 2024-08-19 12:15:38,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4461890.0, ans=0.125 2024-08-19 12:15:39,851 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
24 from LS+wenet, 21 from Vox, 47 from AS 2024-08-19 12:15:47,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4461990.0, ans=0.0 2024-08-19 12:15:50,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4461990.0, ans=0.0 2024-08-19 12:16:09,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4462090.0, ans=0.1 2024-08-19 12:16:15,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4462090.0, ans=0.125 2024-08-19 12:16:25,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4462190.0, ans=0.1 2024-08-19 12:16:34,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4462190.0, ans=0.125 2024-08-19 12:16:43,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4462290.0, ans=0.125 2024-08-19 12:16:45,902 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.361e+01 2.575e+01 2.927e+01 3.831e+01, threshold=5.149e+01, percent-clipped=0.0 2024-08-19 12:16:57,822 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 1650, loss[loss=0.08874, beats_loss=0.01188, ecapa_loss=0.0001393, whisper_loss=0.07547, over 22366.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01037, ecapa_loss=0.0001373, whisper_loss=0.08977, over 3851913.46 frames. ], batch size: 92, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:16:58,766 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.51 vs. 
limit=15.0 2024-08-19 12:17:32,907 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 18 from Vox, 34 from AS 2024-08-19 12:17:55,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4462690.0, ans=0.1 2024-08-19 12:18:15,220 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.73 vs. limit=22.5 2024-08-19 12:18:18,785 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.64 vs. limit=12.0 2024-08-19 12:18:23,137 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.88 vs. limit=6.0 2024-08-19 12:18:29,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4462790.0, ans=0.125 2024-08-19 12:18:35,150 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 1700, loss[loss=0.0926, beats_loss=0.01118, ecapa_loss=0.0001298, whisper_loss=0.08012, over 19772.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01037, ecapa_loss=0.0001369, whisper_loss=0.08949, over 3849957.85 frames. ], batch size: 78, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:18:41,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4462890.0, ans=0.2 2024-08-19 12:18:45,896 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
19 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-19 12:18:46,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4462890.0, ans=0.0 2024-08-19 12:19:06,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4462990.0, ans=0.0 2024-08-19 12:19:29,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4463190.0, ans=0.0 2024-08-19 12:19:36,434 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=15.0 2024-08-19 12:19:39,512 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 12:19:44,233 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.732e+01 2.356e+01 2.588e+01 2.791e+01 7.851e+01, threshold=5.177e+01, percent-clipped=4.0 2024-08-19 12:19:45,523 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 13 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-19 12:20:00,998 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 1750, loss[loss=0.09095, beats_loss=0.01037, ecapa_loss=0.0001019, whisper_loss=0.07957, over 15138.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01036, ecapa_loss=0.0001386, whisper_loss=0.08925, over 3824562.96 frames. ], batch size: 56, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:20:11,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4463390.0, ans=0.125 2024-08-19 12:20:17,970 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-19 12:20:26,576 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
23 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 12:21:03,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4463690.0, ans=0.125 2024-08-19 12:21:20,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4463690.0, ans=0.1 2024-08-19 12:21:25,946 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-19 12:21:43,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4463890.0, ans=0.125 2024-08-19 12:21:45,349 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 1800, loss[loss=0.1113, beats_loss=0.011, ecapa_loss=0.000112, whisper_loss=0.09919, over 21656.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01036, ecapa_loss=0.0001391, whisper_loss=0.08919, over 3844812.18 frames. ], batch size: 80, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:22:04,467 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 27 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-19 12:22:13,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4463990.0, ans=0.05 2024-08-19 12:22:32,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4464090.0, ans=0.1 2024-08-19 12:22:59,615 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
33 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-19 12:23:04,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4464190.0, ans=0.125 2024-08-19 12:23:21,351 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.692e+01 2.194e+01 2.455e+01 2.727e+01 5.354e+01, threshold=4.909e+01, percent-clipped=1.0 2024-08-19 12:23:31,073 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-19 12:23:42,040 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 1850, loss[loss=0.07771, beats_loss=0.01173, ecapa_loss=9.079e-05, whisper_loss=0.06508, over 16020.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01035, ecapa_loss=0.0001379, whisper_loss=0.08918, over 3835324.10 frames. ], batch size: 60, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:24:02,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4464390.0, ans=0.125 2024-08-19 12:24:05,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4464390.0, ans=0.1 2024-08-19 12:24:08,385 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 17 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-19 12:24:32,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4464490.0, ans=0.125 2024-08-19 12:24:46,226 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 12 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-19 12:24:54,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4464590.0, ans=0.125 2024-08-19 12:25:10,805 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 12:25:11,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4464690.0, ans=0.1 2024-08-19 12:25:25,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4464790.0, ans=0.125 2024-08-19 12:25:44,924 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 1900, loss[loss=0.1114, beats_loss=0.01084, ecapa_loss=0.0001261, whisper_loss=0.09928, over 17280.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01043, ecapa_loss=0.0001389, whisper_loss=0.08877, over 3832588.33 frames. ], batch size: 68, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:25:54,892 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-19 12:26:09,396 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-19 12:26:15,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4464990.0, ans=0.2 2024-08-19 12:26:19,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=4464990.0, ans=15.0 2024-08-19 12:26:25,281 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.72 vs. limit=15.0 2024-08-19 12:26:33,918 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 28 from Vox, 20 fro AS 2024-08-19 12:26:54,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4465090.0, ans=0.125 2024-08-19 12:26:58,208 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
16 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-19 12:27:10,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4465190.0, ans=0.1 2024-08-19 12:27:18,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4465190.0, ans=0.125 2024-08-19 12:27:21,849 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 12:27:28,989 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.98 vs. limit=22.5 2024-08-19 12:27:33,014 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.271e+01 2.463e+01 2.812e+01 5.945e+01, threshold=4.926e+01, percent-clipped=2.0 2024-08-19 12:27:47,459 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-19 12:27:55,046 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 1950, loss[loss=0.08634, beats_loss=0.01058, ecapa_loss=0.0001406, whisper_loss=0.07436, over 13294.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01045, ecapa_loss=0.0001383, whisper_loss=0.08861, over 3822673.96 frames. ], batch size: 54, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:28:30,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4465490.0, ans=0.1 2024-08-19 12:28:35,102 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-19 12:28:35,679 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.27 vs. limit=22.5 2024-08-19 12:28:51,550 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
19 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-19 12:29:00,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4465590.0, ans=0.125 2024-08-19 12:29:09,825 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.24 vs. limit=15.0 2024-08-19 12:29:13,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4465690.0, ans=0.0 2024-08-19 12:29:15,966 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=15.0 2024-08-19 12:29:36,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4465790.0, ans=0.2 2024-08-19 12:29:41,716 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 16 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-19 12:29:43,404 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 2000, loss[loss=0.0718, beats_loss=0.01129, ecapa_loss=0.0001545, whisper_loss=0.05896, over 17769.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01044, ecapa_loss=0.0001384, whisper_loss=0.08847, over 3791966.63 frames. ], batch size: 74, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:30:05,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=4465990.0, ans=0.05 2024-08-19 12:30:29,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4466090.0, ans=0.1 2024-08-19 12:30:54,175 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 33 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-19 12:30:58,697 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
34 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 12:31:01,173 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.320e+01 2.587e+01 3.185e+01 3.788e+02, threshold=5.175e+01, percent-clipped=4.0 2024-08-19 12:31:14,346 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 2050, loss[loss=0.1072, beats_loss=0.01075, ecapa_loss=0.000126, whisper_loss=0.09523, over 21480.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01046, ecapa_loss=0.0001393, whisper_loss=0.08829, over 3806307.30 frames. ], batch size: 87, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:31:19,015 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.81 vs. limit=10.0 2024-08-19 12:31:19,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4466390.0, ans=0.2 2024-08-19 12:31:32,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4466490.0, ans=0.125 2024-08-19 12:31:38,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4466490.0, ans=0.125 2024-08-19 12:31:47,094 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
17 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-19 12:32:08,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4466690.0, ans=0.0 2024-08-19 12:32:26,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4466690.0, ans=0.0 2024-08-19 12:32:39,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4466790.0, ans=0.2 2024-08-19 12:32:48,697 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 2100, loss[loss=0.1075, beats_loss=0.01191, ecapa_loss=0.0001258, whisper_loss=0.09432, over 21899.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01043, ecapa_loss=0.0001389, whisper_loss=0.08857, over 3836037.01 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:32:53,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4466890.0, ans=0.035 2024-08-19 12:32:56,722 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-19 12:33:10,476 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 12:33:23,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4466990.0, ans=0.035 2024-08-19 12:33:27,279 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 28 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-19 12:33:29,261 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
25 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-19 12:33:31,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4467090.0, ans=0.0 2024-08-19 12:33:49,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4467190.0, ans=0.0 2024-08-19 12:34:10,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4467290.0, ans=0.125 2024-08-19 12:34:11,809 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.305e+01 2.500e+01 2.818e+01 4.309e+01, threshold=5.000e+01, percent-clipped=0.0 2024-08-19 12:34:12,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4467290.0, ans=0.125 2024-08-19 12:34:28,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4467390.0, ans=0.2 2024-08-19 12:34:29,976 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 2150, loss[loss=0.09689, beats_loss=0.01096, ecapa_loss=0.0001192, whisper_loss=0.08473, over 19924.00 frames. ], tot_loss[loss=0.09944, beats_loss=0.01054, ecapa_loss=0.0001379, whisper_loss=0.08753, over 3835000.59 frames. ], batch size: 79, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:34:34,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4467390.0, ans=0.125 2024-08-19 12:35:02,084 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. 
limit=15.0 2024-08-19 12:35:07,389 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-19 12:35:17,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4467590.0, ans=0.125 2024-08-19 12:35:27,730 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.993e+01 2024-08-19 12:35:36,546 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0 2024-08-19 12:35:47,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4467790.0, ans=0.05 2024-08-19 12:35:56,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4467790.0, ans=0.125 2024-08-19 12:35:59,538 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 21 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-19 12:36:01,843 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 2200, loss[loss=0.1077, beats_loss=0.01014, ecapa_loss=0.0001612, whisper_loss=0.09593, over 13202.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01049, ecapa_loss=0.0001366, whisper_loss=0.0887, over 3788435.18 frames. 
], batch size: 55, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:36:31,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4467990.0, ans=0.1 2024-08-19 12:36:33,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4467990.0, ans=0.0 2024-08-19 12:36:56,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4468190.0, ans=0.0 2024-08-19 12:37:06,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4468190.0, ans=0.0 2024-08-19 12:37:06,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4468190.0, ans=0.2 2024-08-19 12:37:12,477 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-19 12:37:15,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4468190.0, ans=0.125 2024-08-19 12:37:16,914 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.16 vs. limit=12.0 2024-08-19 12:37:21,688 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.215e+01 2.622e+01 2.935e+01 2.277e+02, threshold=5.244e+01, percent-clipped=1.0 2024-08-19 12:37:34,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4468290.0, ans=0.0 2024-08-19 12:37:38,371 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 2250, loss[loss=0.08339, beats_loss=0.01275, ecapa_loss=0.0001162, whisper_loss=0.06948, over 19600.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01054, ecapa_loss=0.0001383, whisper_loss=0.08858, over 3833712.13 frames. 
], batch size: 75, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:37:44,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4468390.0, ans=0.0 2024-08-19 12:37:48,606 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-19 12:38:20,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4468590.0, ans=0.0 2024-08-19 12:38:44,703 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 27 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-19 12:39:01,249 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-19 12:39:12,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4468790.0, ans=0.1 2024-08-19 12:39:24,247 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 2300, loss[loss=0.1033, beats_loss=0.01143, ecapa_loss=0.0001067, whisper_loss=0.09082, over 17986.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01054, ecapa_loss=0.000138, whisper_loss=0.08929, over 3874505.57 frames. ], batch size: 69, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:40:01,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4469090.0, ans=0.125 2024-08-19 12:40:10,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=15.72 vs. 
limit=15.0 2024-08-19 12:40:40,234 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.339e+01 2.578e+01 2.838e+01 4.503e+01, threshold=5.155e+01, percent-clipped=1.0 2024-08-19 12:40:54,648 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 2350, loss[loss=0.08001, beats_loss=0.01176, ecapa_loss=9.436e-05, whisper_loss=0.0673, over 15457.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.0001384, whisper_loss=0.09015, over 3851553.99 frames. ], batch size: 60, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:41:13,404 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-19 12:41:35,125 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.44 vs. limit=22.5 2024-08-19 12:41:40,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4469590.0, ans=0.035 2024-08-19 12:41:49,284 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2024-08-19 12:41:57,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4469690.0, ans=0.0 2024-08-19 12:42:05,732 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.82 vs. limit=22.5 2024-08-19 12:42:16,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4469790.0, ans=0.2 2024-08-19 12:42:28,525 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 2400, loss[loss=0.09432, beats_loss=0.01115, ecapa_loss=0.0001481, whisper_loss=0.0817, over 21239.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01041, ecapa_loss=0.0001382, whisper_loss=0.08988, over 3831747.82 frames. ], batch size: 87, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:42:28,626 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-19 12:42:28,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4469890.0, ans=0.0 2024-08-19 12:42:40,217 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 35 from LS+wenet, 7 from Vox, 29 fro AS 2024-08-19 12:42:42,774 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.77 vs. limit=6.0 2024-08-19 12:42:49,076 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-19 12:42:56,600 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 12:43:00,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4469990.0, ans=0.125 2024-08-19 12:43:07,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4470090.0, ans=0.1 2024-08-19 12:43:07,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4470090.0, ans=0.035 2024-08-19 12:43:10,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4470090.0, ans=0.0 2024-08-19 12:43:11,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4470090.0, ans=0.125 2024-08-19 12:43:23,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, 
batch_count=4470190.0, ans=0.1 2024-08-19 12:43:23,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4470190.0, ans=0.125 2024-08-19 12:43:25,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4470190.0, ans=0.1 2024-08-19 12:43:30,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4470190.0, ans=0.125 2024-08-19 12:43:45,929 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.014e+01 2.427e+01 2.678e+01 2.913e+01 4.859e+01, threshold=5.356e+01, percent-clipped=0.0 2024-08-19 12:43:57,766 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 2450, loss[loss=0.1016, beats_loss=0.007026, ecapa_loss=0.0001506, whisper_loss=0.09306, over 15289.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01037, ecapa_loss=0.0001383, whisper_loss=0.09, over 3833134.56 frames. ], batch size: 59, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:44:12,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4470490.0, ans=0.0 2024-08-19 12:44:35,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4470590.0, ans=0.09899494936611666 2024-08-19 12:44:44,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4470590.0, ans=0.125 2024-08-19 12:44:47,889 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
20 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-19 12:44:53,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4470690.0, ans=0.05 2024-08-19 12:44:54,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4470690.0, ans=0.2 2024-08-19 12:44:56,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4470690.0, ans=0.2 2024-08-19 12:44:58,134 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-19 12:45:07,942 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 12:45:08,963 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 12:45:15,961 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-19 12:45:25,561 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 2500, loss[loss=0.1006, beats_loss=0.01079, ecapa_loss=0.0001249, whisper_loss=0.0886, over 18286.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01038, ecapa_loss=0.0001381, whisper_loss=0.0894, over 3815089.43 frames. ], batch size: 70, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:45:33,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4470890.0, ans=0.125 2024-08-19 12:45:37,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4470890.0, ans=0.125 2024-08-19 12:45:50,268 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.89 vs. 
limit=22.5 2024-08-19 12:45:51,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4470990.0, ans=0.125 2024-08-19 12:46:08,406 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 33 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-19 12:46:24,808 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 25 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-19 12:46:29,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4471190.0, ans=0.125 2024-08-19 12:46:37,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4471190.0, ans=0.125 2024-08-19 12:46:43,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4471290.0, ans=0.07 2024-08-19 12:46:44,875 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.332e+01 2.578e+01 2.868e+01 6.708e+01, threshold=5.156e+01, percent-clipped=1.0 2024-08-19 12:46:49,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4471290.0, ans=0.07 2024-08-19 12:46:56,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4471290.0, ans=0.0 2024-08-19 12:46:59,033 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 2550, loss[loss=0.08312, beats_loss=0.0126, ecapa_loss=0.0001372, whisper_loss=0.06915, over 21756.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01034, ecapa_loss=0.0001383, whisper_loss=0.09034, over 3834650.13 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:47:51,873 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 12:47:53,362 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
21 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-19 12:47:59,156 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.35 vs. limit=15.0 2024-08-19 12:48:05,616 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 36 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-19 12:48:22,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4471890.0, ans=0.1 2024-08-19 12:48:23,863 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 2600, loss[loss=0.08805, beats_loss=0.01122, ecapa_loss=0.0001552, whisper_loss=0.07528, over 21968.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01033, ecapa_loss=0.0001379, whisper_loss=0.09072, over 3839750.13 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:48:28,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4471890.0, ans=0.0 2024-08-19 12:48:28,325 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.37 vs. limit=15.0 2024-08-19 12:48:40,925 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.13 vs. limit=10.0 2024-08-19 12:48:42,078 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-19 12:48:48,727 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-19 12:49:08,696 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
26 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-19 12:49:27,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4472190.0, ans=0.125 2024-08-19 12:49:39,412 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.420e+01 2.627e+01 2.917e+01 8.623e+01, threshold=5.255e+01, percent-clipped=2.0 2024-08-19 12:49:48,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4472290.0, ans=0.0 2024-08-19 12:49:52,260 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-19 12:49:54,073 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 2650, loss[loss=0.09904, beats_loss=0.01055, ecapa_loss=0.0001594, whisper_loss=0.0869, over 21708.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01031, ecapa_loss=0.000139, whisper_loss=0.09106, over 3877540.25 frames. ], batch size: 88, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:50:04,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4472390.0, ans=0.0 2024-08-19 12:50:29,059 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-19 12:50:33,440 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.52 vs. limit=15.0 2024-08-19 12:50:38,460 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-19 12:51:21,958 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 12:51:23,216 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 2700, loss[loss=0.1124, beats_loss=0.009559, ecapa_loss=0.000135, whisper_loss=0.1015, over 18839.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01035, ecapa_loss=0.000139, whisper_loss=0.09057, over 3911702.55 frames. ], batch size: 73, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:51:36,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4472890.0, ans=0.125 2024-08-19 12:52:09,308 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-19 12:52:21,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4473190.0, ans=0.0 2024-08-19 12:52:28,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4473190.0, ans=0.125 2024-08-19 12:52:30,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4473290.0, ans=0.025 2024-08-19 12:52:34,861 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.658e+01 2.264e+01 2.469e+01 2.717e+01 4.475e+01, threshold=4.939e+01, percent-clipped=0.0 2024-08-19 12:52:48,106 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 2750, loss[loss=0.1326, beats_loss=0.007674, ecapa_loss=0.0001315, whisper_loss=0.1236, over 21138.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01034, ecapa_loss=0.0001383, whisper_loss=0.09055, over 3887930.71 frames. ], batch size: 78, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:52:51,373 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 12:52:56,911 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-19 12:53:01,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4473390.0, ans=0.125 2024-08-19 12:53:24,772 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
16 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 12:53:48,093 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-19 12:54:10,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4473790.0, ans=0.125 2024-08-19 12:54:17,686 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 2800, loss[loss=0.05424, beats_loss=0.009961, ecapa_loss=0.0001594, whisper_loss=0.04269, over 12641.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01039, ecapa_loss=0.0001386, whisper_loss=0.08998, over 3849880.21 frames. ], batch size: 53, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:54:21,016 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-19 12:54:21,576 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.14 vs. limit=15.0 2024-08-19 12:54:25,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4473890.0, ans=0.125 2024-08-19 12:54:30,751 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.69 vs. limit=22.5 2024-08-19 12:54:39,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4473990.0, ans=0.1 2024-08-19 12:55:06,925 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 21 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-19 12:55:08,932 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-19 12:55:14,135 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
33 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-19 12:55:19,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4474190.0, ans=0.025 2024-08-19 12:55:19,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4474190.0, ans=0.125 2024-08-19 12:55:23,326 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-19 12:55:32,307 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-19 12:55:33,886 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.412e+01 2.633e+01 2.866e+01 1.596e+02, threshold=5.265e+01, percent-clipped=3.0 2024-08-19 12:55:50,094 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 2850, loss[loss=0.08716, beats_loss=0.008134, ecapa_loss=0.0001855, whisper_loss=0.07717, over 14855.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01048, ecapa_loss=0.0001378, whisper_loss=0.08945, over 3877480.58 frames. ], batch size: 60, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:55:52,317 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 12:55:54,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4474390.0, ans=0.0 2024-08-19 12:55:58,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4474390.0, ans=0.2 2024-08-19 12:56:04,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4474490.0, ans=0.125 2024-08-19 12:56:22,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4474590.0, ans=0.125 2024-08-19 12:56:36,451 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-19 12:56:43,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4474690.0, ans=0.1 2024-08-19 12:56:52,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4474690.0, ans=0.015 2024-08-19 12:57:06,528 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.35 vs. limit=22.5 2024-08-19 12:57:14,194 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 2900, loss[loss=0.0894, beats_loss=0.01339, ecapa_loss=0.0001036, whisper_loss=0.07497, over 21352.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01052, ecapa_loss=0.0001383, whisper_loss=0.08938, over 3869534.59 frames. ], batch size: 84, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:57:41,833 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2024-08-19 12:57:43,473 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
32 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-19 12:58:00,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4475090.0, ans=0.2 2024-08-19 12:58:04,927 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-19 12:58:09,932 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 12:58:30,170 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.344e+01 2.665e+01 2.999e+01 6.021e+01, threshold=5.330e+01, percent-clipped=1.0 2024-08-19 12:58:44,975 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 2950, loss[loss=0.09458, beats_loss=0.01209, ecapa_loss=0.0001274, whisper_loss=0.08122, over 22181.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01045, ecapa_loss=0.0001393, whisper_loss=0.09001, over 3881812.18 frames. ], batch size: 93, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:58:49,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=4475390.0, ans=15.0 2024-08-19 12:59:30,637 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.00 vs. limit=15.0 2024-08-19 12:59:39,186 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.97 vs. limit=15.0 2024-08-19 12:59:47,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4475690.0, ans=0.125 2024-08-19 13:00:13,954 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 3000, loss[loss=0.09821, beats_loss=0.01017, ecapa_loss=0.0001484, whisper_loss=0.08656, over 21556.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01042, ecapa_loss=0.0001393, whisper_loss=0.08984, over 3874189.74 frames. ], batch size: 84, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 13:00:13,955 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-19 13:00:40,294 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.0865, 1.9287, 2.1419, 1.7274], device='cuda:0') 2024-08-19 13:00:59,264 INFO [train_multi_KD3.py:1149] (0/4) Epoch 31, validation on ASR_libri: loss=0.2548, beats_loss=0, ecapa_loss=0.0005195, whisper_loss=0.2496, over 922467.00 frames. 2024-08-19 13:01:17,449 INFO [train_multi_KD3.py:1149] (0/4) Epoch 31, validation on SV_voxceleb1: loss=0.003921, beats_loss=0, ecapa_loss=0.0003921, whisper_loss=0, over 939242.00 frames. 2024-08-19 13:02:01,000 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.5520, 1.6231, 1.6999, 1.9666], device='cuda:0') 2024-08-19 13:03:07,050 INFO [train_multi_KD3.py:1149] (0/4) Epoch 31, validation on AT_audioset: loss=0.02304, beats_loss=0.02304, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 13:03:07,056 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-19 13:03:07,294 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 11 from Vox, 39 fro AS 2024-08-19 13:03:12,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4475890.0, ans=0.125 2024-08-19 13:03:20,399 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 19 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-19 13:03:37,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4476090.0, ans=0.1 2024-08-19 13:03:50,857 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
18 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-19 13:03:54,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4476190.0, ans=0.125 2024-08-19 13:03:55,046 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.60 vs. limit=12.0 2024-08-19 13:03:57,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4476190.0, ans=0.0 2024-08-19 13:04:16,073 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.421e+01 2.691e+01 3.119e+01 4.810e+01, threshold=5.383e+01, percent-clipped=0.0 2024-08-19 13:04:16,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4476290.0, ans=0.125 2024-08-19 13:04:23,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4476290.0, ans=0.0 2024-08-19 13:04:24,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4476290.0, ans=0.125 2024-08-19 13:04:26,437 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-19 13:04:30,501 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 3050, loss[loss=0.1031, beats_loss=0.01072, ecapa_loss=0.0001359, whisper_loss=0.09105, over 20517.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01049, ecapa_loss=0.0001392, whisper_loss=0.08917, over 3861457.12 frames. ], batch size: 81, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 13:04:37,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4476390.0, ans=0.1 2024-08-19 13:04:53,531 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
27 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-19 13:05:07,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4476590.0, ans=0.0 2024-08-19 13:05:25,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4476690.0, ans=0.125 2024-08-19 13:05:25,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4476690.0, ans=0.125 2024-08-19 13:05:52,368 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0 2024-08-19 13:05:57,901 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.405e+00 2024-08-19 13:06:00,816 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 3100, loss[loss=0.1038, beats_loss=0.009042, ecapa_loss=0.0001512, whisper_loss=0.09323, over 18077.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01048, ecapa_loss=0.0001396, whisper_loss=0.09, over 3838103.43 frames. ], batch size: 72, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 13:06:04,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4476890.0, ans=0.2 2024-08-19 13:06:06,794 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.02 vs. 
limit=15.0 2024-08-19 13:06:20,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4476990.0, ans=0.0 2024-08-19 13:06:34,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4477090.0, ans=0.0 2024-08-19 13:06:34,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4477090.0, ans=0.125 2024-08-19 13:06:45,675 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 13:06:56,361 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-19 13:07:02,561 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.74 vs. limit=15.0 2024-08-19 13:07:04,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4477190.0, ans=0.125 2024-08-19 13:07:14,612 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.300e+01 2.518e+01 2.909e+01 4.343e+01, threshold=5.036e+01, percent-clipped=0.0 2024-08-19 13:07:16,971 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0 2024-08-19 13:07:19,959 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 13:07:27,346 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 3150, loss[loss=0.08961, beats_loss=0.01215, ecapa_loss=0.0001628, whisper_loss=0.07583, over 20707.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01053, ecapa_loss=0.0001395, whisper_loss=0.08995, over 3840580.41 frames. 
], batch size: 87, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 13:07:30,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4477390.0, ans=0.0 2024-08-19 13:07:51,807 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 33 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 13:07:56,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4477590.0, ans=0.035 2024-08-19 13:08:05,416 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-19 13:08:50,883 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 3200, loss[loss=0.1195, beats_loss=0.007777, ecapa_loss=0.0001833, whisper_loss=0.1099, over 20058.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01054, ecapa_loss=0.0001398, whisper_loss=0.08989, over 3840583.75 frames. ], batch size: 81, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 13:09:04,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4477890.0, ans=0.125 2024-08-19 13:09:11,009 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-19 13:09:23,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4477990.0, ans=0.125 2024-08-19 13:09:29,208 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 25 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-19 13:09:37,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4478090.0, ans=0.2 2024-08-19 13:09:53,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4478190.0, ans=0.2 2024-08-19 13:09:55,095 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
22 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-19 13:10:02,209 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-19 13:10:05,697 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.281e+01 2.603e+01 2.850e+01 4.454e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-19 13:10:09,232 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 13:10:19,922 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 3250, loss[loss=0.12, beats_loss=0.007876, ecapa_loss=0.0001505, whisper_loss=0.1107, over 16749.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01049, ecapa_loss=0.000141, whisper_loss=0.08993, over 3837408.47 frames. ], batch size: 65, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 13:10:28,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4478390.0, ans=0.0 2024-08-19 13:10:39,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4478490.0, ans=0.125 2024-08-19 13:10:39,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4478490.0, ans=0.125 2024-08-19 13:10:43,455 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 27 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-19 13:10:45,855 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-19 13:10:55,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4478590.0, ans=0.125 2024-08-19 13:11:01,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4478590.0, ans=0.125 2024-08-19 13:11:33,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4478790.0, ans=0.0 2024-08-19 13:11:37,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4478790.0, ans=0.1 2024-08-19 13:11:42,617 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 3300, loss[loss=0.1077, beats_loss=0.008821, ecapa_loss=0.0001709, whisper_loss=0.0972, over 18628.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01045, ecapa_loss=0.0001412, whisper_loss=0.0904, over 3866017.91 frames. ], batch size: 76, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:12:06,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.84 vs. limit=15.0 2024-08-19 13:12:08,463 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.44 vs. 
limit=15.0 2024-08-19 13:12:20,889 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09731415659189224, model_norm_threshold=52.06379318237305 2024-08-19 13:12:21,140 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.62, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.783e+05, grad_sumsq=1.704e+07, orig_rms_sq=1.046e-02 2024-08-19 13:12:28,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4479090.0, ans=0.04949747468305833 2024-08-19 13:12:36,049 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-19 13:12:40,187 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-19 13:12:50,647 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.318e+01 2.677e+01 3.024e+01 5.350e+02, threshold=5.354e+01, percent-clipped=4.0 2024-08-19 13:12:55,335 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09374216943979263, model_norm_threshold=53.53926467895508 2024-08-19 13:12:55,609 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.out_combiner.bypass_scale with proportion 0.08, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.490e+04, grad_sumsq=4.301e+04, orig_rms_sq=5.788e-01 2024-08-19 13:13:01,550 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 3350, loss[loss=0.08961, beats_loss=0.007939, ecapa_loss=0.0001529, whisper_loss=0.08014, over 15594.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01043, ecapa_loss=0.0001417, whisper_loss=0.09028, over 3877154.51 frames. ], batch size: 60, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:13:07,527 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
30 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 13:13:10,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4479390.0, ans=0.0 2024-08-19 13:13:12,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4479390.0, ans=0.125 2024-08-19 13:13:22,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4479490.0, ans=0.125 2024-08-19 13:13:23,179 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 8 from Vox, 44 fro AS 2024-08-19 13:13:27,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4479490.0, ans=0.05 2024-08-19 13:13:29,121 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-19 13:13:32,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4479590.0, ans=0.1 2024-08-19 13:13:34,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.56 vs. limit=10.0 2024-08-19 13:13:39,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4479590.0, ans=0.2 2024-08-19 13:14:16,259 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 3400, loss[loss=0.0938, beats_loss=0.01123, ecapa_loss=0.000142, whisper_loss=0.08116, over 21855.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01046, ecapa_loss=0.000142, whisper_loss=0.0899, over 3867773.71 frames. ], batch size: 88, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:14:24,049 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
23 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-19 13:14:27,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.59 vs. limit=12.0 2024-08-19 13:14:31,982 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-448000.pt 2024-08-19 13:14:34,798 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-19 13:14:37,313 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.52 vs. limit=22.5 2024-08-19 13:14:39,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4479990.0, ans=0.125 2024-08-19 13:14:52,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4480090.0, ans=0.0 2024-08-19 13:14:56,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4480090.0, ans=0.2 2024-08-19 13:14:59,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4480090.0, ans=0.125 2024-08-19 13:14:59,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4480090.0, ans=0.1 2024-08-19 13:15:00,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4480090.0, ans=0.0 2024-08-19 13:15:07,641 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
15 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-19 13:15:12,296 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2024-08-19 13:15:22,759 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.735e+01 2.253e+01 2.575e+01 3.055e+01 5.711e+02, threshold=5.150e+01, percent-clipped=3.0 2024-08-19 13:15:33,756 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 3450, loss[loss=0.1009, beats_loss=0.007739, ecapa_loss=0.000141, whisper_loss=0.09179, over 23842.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01043, ecapa_loss=0.0001425, whisper_loss=0.08973, over 3867988.77 frames. ], batch size: 93, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:16:08,604 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-19 13:16:08,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4480590.0, ans=0.2 2024-08-19 13:16:25,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4480690.0, ans=0.0 2024-08-19 13:16:34,185 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.46 vs. limit=15.0 2024-08-19 13:16:38,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4480790.0, ans=0.125 2024-08-19 13:16:47,967 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
26 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-19 13:16:48,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4480790.0, ans=0.125 2024-08-19 13:16:50,259 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.53 vs. limit=22.5 2024-08-19 13:16:50,981 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 3500, loss[loss=0.1235, beats_loss=0.009019, ecapa_loss=0.0001497, whisper_loss=0.113, over 23616.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001416, whisper_loss=0.09036, over 3899477.38 frames. ], batch size: 93, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:16:52,132 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.75 vs. limit=5.0 2024-08-19 13:17:00,811 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.42 vs. limit=15.0 2024-08-19 13:17:27,947 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.59 vs. limit=15.0 2024-08-19 13:17:28,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4481090.0, ans=0.125 2024-08-19 13:17:29,834 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-19 13:17:42,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4481190.0, ans=0.0 2024-08-19 13:17:45,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4481190.0, ans=0.125 2024-08-19 13:17:57,049 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.595e+01 2.291e+01 2.477e+01 2.753e+01 3.837e+01, threshold=4.954e+01, percent-clipped=0.0 2024-08-19 13:18:04,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4481290.0, ans=0.125 2024-08-19 13:18:06,611 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 3550, loss[loss=0.08498, beats_loss=0.01171, ecapa_loss=0.0001564, whisper_loss=0.0717, over 21300.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01047, ecapa_loss=0.0001413, whisper_loss=0.08928, over 3893420.78 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:18:09,556 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 13:18:10,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4481390.0, ans=0.0 2024-08-19 13:18:21,539 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
35 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-19 13:18:27,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4481490.0, ans=0.125 2024-08-19 13:18:36,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4481590.0, ans=0.125 2024-08-19 13:18:37,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4481590.0, ans=0.95 2024-08-19 13:18:38,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4481590.0, ans=0.125 2024-08-19 13:18:47,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4481590.0, ans=0.125 2024-08-19 13:19:22,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4481790.0, ans=0.125 2024-08-19 13:19:25,154 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 3600, loss[loss=0.07232, beats_loss=0.01258, ecapa_loss=0.0001079, whisper_loss=0.05866, over 13387.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01058, ecapa_loss=0.0001408, whisper_loss=0.08875, over 3869508.66 frames. ], batch size: 54, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:19:30,944 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.04 vs. limit=10.0 2024-08-19 13:19:35,808 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.55 vs. 
limit=22.5 2024-08-19 13:19:38,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4481890.0, ans=0.125 2024-08-19 13:19:51,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4481990.0, ans=0.2 2024-08-19 13:20:06,251 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.01 vs. limit=15.0 2024-08-19 13:20:21,410 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 19 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 13:20:27,924 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 18 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-19 13:20:34,079 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.650e+01 2.290e+01 2.482e+01 2.749e+01 4.098e+01, threshold=4.965e+01, percent-clipped=0.0 2024-08-19 13:20:43,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4482390.0, ans=0.07 2024-08-19 13:20:44,718 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 3650, loss[loss=0.08935, beats_loss=0.0124, ecapa_loss=0.0001511, whisper_loss=0.07544, over 20925.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01056, ecapa_loss=0.0001406, whisper_loss=0.08958, over 3883830.62 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:20:51,368 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 13:21:03,447 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-19 13:21:51,197 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
28 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-19 13:22:00,028 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 3700, loss[loss=0.09248, beats_loss=0.01178, ecapa_loss=0.0001214, whisper_loss=0.07949, over 22165.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01058, ecapa_loss=0.0001403, whisper_loss=0.08985, over 3896667.49 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:22:10,526 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-19 13:22:12,271 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 13:22:17,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4482990.0, ans=0.0 2024-08-19 13:22:30,415 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-19 13:22:31,773 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 21 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-19 13:22:41,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.51 vs. limit=12.0 2024-08-19 13:22:49,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4483190.0, ans=0.125 2024-08-19 13:22:50,055 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0 2024-08-19 13:22:53,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4483190.0, ans=0.04949747468305833 2024-08-19 13:22:57,115 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
32 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-19 13:23:09,230 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.337e+01 2.650e+01 3.046e+01 1.477e+02, threshold=5.301e+01, percent-clipped=3.0 2024-08-19 13:23:14,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4483290.0, ans=0.2 2024-08-19 13:23:19,386 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 28 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-19 13:23:20,808 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 3750, loss[loss=0.1285, beats_loss=0.008063, ecapa_loss=0.0001232, whisper_loss=0.1192, over 17585.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001408, whisper_loss=0.09073, over 3898825.90 frames. ], batch size: 63, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:23:29,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4483390.0, ans=0.2 2024-08-19 13:23:41,827 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-19 13:23:44,441 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0 2024-08-19 13:23:49,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4483490.0, ans=0.1 2024-08-19 13:24:02,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=4483590.0, ans=0.05 2024-08-19 13:24:03,583 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
21 from LS+wenet, 9 from Vox, 41 fro AS 2024-08-19 13:24:03,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4483590.0, ans=0.125 2024-08-19 13:24:04,601 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08629204332828522, model_norm_threshold=53.00765609741211 2024-08-19 13:24:04,780 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.31, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.164e+05, grad_sumsq=1.113e+07, orig_rms_sq=1.046e-02 2024-08-19 13:24:05,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4483690.0, ans=0.2 2024-08-19 13:24:06,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4483690.0, ans=0.0 2024-08-19 13:24:13,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4483690.0, ans=0.125 2024-08-19 13:24:17,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4483690.0, ans=0.125 2024-08-19 13:24:24,139 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-19 13:24:27,594 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=4.462e-01 2024-08-19 13:24:36,475 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 3800, loss[loss=0.08221, beats_loss=0.01315, ecapa_loss=0.0001515, whisper_loss=0.06754, over 15585.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01061, ecapa_loss=0.0001413, whisper_loss=0.09034, over 3879780.68 frames. ], batch size: 66, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:24:40,074 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
21 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-19 13:25:01,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4483990.0, ans=0.125 2024-08-19 13:25:07,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4484090.0, ans=0.1 2024-08-19 13:25:09,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4484090.0, ans=0.2 2024-08-19 13:25:12,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4484090.0, ans=0.125 2024-08-19 13:25:13,858 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-19 13:25:20,680 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-19 13:25:25,831 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.15 vs. limit=15.0 2024-08-19 13:25:34,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4484190.0, ans=0.025 2024-08-19 13:25:37,621 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-19 13:25:37,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4484190.0, ans=0.1 2024-08-19 13:25:39,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4484290.0, ans=0.0 2024-08-19 13:25:39,585 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.45 vs. 
limit=15.0 2024-08-19 13:25:45,330 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.415e+01 2.611e+01 2.947e+01 6.143e+02, threshold=5.223e+01, percent-clipped=2.0 2024-08-19 13:25:55,011 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-19 13:25:56,293 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 3850, loss[loss=0.1018, beats_loss=0.009668, ecapa_loss=0.0001151, whisper_loss=0.09096, over 16085.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01063, ecapa_loss=0.000142, whisper_loss=0.08963, over 3870430.22 frames. ], batch size: 61, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:26:08,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4484390.0, ans=0.125 2024-08-19 13:26:12,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4484490.0, ans=0.125 2024-08-19 13:26:28,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4484590.0, ans=0.0 2024-08-19 13:26:31,785 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-19 13:26:50,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4484690.0, ans=0.1 2024-08-19 13:26:55,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4484690.0, ans=0.07 2024-08-19 13:27:01,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4484790.0, ans=0.125 2024-08-19 13:27:11,944 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 3900, loss[loss=0.07981, beats_loss=0.009551, ecapa_loss=0.0001504, whisper_loss=0.06876, over 13123.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01056, ecapa_loss=0.0001424, whisper_loss=0.09037, over 3895975.04 frames. ], batch size: 53, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:27:16,986 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 13:27:17,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4484890.0, ans=0.0 2024-08-19 13:27:18,549 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-19 13:27:38,225 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 18 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-19 13:28:02,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4485190.0, ans=0.125 2024-08-19 13:28:18,593 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.352e+01 2.565e+01 2.860e+01 1.728e+02, threshold=5.131e+01, percent-clipped=1.0 2024-08-19 13:28:30,324 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 3950, loss[loss=0.09033, beats_loss=0.01301, ecapa_loss=0.0001411, whisper_loss=0.07591, over 22529.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01051, ecapa_loss=0.000142, whisper_loss=0.09081, over 3915774.79 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:28:41,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4485390.0, ans=0.1 2024-08-19 13:28:47,952 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-19 13:28:52,079 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.90 vs. limit=22.5 2024-08-19 13:28:54,178 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
29 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-19 13:28:59,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4485490.0, ans=0.125 2024-08-19 13:29:18,801 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.38 vs. limit=15.0 2024-08-19 13:29:20,258 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.63 vs. limit=15.0 2024-08-19 13:29:21,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4485690.0, ans=0.0 2024-08-19 13:29:23,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.85 vs. limit=10.0 2024-08-19 13:29:28,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4485690.0, ans=0.0 2024-08-19 13:29:48,651 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 4000, loss[loss=0.09901, beats_loss=0.008161, ecapa_loss=0.0001618, whisper_loss=0.08923, over 17472.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001429, whisper_loss=0.09038, over 3904538.39 frames. ], batch size: 68, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:29:56,305 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.41 vs. limit=15.0 2024-08-19 13:29:58,683 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
22 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-19 13:30:20,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4486090.0, ans=0.125 2024-08-19 13:30:42,527 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.21 vs. limit=22.5 2024-08-19 13:30:54,630 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.618e+01 2.333e+01 2.515e+01 2.857e+01 4.559e+01, threshold=5.029e+01, percent-clipped=0.0 2024-08-19 13:31:05,662 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 4050, loss[loss=0.1174, beats_loss=0.009605, ecapa_loss=0.0001751, whisper_loss=0.1061, over 18839.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0105, ecapa_loss=0.0001426, whisper_loss=0.09053, over 3881146.32 frames. ], batch size: 76, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:31:12,035 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 16 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-19 13:31:14,891 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
22 from LS+wenet, 18 from Vox, 16 fro AS 2024-08-19 13:31:16,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4486390.0, ans=0.125 2024-08-19 13:31:23,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=4486490.0, ans=0.5 2024-08-19 13:31:27,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4486490.0, ans=0.125 2024-08-19 13:31:28,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4486490.0, ans=0.0 2024-08-19 13:31:34,326 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.69 vs. limit=22.5 2024-08-19 13:31:55,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4486690.0, ans=0.125 2024-08-19 13:31:58,706 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-19 13:32:11,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4486790.0, ans=0.1 2024-08-19 13:32:23,950 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 4100, loss[loss=0.1037, beats_loss=0.0114, ecapa_loss=0.000131, whisper_loss=0.09104, over 22970.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01053, ecapa_loss=0.0001429, whisper_loss=0.09051, over 3884686.15 frames. 
], batch size: 91, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:32:24,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4486890.0, ans=0.1 2024-08-19 13:32:30,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4486890.0, ans=0.05 2024-08-19 13:32:32,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4486890.0, ans=0.0 2024-08-19 13:32:34,237 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-19 13:32:47,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4486990.0, ans=0.04949747468305833 2024-08-19 13:32:50,527 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 22 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-19 13:32:58,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4487090.0, ans=0.125 2024-08-19 13:33:33,944 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=15.0 2024-08-19 13:33:35,991 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.243e+01 2.583e+01 2.871e+01 4.368e+01, threshold=5.166e+01, percent-clipped=0.0 2024-08-19 13:33:39,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4487290.0, ans=0.0 2024-08-19 13:33:40,822 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
21 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 13:33:46,111 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 4150, loss[loss=0.07377, beats_loss=0.01208, ecapa_loss=0.0001231, whisper_loss=0.06046, over 21882.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01061, ecapa_loss=0.0001426, whisper_loss=0.08997, over 3870310.82 frames. ], batch size: 93, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:34:08,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4487490.0, ans=0.0 2024-08-19 13:34:17,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4487490.0, ans=0.125 2024-08-19 13:34:21,038 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 17 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 13:34:41,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4487590.0, ans=0.1 2024-08-19 13:34:48,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4487690.0, ans=0.2 2024-08-19 13:34:54,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4487690.0, ans=0.125 2024-08-19 13:35:09,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4487790.0, ans=0.07 2024-08-19 13:35:11,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4487790.0, ans=0.0 2024-08-19 13:35:14,273 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
24 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 13:35:26,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4487890.0, ans=0.125 2024-08-19 13:35:27,034 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 4200, loss[loss=0.06804, beats_loss=0.01374, ecapa_loss=0.0001379, whisper_loss=0.05292, over 16311.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01056, ecapa_loss=0.0001428, whisper_loss=0.09026, over 3875702.23 frames. ], batch size: 67, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:35:27,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4487890.0, ans=0.07 2024-08-19 13:35:31,862 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 13:35:57,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4487990.0, ans=0.125 2024-08-19 13:36:02,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4488090.0, ans=0.0 2024-08-19 13:36:22,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4488190.0, ans=0.0 2024-08-19 13:36:29,674 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.02 vs. limit=15.0 2024-08-19 13:36:35,357 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.32 vs. 
limit=15.0 2024-08-19 13:36:43,500 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.376e+01 2.579e+01 2.870e+01 3.813e+01, threshold=5.158e+01, percent-clipped=0.0 2024-08-19 13:36:44,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.84 vs. limit=15.0 2024-08-19 13:36:53,587 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 4250, loss[loss=0.1154, beats_loss=0.008483, ecapa_loss=0.0001437, whisper_loss=0.1055, over 22374.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0105, ecapa_loss=0.0001427, whisper_loss=0.09024, over 3899157.55 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:36:57,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.65 vs. limit=10.0 2024-08-19 13:37:50,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4488690.0, ans=0.125 2024-08-19 13:37:55,285 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 28 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 13:38:15,212 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 4300, loss[loss=0.1267, beats_loss=0.007089, ecapa_loss=0.0001378, whisper_loss=0.1182, over 20806.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.000142, whisper_loss=0.08959, over 3859910.40 frames. 
], batch size: 76, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:38:22,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4488890.0, ans=0.125 2024-08-19 13:38:47,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4488990.0, ans=0.1 2024-08-19 13:38:51,162 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-19 13:38:59,215 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-19 13:39:08,036 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 19 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-19 13:39:15,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4489190.0, ans=0.125 2024-08-19 13:39:22,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4489190.0, ans=0.0 2024-08-19 13:39:36,974 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.711e+01 2.321e+01 2.560e+01 2.840e+01 5.054e+01, threshold=5.120e+01, percent-clipped=0.0 2024-08-19 13:39:39,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4489290.0, ans=0.125 2024-08-19 13:39:39,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4489290.0, ans=0.125 2024-08-19 13:39:49,766 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 4350, loss[loss=0.09775, beats_loss=0.009727, ecapa_loss=0.0001711, whisper_loss=0.08631, over 15797.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01057, ecapa_loss=0.0001418, whisper_loss=0.08873, over 3850614.39 frames. 
], batch size: 67, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:39:50,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4489390.0, ans=0.125 2024-08-19 13:40:00,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4489390.0, ans=0.1 2024-08-19 13:40:00,204 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.85 vs. limit=10.0 2024-08-19 13:40:11,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4489490.0, ans=0.1 2024-08-19 13:40:18,604 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 27 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-19 13:40:28,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4489590.0, ans=0.125 2024-08-19 13:40:42,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4489690.0, ans=0.2 2024-08-19 13:40:53,909 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-19 13:40:59,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4489690.0, ans=0.125 2024-08-19 13:41:01,255 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 21 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-19 13:41:03,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4489790.0, ans=0.125 2024-08-19 13:41:22,322 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 4400, loss[loss=0.09438, beats_loss=0.01075, ecapa_loss=0.0001089, whisper_loss=0.08254, over 21201.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.01049, ecapa_loss=0.0001421, whisper_loss=0.08931, over 3875072.77 frames. ], batch size: 85, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:41:30,196 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.55 vs. limit=6.0 2024-08-19 13:41:59,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4490090.0, ans=0.125 2024-08-19 13:42:01,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4490090.0, ans=0.0 2024-08-19 13:42:18,943 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2024-08-19 13:42:20,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4490190.0, ans=0.125 2024-08-19 13:42:28,669 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 13:42:42,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4490290.0, ans=0.125 2024-08-19 13:42:45,469 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.324e+01 2.485e+01 2.810e+01 3.864e+01, threshold=4.971e+01, percent-clipped=0.0 2024-08-19 13:42:58,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4490390.0, ans=0.0 2024-08-19 13:43:00,461 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 4450, loss[loss=0.1007, beats_loss=0.01027, ecapa_loss=0.0001441, whisper_loss=0.08897, over 21151.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01035, ecapa_loss=0.0001424, whisper_loss=0.09015, over 3881123.93 frames. 
], batch size: 85, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:43:54,535 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-19 13:43:55,009 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.95 vs. limit=22.5 2024-08-19 13:44:10,002 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-19 13:44:27,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4490790.0, ans=0.125 2024-08-19 13:44:27,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4490790.0, ans=0.2 2024-08-19 13:44:29,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4490790.0, ans=0.0 2024-08-19 13:44:30,292 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.98 vs. limit=15.0 2024-08-19 13:44:32,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4490790.0, ans=0.0 2024-08-19 13:44:40,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4490790.0, ans=0.125 2024-08-19 13:44:40,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4490790.0, ans=0.125 2024-08-19 13:44:53,131 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 4500, loss[loss=0.1067, beats_loss=0.01185, ecapa_loss=0.0001166, whisper_loss=0.09372, over 22166.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01047, ecapa_loss=0.0001416, whisper_loss=0.08928, over 3877410.40 frames. 
], batch size: 88, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:45:00,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4490890.0, ans=0.2 2024-08-19 13:45:53,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4491090.0, ans=0.2 2024-08-19 13:45:53,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4491090.0, ans=0.125 2024-08-19 13:46:14,391 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 25 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-19 13:46:20,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4491290.0, ans=0.1 2024-08-19 13:46:22,436 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.269e+01 2.513e+01 2.857e+01 4.975e+01, threshold=5.026e+01, percent-clipped=1.0 2024-08-19 13:46:37,957 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 4550, loss[loss=0.127, beats_loss=0.008079, ecapa_loss=0.0001434, whisper_loss=0.1175, over 21827.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01046, ecapa_loss=0.0001417, whisper_loss=0.08942, over 3882833.99 frames. ], batch size: 82, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:46:42,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4491390.0, ans=0.0 2024-08-19 13:46:47,784 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.70 vs. 
limit=15.0 2024-08-19 13:47:03,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4491490.0, ans=0.2 2024-08-19 13:47:27,145 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.63 vs. limit=10.0 2024-08-19 13:47:31,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4491590.0, ans=0.125 2024-08-19 13:47:42,564 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 28 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-19 13:47:56,875 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=15.0 2024-08-19 13:47:58,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4491790.0, ans=0.125 2024-08-19 13:48:00,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4491790.0, ans=0.0 2024-08-19 13:48:11,590 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 4600, loss[loss=0.09602, beats_loss=0.008453, ecapa_loss=0.000191, whisper_loss=0.08565, over 18496.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01048, ecapa_loss=0.0001422, whisper_loss=0.08878, over 3851821.20 frames. ], batch size: 79, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:48:15,516 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
23 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-19 13:48:17,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4491890.0, ans=0.0 2024-08-19 13:49:31,936 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.304e+01 2.544e+01 2.901e+01 6.121e+01, threshold=5.088e+01, percent-clipped=1.0 2024-08-19 13:49:36,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4492290.0, ans=0.125 2024-08-19 13:49:36,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4492290.0, ans=0.2 2024-08-19 13:49:39,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4492290.0, ans=0.04949747468305833 2024-08-19 13:49:39,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4492290.0, ans=0.1 2024-08-19 13:49:43,898 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 4650, loss[loss=0.1105, beats_loss=0.01026, ecapa_loss=0.0001195, whisper_loss=0.09904, over 23645.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01047, ecapa_loss=0.0001422, whisper_loss=0.08887, over 3850080.27 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:49:51,770 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 13:50:06,087 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
23 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-19 13:50:15,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4492490.0, ans=0.0 2024-08-19 13:51:05,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4492790.0, ans=0.125 2024-08-19 13:51:13,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4492890.0, ans=0.0 2024-08-19 13:51:14,856 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 4700, loss[loss=0.1152, beats_loss=0.008876, ecapa_loss=0.0001776, whisper_loss=0.1045, over 21769.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.0001418, whisper_loss=0.08992, over 3880587.39 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:51:29,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4492890.0, ans=0.035 2024-08-19 13:51:34,498 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-19 13:51:51,175 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0 2024-08-19 13:51:55,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4493090.0, ans=0.0 2024-08-19 13:51:58,430 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.75 vs. limit=22.5 2024-08-19 13:52:04,216 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
20 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-19 13:52:04,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4493090.0, ans=0.125 2024-08-19 13:52:28,779 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-19 13:52:30,140 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.412e+01 2.639e+01 2.968e+01 4.627e+01, threshold=5.277e+01, percent-clipped=0.0 2024-08-19 13:52:42,730 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 4750, loss[loss=0.1008, beats_loss=0.01161, ecapa_loss=0.0001292, whisper_loss=0.08792, over 20729.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01045, ecapa_loss=0.0001415, whisper_loss=0.08939, over 3889998.20 frames. ], batch size: 82, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:52:48,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=4493390.0, ans=10.0 2024-08-19 13:52:55,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4493390.0, ans=0.0 2024-08-19 13:52:58,947 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 15 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-19 13:53:06,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4493490.0, ans=0.125 2024-08-19 13:53:15,877 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0 2024-08-19 13:53:17,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4493590.0, ans=0.125 2024-08-19 13:53:29,304 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
26 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-19 13:53:34,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4493590.0, ans=0.07 2024-08-19 13:53:43,369 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 13:53:43,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4493690.0, ans=0.125 2024-08-19 13:53:58,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4493790.0, ans=0.125 2024-08-19 13:54:14,331 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 4800, loss[loss=0.08671, beats_loss=0.01188, ecapa_loss=0.0001444, whisper_loss=0.07339, over 20707.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.0001416, whisper_loss=0.09017, over 3886418.98 frames. ], batch size: 87, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 13:54:54,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4494090.0, ans=0.1 2024-08-19 13:55:17,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4494190.0, ans=0.125 2024-08-19 13:55:22,189 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-19 13:55:26,738 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.328e+01 2.534e+01 2.852e+01 3.872e+01, threshold=5.068e+01, percent-clipped=0.0 2024-08-19 13:55:38,102 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 4850, loss[loss=0.1066, beats_loss=0.01044, ecapa_loss=0.0001157, whisper_loss=0.09499, over 18072.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001412, whisper_loss=0.09046, over 3875319.34 frames. 
], batch size: 67, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 13:55:38,266 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 13:55:48,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.31 vs. limit=22.5 2024-08-19 13:56:04,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4494490.0, ans=0.0 2024-08-19 13:56:04,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4494490.0, ans=0.125 2024-08-19 13:56:05,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4494490.0, ans=0.0 2024-08-19 13:56:13,463 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 20 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-19 13:56:39,137 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.72 vs. limit=15.0 2024-08-19 13:56:50,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4494790.0, ans=0.0 2024-08-19 13:57:03,359 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 4900, loss[loss=0.1134, beats_loss=0.008434, ecapa_loss=0.0001205, whisper_loss=0.1038, over 21378.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01041, ecapa_loss=0.0001416, whisper_loss=0.09091, over 3887386.78 frames. ], batch size: 81, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 13:57:06,233 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.00 vs. 
limit=15.0 2024-08-19 13:57:15,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4494890.0, ans=0.2 2024-08-19 13:57:26,041 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.89 vs. limit=15.0 2024-08-19 13:57:44,561 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-19 13:57:46,114 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 13:57:46,719 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.17 vs. limit=22.5 2024-08-19 13:57:54,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4495190.0, ans=0.0 2024-08-19 13:57:58,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4495190.0, ans=0.1 2024-08-19 13:58:03,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4495190.0, ans=0.09899494936611666 2024-08-19 13:58:08,985 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.63 vs. limit=15.0 2024-08-19 13:58:15,436 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.360e+01 2.527e+01 2.756e+01 4.531e+01, threshold=5.055e+01, percent-clipped=0.0 2024-08-19 13:58:25,263 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 4950, loss[loss=0.1027, beats_loss=0.007904, ecapa_loss=0.0001352, whisper_loss=0.09343, over 16007.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01043, ecapa_loss=0.0001416, whisper_loss=0.09066, over 3872164.69 frames. 
], batch size: 62, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 13:58:39,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4495390.0, ans=0.0 2024-08-19 13:59:15,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4495690.0, ans=0.125 2024-08-19 13:59:44,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4495790.0, ans=0.1 2024-08-19 13:59:44,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4495790.0, ans=0.125 2024-08-19 13:59:44,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4495790.0, ans=0.0 2024-08-19 13:59:49,069 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 5000, loss[loss=0.1054, beats_loss=0.009971, ecapa_loss=0.000151, whisper_loss=0.09395, over 19994.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01049, ecapa_loss=0.0001409, whisper_loss=0.09018, over 3869208.67 frames. ], batch size: 83, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:00:22,125 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-19 14:00:23,088 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.36 vs. limit=15.0 2024-08-19 14:00:23,939 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
25 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 14:00:27,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4496090.0, ans=0.04949747468305833 2024-08-19 14:00:34,193 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.84 vs. limit=12.0 2024-08-19 14:01:05,810 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-19 14:01:07,620 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.293e+01 2.592e+01 2.875e+01 4.425e+01, threshold=5.184e+01, percent-clipped=0.0 2024-08-19 14:01:13,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4496290.0, ans=0.125 2024-08-19 14:01:17,805 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 5050, loss[loss=0.09712, beats_loss=0.01076, ecapa_loss=0.0001676, whisper_loss=0.08469, over 21098.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01053, ecapa_loss=0.0001412, whisper_loss=0.09035, over 3870526.74 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:01:17,928 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-19 14:01:42,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4496490.0, ans=0.2 2024-08-19 14:01:58,138 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 32 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 14:02:03,591 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-19 14:02:10,736 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 8 from Vox, 31 fro AS 2024-08-19 14:02:19,250 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
23 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-19 14:02:30,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4496790.0, ans=0.2 2024-08-19 14:02:41,955 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 5100, loss[loss=0.09061, beats_loss=0.01204, ecapa_loss=0.0001317, whisper_loss=0.07726, over 21994.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01055, ecapa_loss=0.0001417, whisper_loss=0.09102, over 3849677.10 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:03:20,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4497090.0, ans=0.125 2024-08-19 14:03:35,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4497190.0, ans=0.125 2024-08-19 14:03:56,381 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.306e+01 2.543e+01 2.887e+01 4.105e+01, threshold=5.087e+01, percent-clipped=0.0 2024-08-19 14:04:00,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4497290.0, ans=0.1 2024-08-19 14:04:00,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4497290.0, ans=0.125 2024-08-19 14:04:06,705 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 5150, loss[loss=0.1015, beats_loss=0.009599, ecapa_loss=0.0001537, whisper_loss=0.09036, over 19687.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01057, ecapa_loss=0.0001415, whisper_loss=0.09043, over 3839937.90 frames. ], batch size: 79, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:04:24,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.46 vs. 
limit=15.0 2024-08-19 14:04:31,729 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 24 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-19 14:05:21,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.86 vs. limit=15.0 2024-08-19 14:05:26,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4497790.0, ans=0.1 2024-08-19 14:05:28,790 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 5200, loss[loss=0.1202, beats_loss=0.006545, ecapa_loss=0.0001377, whisper_loss=0.1123, over 14980.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01052, ecapa_loss=0.000141, whisper_loss=0.09056, over 3843543.46 frames. ], batch size: 54, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:05:37,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4497890.0, ans=0.0 2024-08-19 14:05:53,735 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 32 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 14:05:56,266 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.93 vs. 
limit=10.0 2024-08-19 14:06:07,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4498090.0, ans=0.1 2024-08-19 14:06:26,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=4498190.0, ans=0.025 2024-08-19 14:06:42,502 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.701e+01 2.299e+01 2.552e+01 2.822e+01 3.690e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-19 14:06:52,373 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 5250, loss[loss=0.09695, beats_loss=0.01172, ecapa_loss=0.0001126, whisper_loss=0.08411, over 16792.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01042, ecapa_loss=0.0001409, whisper_loss=0.09149, over 3865707.41 frames. ], batch size: 62, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:06:52,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4498390.0, ans=0.1 2024-08-19 14:07:03,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=4498390.0, ans=0.95 2024-08-19 14:07:14,888 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-19 14:07:25,647 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 32 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-19 14:07:27,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4498590.0, ans=0.125 2024-08-19 14:07:34,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4498590.0, ans=0.125 2024-08-19 14:07:37,666 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
20 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-19 14:08:19,104 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 5300, loss[loss=0.09142, beats_loss=0.0136, ecapa_loss=0.0001129, whisper_loss=0.07669, over 22161.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01046, ecapa_loss=0.0001409, whisper_loss=0.09099, over 3897339.39 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:08:30,475 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.88 vs. limit=15.0 2024-08-19 14:08:47,097 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-19 14:09:11,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4499190.0, ans=0.125 2024-08-19 14:09:24,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4499290.0, ans=0.125 2024-08-19 14:09:28,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4499290.0, ans=0.0 2024-08-19 14:09:31,258 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.343e+01 2.651e+01 2.972e+01 3.624e+01, threshold=5.303e+01, percent-clipped=0.0 2024-08-19 14:09:40,846 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 5350, loss[loss=0.1061, beats_loss=0.00634, ecapa_loss=0.0001368, whisper_loss=0.09841, over 16046.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01034, ecapa_loss=0.0001416, whisper_loss=0.09106, over 3850850.07 frames. ], batch size: 57, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:10:02,236 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 12 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 14:10:03,835 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
27 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-19 14:10:28,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4499590.0, ans=0.2 2024-08-19 14:11:04,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=15.0 2024-08-19 14:11:09,940 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 5400, loss[loss=0.1076, beats_loss=0.01071, ecapa_loss=0.0001191, whisper_loss=0.09568, over 16899.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01031, ecapa_loss=0.0001409, whisper_loss=0.09101, over 3825250.04 frames. ], batch size: 64, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:11:12,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4499890.0, ans=0.0 2024-08-19 14:11:32,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4499990.0, ans=0.125 2024-08-19 14:11:34,280 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 22 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 14:11:50,594 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 42 from Vox, 28 fro AS 2024-08-19 14:11:56,214 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.59 vs. limit=6.0 2024-08-19 14:12:21,936 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.40 vs. 
limit=15.0 2024-08-19 14:12:22,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=4500290.0, ans=15.0 2024-08-19 14:12:26,261 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.275e+01 2.501e+01 2.873e+01 4.130e+01, threshold=5.001e+01, percent-clipped=0.0 2024-08-19 14:12:28,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4500290.0, ans=0.2 2024-08-19 14:12:36,765 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 5450, loss[loss=0.09647, beats_loss=0.009752, ecapa_loss=0.0001331, whisper_loss=0.08538, over 17747.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01033, ecapa_loss=0.0001417, whisper_loss=0.0911, over 3846614.66 frames. ], batch size: 69, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:12:50,118 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-19 14:13:04,787 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-19 14:13:11,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4500590.0, ans=0.0 2024-08-19 14:13:14,250 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 23 from LS+wenet, 20 from Vox, 16 fro AS 2024-08-19 14:13:33,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4500690.0, ans=0.125 2024-08-19 14:13:39,241 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-19 14:13:44,807 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
30 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-19 14:13:45,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4500690.0, ans=0.125 2024-08-19 14:14:07,494 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 5500, loss[loss=0.1225, beats_loss=0.008068, ecapa_loss=0.0001645, whisper_loss=0.1128, over 22143.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01046, ecapa_loss=0.0001412, whisper_loss=0.08986, over 3861409.93 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:14:09,992 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.94 vs. limit=12.0 2024-08-19 14:14:33,391 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 35 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-19 14:14:43,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4501090.0, ans=0.0 2024-08-19 14:14:46,727 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-19 14:15:08,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4501190.0, ans=0.125 2024-08-19 14:15:09,991 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 27 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-19 14:15:14,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4501190.0, ans=0.2 2024-08-19 14:15:16,100 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
27 from LS+wenet, 24 from Vox, 38 from AS 2024-08-19 14:15:29,747 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.304e+01 2.520e+01 2.871e+01 1.187e+02, threshold=5.040e+01, percent-clipped=1.0 2024-08-19 14:15:42,780 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 5550, loss[loss=0.09542, beats_loss=0.01161, ecapa_loss=0.0001592, whisper_loss=0.08222, over 22024.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0104, ecapa_loss=0.0001427, whisper_loss=0.09023, over 3867536.92 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:15:59,128 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.25 vs. limit=10.0 2024-08-19 14:16:12,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4501490.0, ans=0.1 2024-08-19 14:16:18,955 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 17 from LS+wenet, 18 from Vox, 39 from AS 2024-08-19 14:16:22,613 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 from AS 2024-08-19 14:16:32,652 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 14:16:38,948 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.61 vs.
limit=15.0 2024-08-19 14:16:50,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4501690.0, ans=0.2 2024-08-19 14:16:58,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4501790.0, ans=0.09899494936611666 2024-08-19 14:17:17,870 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 5600, loss[loss=0.0934, beats_loss=0.009577, ecapa_loss=0.0001661, whisper_loss=0.08216, over 22562.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01033, ecapa_loss=0.0001425, whisper_loss=0.09073, over 3872218.79 frames. ], batch size: 94, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:18:16,348 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 20 from Vox, 39 from AS 2024-08-19 14:18:26,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4502190.0, ans=0.1 2024-08-19 14:18:39,441 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.342e+01 2.515e+01 2.771e+01 5.960e+01, threshold=5.030e+01, percent-clipped=1.0 2024-08-19 14:18:43,743 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.77 vs. limit=15.0 2024-08-19 14:18:49,824 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 5650, loss[loss=0.1157, beats_loss=0.008083, ecapa_loss=0.0001891, whisper_loss=0.1057, over 21561.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01042, ecapa_loss=0.0001416, whisper_loss=0.08989, over 3893971.38 frames.
], batch size: 89, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:18:56,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4502390.0, ans=0.125 2024-08-19 14:19:29,332 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=15.0 2024-08-19 14:19:49,545 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 from AS 2024-08-19 14:19:57,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4502690.0, ans=0.125 2024-08-19 14:20:19,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4502790.0, ans=0.1 2024-08-19 14:20:32,098 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 17 from Vox, 20 from AS 2024-08-19 14:20:48,587 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 5700, loss[loss=0.1008, beats_loss=0.009288, ecapa_loss=0.0001384, whisper_loss=0.09011, over 14304.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01043, ecapa_loss=0.0001423, whisper_loss=0.09019, over 3914243.38 frames. ], batch size: 59, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:21:35,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4502990.0, ans=0.1 2024-08-19 14:21:52,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4503090.0, ans=0.125 2024-08-19 14:21:54,279 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts.
27 from LS+wenet, 27 from Vox, 38 from AS 2024-08-19 14:22:10,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4503190.0, ans=0.0 2024-08-19 14:22:31,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4503190.0, ans=0.1 2024-08-19 14:22:47,176 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.393e+01 2.644e+01 2.920e+01 4.310e+01, threshold=5.288e+01, percent-clipped=0.0 2024-08-19 14:23:03,906 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 5750, loss[loss=0.08698, beats_loss=0.01071, ecapa_loss=0.0001379, whisper_loss=0.07489, over 18832.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01035, ecapa_loss=0.0001427, whisper_loss=0.09109, over 3918314.48 frames. ], batch size: 77, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:23:04,024 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 13 from Vox, 24 from AS 2024-08-19 14:23:30,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4503490.0, ans=0.125 2024-08-19 14:24:10,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4503590.0, ans=0.5 2024-08-19 14:24:13,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4503590.0, ans=0.125 2024-08-19 14:24:18,899 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2024-08-19 14:24:26,899 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 from AS 2024-08-19 14:24:51,141 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts.
20 from LS+wenet, 11 from Vox, 29 from AS 2024-08-19 14:24:56,397 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 17 from Vox, 38 from AS 2024-08-19 14:25:04,601 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 20 from Vox, 23 from AS 2024-08-19 14:25:12,403 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 15 from LS+wenet, 19 from Vox, 29 from AS 2024-08-19 14:25:15,868 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 5800, loss[loss=0.1012, beats_loss=0.01037, ecapa_loss=0.0001899, whisper_loss=0.08893, over 19063.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0103, ecapa_loss=0.0001431, whisper_loss=0.09091, over 3885056.74 frames. ], batch size: 84, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:25:31,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4503890.0, ans=0.125 2024-08-19 14:25:35,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4503990.0, ans=0.125 2024-08-19 14:25:44,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4503990.0, ans=0.2 2024-08-19 14:25:47,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4503990.0, ans=0.125 2024-08-19 14:25:51,290 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.85 vs. limit=22.5 2024-08-19 14:25:51,446 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.23 vs.
limit=22.5 2024-08-19 14:26:07,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=4504190.0, ans=0.05 2024-08-19 14:26:28,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4504290.0, ans=0.04949747468305833 2024-08-19 14:26:30,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4504290.0, ans=0.0 2024-08-19 14:26:35,464 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.385e+01 2.563e+01 2.878e+01 9.355e+01, threshold=5.127e+01, percent-clipped=2.0 2024-08-19 14:26:46,462 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.45 vs. limit=12.0 2024-08-19 14:26:47,228 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 5850, loss[loss=0.1011, beats_loss=0.01238, ecapa_loss=0.000146, whisper_loss=0.0873, over 21883.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0104, ecapa_loss=0.0001418, whisper_loss=0.09008, over 3875704.89 frames. 
], batch size: 89, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:26:51,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4504390.0, ans=0.0 2024-08-19 14:27:20,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4504490.0, ans=0.2 2024-08-19 14:27:33,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4504590.0, ans=0.125 2024-08-19 14:27:56,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4504690.0, ans=0.0 2024-08-19 14:28:12,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4504790.0, ans=0.05 2024-08-19 14:28:16,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4504790.0, ans=0.125 2024-08-19 14:28:28,252 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 5900, loss[loss=0.06452, beats_loss=0.01312, ecapa_loss=0.0001919, whisper_loss=0.04948, over 15321.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01047, ecapa_loss=0.0001411, whisper_loss=0.08963, over 3861868.67 frames. ], batch size: 70, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:28:49,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4504990.0, ans=0.2 2024-08-19 14:29:15,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4505090.0, ans=0.125 2024-08-19 14:29:19,071 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.24 vs. 
limit=22.5 2024-08-19 14:29:19,353 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.61 vs. limit=15.0 2024-08-19 14:29:24,421 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 29 from LS+wenet, 30 from Vox, 36 from AS 2024-08-19 14:29:31,045 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 19 from Vox, 26 from AS 2024-08-19 14:29:47,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4505290.0, ans=0.2 2024-08-19 14:29:51,850 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.307e+01 2.493e+01 2.853e+01 3.698e+01, threshold=4.986e+01, percent-clipped=0.0 2024-08-19 14:30:02,199 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 5950, loss[loss=0.09244, beats_loss=0.009543, ecapa_loss=0.0001767, whisper_loss=0.08113, over 19451.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01041, ecapa_loss=0.0001415, whisper_loss=0.09011, over 3858903.40 frames. ], batch size: 85, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:30:02,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4505390.0, ans=0.125 2024-08-19 14:30:07,004 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.57 vs. limit=15.0 2024-08-19 14:30:12,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4505390.0, ans=0.2 2024-08-19 14:30:14,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4505390.0, ans=0.125 2024-08-19 14:30:20,105 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts.
21 from LS+wenet, 19 from Vox, 23 from AS 2024-08-19 14:30:40,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4505590.0, ans=0.125 2024-08-19 14:30:55,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4505590.0, ans=0.0 2024-08-19 14:31:00,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4505690.0, ans=0.125 2024-08-19 14:31:06,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4505690.0, ans=0.1 2024-08-19 14:31:12,186 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 22 from LS+wenet, 14 from Vox, 20 from AS 2024-08-19 14:31:14,578 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 31 from LS+wenet, 28 from Vox, 24 from AS 2024-08-19 14:31:22,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4505790.0, ans=0.0 2024-08-19 14:31:39,202 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2024-08-19 14:31:42,960 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 6000, loss[loss=0.09914, beats_loss=0.01196, ecapa_loss=0.0001497, whisper_loss=0.08568, over 21354.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01049, ecapa_loss=0.0001411, whisper_loss=0.0902, over 3872663.16 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:31:42,961 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-19 14:32:32,845 INFO [train_multi_KD3.py:1149] (0/4) Epoch 31, validation on ASR_libri: loss=0.2517, beats_loss=0, ecapa_loss=0.0005139, whisper_loss=0.2466, over 922467.00 frames.
2024-08-19 14:32:50,735 INFO [train_multi_KD3.py:1149] (0/4) Epoch 31, validation on SV_voxceleb1: loss=0.003959, beats_loss=0, ecapa_loss=0.0003959, whisper_loss=0, over 939242.00 frames. 2024-08-19 14:33:52,902 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1248, 3.1669, 3.2632, 3.0086], device='cuda:0') 2024-08-19 14:34:14,904 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.0118, 2.0418, 2.1375, 1.9985, 2.5966, 2.0316, 2.1483, 2.0359], device='cuda:0') 2024-08-19 14:34:38,948 INFO [train_multi_KD3.py:1149] (0/4) Epoch 31, validation on AT_audioset: loss=0.02302, beats_loss=0.02302, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 14:34:38,953 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-19 14:34:51,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4505890.0, ans=0.125 2024-08-19 14:35:13,695 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 36 from LS+wenet, 25 from Vox, 31 from AS 2024-08-19 14:35:45,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4506190.0, ans=0.125 2024-08-19 14:35:50,079 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.86 vs. limit=15.0 2024-08-19 14:35:55,825 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.316e+01 2.553e+01 2.863e+01 1.087e+02, threshold=5.107e+01, percent-clipped=1.0 2024-08-19 14:36:07,624 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 6050, loss[loss=0.1049, beats_loss=0.007932, ecapa_loss=0.0001766, whisper_loss=0.09518, over 19381.00 frames.
], tot_loss[loss=0.1024, beats_loss=0.01043, ecapa_loss=0.0001415, whisper_loss=0.09058, over 3886309.61 frames. ], batch size: 78, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:36:13,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4506390.0, ans=0.125 2024-08-19 14:36:42,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4506490.0, ans=0.125 2024-08-19 14:37:12,416 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 from AS 2024-08-19 14:37:25,201 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 16 from Vox, 42 from AS 2024-08-19 14:37:25,931 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.70 vs. limit=15.0 2024-08-19 14:37:34,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4506790.0, ans=0.1 2024-08-19 14:37:35,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4506790.0, ans=0.95 2024-08-19 14:37:42,774 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 6100, loss[loss=0.1174, beats_loss=0.009237, ecapa_loss=0.0001481, whisper_loss=0.1067, over 21352.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0104, ecapa_loss=0.0001408, whisper_loss=0.0914, over 3904965.84 frames.
], batch size: 89, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:37:44,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4506890.0, ans=0.1 2024-08-19 14:38:12,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4506990.0, ans=0.025 2024-08-19 14:38:15,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4506990.0, ans=0.0 2024-08-19 14:38:18,036 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0 2024-08-19 14:38:23,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4507090.0, ans=0.2 2024-08-19 14:38:23,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4507090.0, ans=0.1 2024-08-19 14:38:45,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4507190.0, ans=0.125 2024-08-19 14:38:46,517 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.17 vs. limit=15.0 2024-08-19 14:38:54,069 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.84 vs. 
limit=15.0 2024-08-19 14:38:58,025 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.243e+01 2.514e+01 2.857e+01 4.099e+01, threshold=5.029e+01, percent-clipped=0.0 2024-08-19 14:39:01,933 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.34 vs. limit=15.0 2024-08-19 14:39:07,563 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 6150, loss[loss=0.1106, beats_loss=0.009609, ecapa_loss=0.0001029, whisper_loss=0.09997, over 16435.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001397, whisper_loss=0.09072, over 3865890.80 frames. ], batch size: 62, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:39:16,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4507390.0, ans=0.2 2024-08-19 14:39:16,708 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.559e+01 2024-08-19 14:39:53,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4507590.0, ans=0.125 2024-08-19 14:40:11,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4507690.0, ans=0.1 2024-08-19 14:40:14,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4507690.0, ans=0.125 2024-08-19 14:40:37,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4507890.0, ans=0.0 2024-08-19 14:40:37,295 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.40 vs. 
limit=15.0 2024-08-19 14:40:38,437 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 6200, loss[loss=0.1078, beats_loss=0.009499, ecapa_loss=0.000143, whisper_loss=0.09683, over 23197.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.0001396, whisper_loss=0.0902, over 3862347.61 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:40:41,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4507890.0, ans=0.0 2024-08-19 14:41:03,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4507990.0, ans=0.04949747468305833 2024-08-19 14:41:03,788 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.43 vs. limit=22.5 2024-08-19 14:41:19,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4508090.0, ans=0.0 2024-08-19 14:41:19,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4508090.0, ans=0.1 2024-08-19 14:41:38,911 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2024-08-19 14:42:00,976 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.259e+01 2.437e+01 2.769e+01 4.793e+01, threshold=4.873e+01, percent-clipped=0.0 2024-08-19 14:42:04,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4508290.0, ans=0.125 2024-08-19 14:42:10,025 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 6250, loss[loss=0.09201, beats_loss=0.01259, ecapa_loss=0.0001204, whisper_loss=0.07822, over 14118.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01051, ecapa_loss=0.000139, whisper_loss=0.08992, over 3847509.73 frames. ], batch size: 55, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:42:22,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4508490.0, ans=0.2 2024-08-19 14:42:33,595 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.66 vs. limit=10.0 2024-08-19 14:42:42,320 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 from AS 2024-08-19 14:42:42,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4508590.0, ans=0.0 2024-08-19 14:42:46,827 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 22 from Vox, 28 from AS 2024-08-19 14:43:20,367 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 6300, loss[loss=0.08058, beats_loss=0.01257, ecapa_loss=0.0001472, whisper_loss=0.06654, over 15453.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01048, ecapa_loss=0.0001402, whisper_loss=0.09057, over 3865255.29 frames. ], batch size: 64, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:43:23,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4508890.0, ans=0.125 2024-08-19 14:43:47,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4509090.0, ans=0.1 2024-08-19 14:43:57,293 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 30 from Vox, 29 from AS 2024-08-19 14:44:16,183 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.32 vs.
limit=15.0 2024-08-19 14:44:23,705 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.387e+01 2.658e+01 3.235e+01 4.854e+01, threshold=5.315e+01, percent-clipped=0.0 2024-08-19 14:44:32,912 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 6350, loss[loss=0.07438, beats_loss=0.009669, ecapa_loss=0.0001653, whisper_loss=0.06306, over 15585.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01042, ecapa_loss=0.0001419, whisper_loss=0.09055, over 3856926.00 frames. ], batch size: 64, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:44:33,121 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 15 from Vox, 26 from AS 2024-08-19 14:44:56,220 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 23 from Vox, 26 from AS 2024-08-19 14:45:09,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4509590.0, ans=0.1 2024-08-19 14:45:12,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4509590.0, ans=0.0 2024-08-19 14:45:29,592 WARNING [optim.py:496] (0/4) Scaling gradients by 0.010832761414349079, model_norm_threshold=53.15283203125 2024-08-19 14:45:29,760 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.45, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.092e+07, grad_sumsq=1.044e+09, orig_rms_sq=1.046e-02 2024-08-19 14:45:43,927 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 6400, loss[loss=0.1132, beats_loss=0.009548, ecapa_loss=0.0001445, whisper_loss=0.1022, over 21332.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0104, ecapa_loss=0.0001418, whisper_loss=0.0909, over 3850842.93 frames. ], batch size: 85, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:45:45,568 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts.
26 from LS+wenet, 17 from Vox, 20 from AS 2024-08-19 14:45:47,450 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=15.0 2024-08-19 14:46:00,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4509990.0, ans=0.1 2024-08-19 14:46:04,908 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2024-08-19 14:46:05,580 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 24 from LS+wenet, 21 from Vox, 49 from AS 2024-08-19 14:46:18,664 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 22 from Vox, 46 from AS 2024-08-19 14:46:20,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4510090.0, ans=0.0 2024-08-19 14:46:26,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4510190.0, ans=0.1 2024-08-19 14:46:28,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4510190.0, ans=0.125 2024-08-19 14:46:37,992 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.48 vs. limit=15.0 2024-08-19 14:46:38,367 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 18 from LS+wenet, 7 from Vox, 28 from AS 2024-08-19 14:46:46,825 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.363e+01 2.610e+01 2.887e+01 4.907e+03, threshold=5.221e+01, percent-clipped=2.0 2024-08-19 14:46:51,173 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts.
22 from LS+wenet, 21 from Vox, 40 from AS 2024-08-19 14:46:52,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4510290.0, ans=0.125 2024-08-19 14:46:55,554 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 6450, loss[loss=0.09874, beats_loss=0.008119, ecapa_loss=0.000159, whisper_loss=0.08903, over 16591.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01046, ecapa_loss=0.0001407, whisper_loss=0.09072, over 3893433.59 frames. ], batch size: 63, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:46:59,515 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.81 vs. limit=22.5 2024-08-19 14:47:03,934 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.53 vs. limit=22.5 2024-08-19 14:47:15,811 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 26 from Vox, 41 from AS 2024-08-19 14:47:18,719 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 13 from Vox, 38 from AS 2024-08-19 14:47:24,926 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.49 vs. limit=22.5 2024-08-19 14:47:26,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4510590.0, ans=0.125 2024-08-19 14:47:27,608 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.57 vs. limit=22.5 2024-08-19 14:47:45,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4510690.0, ans=0.0 2024-08-19 14:48:05,635 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts.
23 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 14:48:10,044 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 6500, loss[loss=0.08726, beats_loss=0.01323, ecapa_loss=0.0001528, whisper_loss=0.0725, over 21002.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01049, ecapa_loss=0.0001407, whisper_loss=0.09065, over 3904523.30 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:48:12,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=4510890.0, ans=15.0 2024-08-19 14:48:20,020 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-19 14:48:24,317 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 14:48:29,041 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-19 14:48:58,280 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 18 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 14:49:13,654 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.374e+01 2.606e+01 2.807e+01 1.094e+02, threshold=5.213e+01, percent-clipped=1.0 2024-08-19 14:49:23,915 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 6550, loss[loss=0.1076, beats_loss=0.01087, ecapa_loss=0.0001204, whisper_loss=0.09557, over 20809.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001401, whisper_loss=0.09052, over 3932477.06 frames. ], batch size: 82, lr: 1.97e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:49:27,701 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.19 vs. limit=10.0 2024-08-19 14:49:32,962 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
18 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-19 14:49:43,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4511490.0, ans=0.0 2024-08-19 14:50:12,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4511690.0, ans=0.125 2024-08-19 14:50:28,132 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.82 vs. limit=10.0 2024-08-19 14:50:29,957 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 14:50:30,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4511790.0, ans=0.2 2024-08-19 14:50:30,784 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=22.5 2024-08-19 14:50:31,345 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 14:50:36,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4511790.0, ans=0.2 2024-08-19 14:50:38,605 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 6600, loss[loss=0.09993, beats_loss=0.009837, ecapa_loss=0.0001352, whisper_loss=0.08874, over 19770.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01053, ecapa_loss=0.0001408, whisper_loss=0.09052, over 3957580.91 frames. ], batch size: 81, lr: 1.97e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:50:41,956 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-19 14:50:44,199 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=16.50 vs. 
limit=15.0 2024-08-19 14:50:56,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4511990.0, ans=0.1 2024-08-19 14:51:41,325 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-19 14:51:44,340 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.291e+01 2.479e+01 2.799e+01 1.189e+02, threshold=4.958e+01, percent-clipped=2.0 2024-08-19 14:51:51,984 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.29 vs. limit=15.0 2024-08-19 14:51:53,301 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 6650, loss[loss=0.1098, beats_loss=0.009204, ecapa_loss=0.0001683, whisper_loss=0.09894, over 21878.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01043, ecapa_loss=0.0001418, whisper_loss=0.09027, over 3943853.63 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:52:05,948 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.72 vs. limit=15.0 2024-08-19 14:52:23,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4512590.0, ans=0.125 2024-08-19 14:52:30,054 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
27 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 14:52:30,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4512590.0, ans=0.1 2024-08-19 14:52:47,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4512790.0, ans=0.125 2024-08-19 14:52:51,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4512790.0, ans=0.125 2024-08-19 14:52:57,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4512790.0, ans=0.2 2024-08-19 14:53:00,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4512790.0, ans=0.2 2024-08-19 14:53:03,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4512890.0, ans=0.125 2024-08-19 14:53:04,789 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 6700, loss[loss=0.07866, beats_loss=0.01087, ecapa_loss=0.0001274, whisper_loss=0.06651, over 16469.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.0001412, whisper_loss=0.08957, over 3926404.07 frames. ], batch size: 63, lr: 1.97e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:53:15,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4512890.0, ans=0.125 2024-08-19 14:53:16,752 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-19 14:53:26,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4512990.0, ans=0.0 2024-08-19 14:53:31,095 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
26 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-19 14:53:39,691 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 16 from Vox, 49 fro AS 2024-08-19 14:53:43,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4513090.0, ans=0.0 2024-08-19 14:53:55,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4513190.0, ans=0.125 2024-08-19 14:53:57,206 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-19 14:54:00,468 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-19 14:54:02,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4513190.0, ans=0.1 2024-08-19 14:54:06,604 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 31 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-19 14:54:10,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4513290.0, ans=0.04949747468305833 2024-08-19 14:54:14,030 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.375e+01 2.659e+01 3.001e+01 3.941e+01, threshold=5.319e+01, percent-clipped=0.0 2024-08-19 14:54:14,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4513290.0, ans=0.125 2024-08-19 14:54:17,230 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-19 14:54:23,298 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 6750, loss[loss=0.09841, beats_loss=0.01136, ecapa_loss=0.0001184, whisper_loss=0.08587, over 23287.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.0001412, whisper_loss=0.09008, over 3893351.40 frames. 
], batch size: 92, lr: 1.97e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:54:29,312 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-19 14:54:30,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4513390.0, ans=0.125 2024-08-19 14:54:32,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=4513390.0, ans=0.1 2024-08-19 14:54:45,197 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 17 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 14:54:54,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4513590.0, ans=0.125 2024-08-19 14:55:15,406 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.733e+00 2024-08-19 14:55:17,594 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-19 14:55:28,869 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 6800, loss[loss=0.1111, beats_loss=0.009584, ecapa_loss=0.0001246, whisper_loss=0.1002, over 24459.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.0001415, whisper_loss=0.08955, over 3881413.97 frames. ], batch size: 94, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 14:55:30,128 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 14 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 14:55:31,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4513890.0, ans=0.1 2024-08-19 14:55:33,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4513890.0, ans=0.125 2024-08-19 14:55:43,628 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
22 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 14:56:16,013 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-19 14:56:20,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4514290.0, ans=0.0 2024-08-19 14:56:21,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4514290.0, ans=0.2 2024-08-19 14:56:23,586 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.299e+01 2.607e+01 2.944e+01 3.489e+02, threshold=5.214e+01, percent-clipped=2.0 2024-08-19 14:56:27,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4514290.0, ans=0.0 2024-08-19 14:56:30,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4514390.0, ans=0.0 2024-08-19 14:56:31,349 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 6850, loss[loss=0.08403, beats_loss=0.01044, ecapa_loss=0.0001208, whisper_loss=0.07238, over 17570.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01049, ecapa_loss=0.0001405, whisper_loss=0.08924, over 3865013.06 frames. ], batch size: 68, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 14:56:35,139 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-19 14:56:37,462 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
17 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-19 14:56:42,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4514490.0, ans=0.0 2024-08-19 14:57:01,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4514590.0, ans=0.0 2024-08-19 14:57:01,857 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=15.0 2024-08-19 14:57:17,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4514690.0, ans=0.125 2024-08-19 14:57:21,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4514790.0, ans=0.2 2024-08-19 14:57:33,381 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 6900, loss[loss=0.1115, beats_loss=0.01082, ecapa_loss=0.0001456, whisper_loss=0.09926, over 22025.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01048, ecapa_loss=0.0001408, whisper_loss=0.08917, over 3857756.29 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 14:57:37,124 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 22 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-19 14:57:48,304 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-19 14:57:49,089 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.81 vs. 
limit=15.0 2024-08-19 14:57:53,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4514990.0, ans=0.125 2024-08-19 14:58:03,820 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2024-08-19 14:58:04,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4515090.0, ans=0.0 2024-08-19 14:58:14,643 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.05 vs. limit=15.0 2024-08-19 14:58:27,592 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.255e+01 2.566e+01 2.840e+01 4.154e+01, threshold=5.133e+01, percent-clipped=0.0 2024-08-19 14:58:28,478 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.89 vs. limit=10.0 2024-08-19 14:58:31,639 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-19 14:58:35,199 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 6950, loss[loss=0.09, beats_loss=0.009491, ecapa_loss=0.0001461, whisper_loss=0.07905, over 15668.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01051, ecapa_loss=0.0001402, whisper_loss=0.08944, over 3889527.54 frames. 
], batch size: 61, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 14:58:40,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4515390.0, ans=0.2 2024-08-19 14:58:45,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4515390.0, ans=0.0 2024-08-19 14:59:02,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4515590.0, ans=0.125 2024-08-19 14:59:08,976 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 20 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 14:59:21,488 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-19 14:59:32,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4515790.0, ans=0.0 2024-08-19 14:59:33,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4515790.0, ans=0.125 2024-08-19 14:59:37,385 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 7000, loss[loss=0.1052, beats_loss=0.01002, ecapa_loss=0.0001571, whisper_loss=0.09362, over 21325.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01054, ecapa_loss=0.0001407, whisper_loss=0.08895, over 3892018.67 frames. ], batch size: 84, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 14:59:49,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4515990.0, ans=0.125 2024-08-19 15:00:15,554 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-19 15:00:20,297 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
24 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-19 15:00:30,741 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.10 vs. limit=22.5 2024-08-19 15:00:31,228 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.441e+01 2.633e+01 3.064e+01 9.212e+01, threshold=5.267e+01, percent-clipped=3.0 2024-08-19 15:00:37,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4516390.0, ans=0.2 2024-08-19 15:00:38,725 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 7050, loss[loss=0.1153, beats_loss=0.008825, ecapa_loss=0.0001788, whisper_loss=0.1046, over 20368.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01044, ecapa_loss=0.0001418, whisper_loss=0.08907, over 3893502.85 frames. ], batch size: 84, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:00:43,817 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-19 15:00:47,596 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-19 15:00:51,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4516490.0, ans=0.1 2024-08-19 15:00:57,809 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-19 15:00:58,978 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-19 15:01:10,857 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. 
limit=6.0 2024-08-19 15:01:21,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4516690.0, ans=0.0 2024-08-19 15:01:32,598 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-19 15:01:40,655 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 7100, loss[loss=0.1049, beats_loss=0.01058, ecapa_loss=0.0001576, whisper_loss=0.09277, over 21902.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01047, ecapa_loss=0.0001409, whisper_loss=0.08882, over 3874780.41 frames. ], batch size: 92, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:01:45,061 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.47 vs. limit=22.5 2024-08-19 15:01:47,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4516890.0, ans=0.125 2024-08-19 15:02:10,310 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 13 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-19 15:02:13,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4517090.0, ans=0.025 2024-08-19 15:02:19,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4517190.0, ans=0.1 2024-08-19 15:02:34,329 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.02 vs. 
limit=15.0 2024-08-19 15:02:35,777 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.673e+01 2.230e+01 2.578e+01 2.810e+01 3.581e+01, threshold=5.156e+01, percent-clipped=0.0 2024-08-19 15:02:40,198 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.57 vs. limit=10.0 2024-08-19 15:02:42,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4517390.0, ans=0.0 2024-08-19 15:02:43,322 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 7150, loss[loss=0.11, beats_loss=0.007996, ecapa_loss=0.0001703, whisper_loss=0.1003, over 20058.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01054, ecapa_loss=0.0001396, whisper_loss=0.0887, over 3886378.08 frames. ], batch size: 82, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:02:51,827 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.58 vs. limit=15.0 2024-08-19 15:02:57,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4517490.0, ans=0.0 2024-08-19 15:03:06,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=4517490.0, ans=0.05 2024-08-19 15:03:23,315 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=15.0 2024-08-19 15:03:26,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.10 vs. 
limit=15.0 2024-08-19 15:03:30,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4517690.0, ans=0.125 2024-08-19 15:03:32,725 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.693e-01 2024-08-19 15:03:41,765 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.95 vs. limit=12.0 2024-08-19 15:03:43,660 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 9 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-19 15:03:45,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4517890.0, ans=0.1 2024-08-19 15:03:45,992 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 7200, loss[loss=0.1246, beats_loss=0.009673, ecapa_loss=0.0001128, whisper_loss=0.1138, over 22113.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01058, ecapa_loss=0.0001395, whisper_loss=0.08868, over 3897421.65 frames. ], batch size: 83, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:04:07,755 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0 2024-08-19 15:04:19,687 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=15.0 2024-08-19 15:04:35,422 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.80 vs. limit=15.0 2024-08-19 15:04:37,590 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
18 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 15:04:39,777 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.333e+01 2.621e+01 2.974e+01 6.907e+01, threshold=5.242e+01, percent-clipped=0.0 2024-08-19 15:04:39,904 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-19 15:04:40,522 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.76 vs. limit=12.0 2024-08-19 15:04:45,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4518290.0, ans=0.2 2024-08-19 15:04:46,966 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 7250, loss[loss=0.1086, beats_loss=0.009936, ecapa_loss=0.0001454, whisper_loss=0.09719, over 18478.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01059, ecapa_loss=0.0001392, whisper_loss=0.08853, over 3897824.04 frames. ], batch size: 75, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:04:50,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4518390.0, ans=0.0 2024-08-19 15:05:01,059 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2024-08-19 15:05:15,269 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-19 15:05:32,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4518690.0, ans=0.1 2024-08-19 15:05:41,776 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
32 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-19 15:05:41,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4518790.0, ans=0.125 2024-08-19 15:05:43,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4518790.0, ans=0.125 2024-08-19 15:05:47,599 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 7300, loss[loss=0.0993, beats_loss=0.01133, ecapa_loss=0.0001414, whisper_loss=0.08655, over 14202.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0105, ecapa_loss=0.00014, whisper_loss=0.08941, over 3888885.43 frames. ], batch size: 54, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:05:54,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4518890.0, ans=0.125 2024-08-19 15:06:01,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4518990.0, ans=0.0 2024-08-19 15:06:17,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4519090.0, ans=0.125 2024-08-19 15:06:23,383 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
29 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 15:06:37,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4519290.0, ans=0.125 2024-08-19 15:06:39,924 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 15:06:41,772 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.312e+01 2.518e+01 2.847e+01 5.686e+01, threshold=5.035e+01, percent-clipped=2.0 2024-08-19 15:06:43,456 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.26 vs. limit=22.5 2024-08-19 15:06:49,085 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 7350, loss[loss=0.129, beats_loss=0.00825, ecapa_loss=0.0001181, whisper_loss=0.1195, over 19080.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01049, ecapa_loss=0.0001409, whisper_loss=0.08929, over 3873785.44 frames. ], batch size: 69, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:06:52,327 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.29 vs. limit=15.0 2024-08-19 15:06:54,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4519390.0, ans=0.1 2024-08-19 15:07:07,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4519490.0, ans=0.125 2024-08-19 15:07:13,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4519590.0, ans=0.125 2024-08-19 15:07:18,366 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
27 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-19 15:07:20,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4519590.0, ans=0.125 2024-08-19 15:07:25,680 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 15 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-19 15:07:28,312 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-19 15:07:31,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4519690.0, ans=0.0 2024-08-19 15:07:36,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4519790.0, ans=0.2 2024-08-19 15:07:45,218 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-19 15:07:47,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4519790.0, ans=0.2 2024-08-19 15:07:50,033 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 7400, loss[loss=0.1005, beats_loss=0.01172, ecapa_loss=0.0001355, whisper_loss=0.08739, over 22084.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01047, ecapa_loss=0.000141, whisper_loss=0.08942, over 3870574.99 frames. 
], batch size: 90, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:07:50,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4519890.0, ans=0.2 2024-08-19 15:07:52,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=4519890.0, ans=15.0 2024-08-19 15:07:59,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4519890.0, ans=0.1 2024-08-19 15:08:02,510 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-452000.pt 2024-08-19 15:08:08,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4519990.0, ans=0.025 2024-08-19 15:08:20,825 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 15:08:23,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4520090.0, ans=0.0 2024-08-19 15:08:39,415 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
23 from LS+wenet, 26 from Vox, 35 from AS 2024-08-19 15:08:43,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4520290.0, ans=0.1 2024-08-19 15:08:43,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4520290.0, ans=0.125 2024-08-19 15:08:47,144 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.604e+01 2.272e+01 2.510e+01 2.684e+01 4.218e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-19 15:08:51,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4520290.0, ans=0.1 2024-08-19 15:08:53,983 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 7450, loss[loss=0.08512, beats_loss=0.01304, ecapa_loss=0.0001346, whisper_loss=0.07074, over 17282.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0104, ecapa_loss=0.000142, whisper_loss=0.09003, over 3896819.95 frames. ], batch size: 72, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:08:59,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4520390.0, ans=0.125 2024-08-19 15:09:01,977 INFO [train_multi_KD3.py:844] (0/4) A total of 98 cuts. 32 from LS+wenet, 19 from Vox, 47 from AS 2024-08-19 15:09:03,172 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 18 from Vox, 34 from AS 2024-08-19 15:09:04,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4520390.0, ans=0.0 2024-08-19 15:09:05,966 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
26 from LS+wenet, 24 from Vox, 38 from AS 2024-08-19 15:09:10,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4520490.0, ans=0.0 2024-08-19 15:09:12,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4520490.0, ans=0.1 2024-08-19 15:09:18,030 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 17 from Vox, 35 from AS 2024-08-19 15:09:56,177 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 7500, loss[loss=0.07994, beats_loss=0.01182, ecapa_loss=0.0001256, whisper_loss=0.06686, over 18470.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01045, ecapa_loss=0.0001396, whisper_loss=0.08961, over 3890309.14 frames. ], batch size: 73, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:09:58,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=4520890.0, ans=15.0 2024-08-19 15:10:04,952 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 18 from Vox, 20 from AS 2024-08-19 15:10:09,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4520990.0, ans=0.0 2024-08-19 15:10:10,120 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 from AS 2024-08-19 15:10:16,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4520990.0, ans=0.2 2024-08-19 15:10:27,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4521090.0, ans=0.0 2024-08-19 15:10:46,281 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
25 from LS+wenet, 25 from Vox, 28 from AS 2024-08-19 15:10:51,086 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.277e+01 2.526e+01 2.958e+01 5.169e+01, threshold=5.052e+01, percent-clipped=1.0 2024-08-19 15:10:57,398 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 22 from LS+wenet, 24 from Vox, 39 from AS 2024-08-19 15:10:58,353 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 7550, loss[loss=0.08111, beats_loss=0.01213, ecapa_loss=0.0001439, whisper_loss=0.06754, over 20511.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01038, ecapa_loss=0.0001411, whisper_loss=0.09012, over 3899710.18 frames. ], batch size: 85, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:11:02,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4521390.0, ans=10.0 2024-08-19 15:11:04,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4521390.0, ans=0.2 2024-08-19 15:11:09,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4521490.0, ans=0.0 2024-08-19 15:11:18,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4521490.0, ans=0.125 2024-08-19 15:11:20,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4521490.0, ans=0.125 2024-08-19 15:11:21,162 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.86 vs. 
limit=15.0 2024-08-19 15:11:24,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4521590.0, ans=0.125 2024-08-19 15:11:27,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.34 vs. limit=15.0 2024-08-19 15:11:37,416 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 15 from LS+wenet, 18 from Vox, 29 from AS 2024-08-19 15:11:38,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4521690.0, ans=0.0 2024-08-19 15:11:48,900 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.85 vs. limit=15.0 2024-08-19 15:11:57,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4521790.0, ans=0.0 2024-08-19 15:11:59,528 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 7600, loss[loss=0.1201, beats_loss=0.007196, ecapa_loss=0.0001308, whisper_loss=0.1116, over 21131.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01035, ecapa_loss=0.0001404, whisper_loss=0.0907, over 3879508.93 frames. 
], batch size: 78, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:12:02,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4521890.0, ans=0.2 2024-08-19 15:12:13,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4521990.0, ans=0.125 2024-08-19 15:12:15,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=4521990.0, ans=0.2 2024-08-19 15:12:18,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4521990.0, ans=0.125 2024-08-19 15:12:24,267 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 20 from Vox, 35 from AS 2024-08-19 15:12:28,524 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.16 vs. limit=6.0 2024-08-19 15:12:41,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4522190.0, ans=0.125 2024-08-19 15:12:44,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4522190.0, ans=0.1 2024-08-19 15:12:46,190 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 17 from Vox, 29 from AS 2024-08-19 15:12:54,039 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.291e+01 2.555e+01 2.938e+01 1.089e+02, threshold=5.110e+01, percent-clipped=2.0 2024-08-19 15:13:01,255 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 7650, loss[loss=0.1011, beats_loss=0.01026, ecapa_loss=0.0001412, whisper_loss=0.08939, over 21597.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01035, ecapa_loss=0.0001415, whisper_loss=0.0904, over 3886893.98 frames. 
], batch size: 87, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:13:04,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4522390.0, ans=0.125 2024-08-19 15:13:08,252 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.99 vs. limit=15.0 2024-08-19 15:13:11,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4522390.0, ans=0.0 2024-08-19 15:13:26,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4522590.0, ans=0.125 2024-08-19 15:13:44,485 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 18 from Vox, 37 from AS 2024-08-19 15:13:47,584 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0 2024-08-19 15:13:49,513 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 14 from Vox, 29 from AS 2024-08-19 15:13:56,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4522790.0, ans=0.2 2024-08-19 15:14:01,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4522890.0, ans=0.125 2024-08-19 15:14:02,335 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 7700, loss[loss=0.09655, beats_loss=0.01056, ecapa_loss=0.0001308, whisper_loss=0.08468, over 16449.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01033, ecapa_loss=0.0001406, whisper_loss=0.09001, over 3885086.24 frames. 
], batch size: 66, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:14:19,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4522990.0, ans=0.95 2024-08-19 15:14:24,740 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.86 vs. limit=12.0 2024-08-19 15:14:27,767 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 28 from Vox, 41 from AS 2024-08-19 15:14:40,876 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 16 from Vox, 26 from AS 2024-08-19 15:14:49,028 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.56 vs. limit=12.0 2024-08-19 15:14:55,509 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.405e+01 2.630e+01 2.865e+01 8.039e+01, threshold=5.260e+01, percent-clipped=1.0 2024-08-19 15:15:03,006 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 7750, loss[loss=0.1033, beats_loss=0.01072, ecapa_loss=0.0001068, whisper_loss=0.09155, over 17202.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01033, ecapa_loss=0.00014, whisper_loss=0.09066, over 3919804.88 frames. ], batch size: 65, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:15:15,868 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.10 vs. 
limit=15.0 2024-08-19 15:15:16,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4523490.0, ans=0.125 2024-08-19 15:15:16,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4523490.0, ans=0.0 2024-08-19 15:15:25,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4523490.0, ans=0.125 2024-08-19 15:15:27,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4523590.0, ans=0.05 2024-08-19 15:15:32,327 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 19 from LS+wenet, 24 from Vox, 38 from AS 2024-08-19 15:15:32,598 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 15:15:36,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4523590.0, ans=0.0 2024-08-19 15:15:39,622 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 22 from Vox, 35 from AS 2024-08-19 15:15:45,943 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.41 vs. limit=22.5 2024-08-19 15:15:47,761 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 15 from Vox, 39 from AS 2024-08-19 15:16:03,690 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 7800, loss[loss=0.1096, beats_loss=0.008377, ecapa_loss=0.0001604, whisper_loss=0.09961, over 21769.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01038, ecapa_loss=0.0001399, whisper_loss=0.08993, over 3907656.92 frames. 
], batch size: 89, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:16:06,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4523890.0, ans=0.125 2024-08-19 15:16:46,362 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.96 vs. limit=15.0 2024-08-19 15:16:48,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4524190.0, ans=0.0 2024-08-19 15:16:56,327 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.665e+01 2.331e+01 2.591e+01 2.941e+01 6.755e+01, threshold=5.181e+01, percent-clipped=1.0 2024-08-19 15:16:59,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4524290.0, ans=0.125 2024-08-19 15:17:01,327 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 27 from LS+wenet, 13 from Vox, 22 from AS 2024-08-19 15:17:03,386 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 7850, loss[loss=0.09555, beats_loss=0.01187, ecapa_loss=0.0001274, whisper_loss=0.08241, over 16165.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01041, ecapa_loss=0.0001408, whisper_loss=0.09011, over 3924311.71 frames. ], batch size: 63, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:17:11,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4524390.0, ans=0.0 2024-08-19 15:17:14,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4524490.0, ans=0.125 2024-08-19 15:17:21,860 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
24 from LS+wenet, 6 from Vox, 33 from AS 2024-08-19 15:17:49,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4524690.0, ans=0.05 2024-08-19 15:18:01,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4524790.0, ans=0.1 2024-08-19 15:18:02,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4524890.0, ans=0.2 2024-08-19 15:18:03,663 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 7900, loss[loss=0.1065, beats_loss=0.01091, ecapa_loss=8.398e-05, whisper_loss=0.09472, over 15040.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01045, ecapa_loss=0.0001414, whisper_loss=0.09038, over 3919012.33 frames. ], batch size: 55, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:18:10,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=4524890.0, ans=15.0 2024-08-19 15:18:24,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4524990.0, ans=0.2 2024-08-19 15:18:26,361 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.23 vs. limit=12.0 2024-08-19 15:18:32,399 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 from AS 2024-08-19 15:18:32,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=4525090.0, ans=0.2 2024-08-19 15:18:39,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4525190.0, ans=0.0 2024-08-19 15:18:44,239 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
27 from LS+wenet, 23 from Vox, 40 from AS 2024-08-19 15:18:56,257 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.850e+01 2.323e+01 2.530e+01 2.924e+01 2.485e+02, threshold=5.060e+01, percent-clipped=4.0 2024-08-19 15:19:00,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4525290.0, ans=0.2 2024-08-19 15:19:03,379 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 7950, loss[loss=0.1267, beats_loss=0.008244, ecapa_loss=0.0001588, whisper_loss=0.1168, over 23142.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01049, ecapa_loss=0.0001407, whisper_loss=0.09016, over 3914702.02 frames. ], batch size: 88, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:19:05,878 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 21 from Vox, 23 from AS 2024-08-19 15:19:23,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4525490.0, ans=0.125 2024-08-19 15:19:34,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4525590.0, ans=0.125 2024-08-19 15:19:39,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4525690.0, ans=0.2 2024-08-19 15:19:43,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4525690.0, ans=0.1 2024-08-19 15:19:47,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4525690.0, ans=0.0 2024-08-19 15:19:53,889 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
32 from LS+wenet, 17 from Vox, 32 from AS 2024-08-19 15:19:55,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4525790.0, ans=0.125 2024-08-19 15:20:03,579 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 8000, loss[loss=0.1009, beats_loss=0.01211, ecapa_loss=0.0001201, whisper_loss=0.08757, over 22947.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01054, ecapa_loss=0.0001397, whisper_loss=0.09001, over 3904240.90 frames. ], batch size: 92, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:20:03,682 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 20 from Vox, 41 from AS 2024-08-19 15:20:05,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=4525890.0, ans=15.0 2024-08-19 15:20:17,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4525990.0, ans=0.09899494936611666 2024-08-19 15:20:20,330 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 28 from LS+wenet, 19 from Vox, 19 from AS 2024-08-19 15:20:21,740 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
28 from LS+wenet, 20 from Vox, 29 from AS 2024-08-19 15:20:23,234 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 15:20:51,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4526290.0, ans=0.2 2024-08-19 15:20:56,407 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.249e+01 2.539e+01 2.836e+01 4.576e+01, threshold=5.079e+01, percent-clipped=0.0 2024-08-19 15:21:02,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4526390.0, ans=0.125 2024-08-19 15:21:03,677 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 8050, loss[loss=0.08727, beats_loss=0.0104, ecapa_loss=0.0001494, whisper_loss=0.07537, over 15676.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01052, ecapa_loss=0.0001392, whisper_loss=0.08997, over 3888467.95 frames. ], batch size: 63, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:21:09,747 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 15 from Vox, 46 from AS 2024-08-19 15:21:25,734 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.08 vs. limit=6.0 2024-08-19 15:21:28,841 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 13 from Vox, 35 from AS 2024-08-19 15:21:36,047 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.95 vs. limit=15.0 2024-08-19 15:22:03,880 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 8100, loss[loss=0.09911, beats_loss=0.008905, ecapa_loss=0.0001391, whisper_loss=0.08881, over 18680.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01045, ecapa_loss=0.0001401, whisper_loss=0.09021, over 3907054.87 frames. 
], batch size: 72, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:22:06,073 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.85 vs. limit=22.5 2024-08-19 15:22:24,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4526990.0, ans=0.2 2024-08-19 15:22:39,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4527190.0, ans=0.1 2024-08-19 15:22:41,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4527190.0, ans=0.0 2024-08-19 15:22:47,313 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 17 from LS+wenet, 24 from Vox, 33 from AS 2024-08-19 15:22:49,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4527190.0, ans=0.1 2024-08-19 15:22:50,042 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2024-08-19 15:22:56,202 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.370e+01 2.531e+01 2.810e+01 1.337e+02, threshold=5.062e+01, percent-clipped=2.0 2024-08-19 15:23:00,588 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2024-08-19 15:23:03,536 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 8150, loss[loss=0.1144, beats_loss=0.01022, ecapa_loss=0.0001272, whisper_loss=0.1029, over 22102.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.0001406, whisper_loss=0.08957, over 3918509.07 frames. 
], batch size: 87, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:23:12,435 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.44 vs. limit=15.0 2024-08-19 15:23:15,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4527490.0, ans=0.0 2024-08-19 15:23:19,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4527490.0, ans=0.0 2024-08-19 15:23:20,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4527490.0, ans=0.125 2024-08-19 15:23:58,152 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 from AS 2024-08-19 15:24:03,103 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 8200, loss[loss=0.08287, beats_loss=0.01297, ecapa_loss=0.0001125, whisper_loss=0.06878, over 19207.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0105, ecapa_loss=0.0001402, whisper_loss=0.08975, over 3954435.77 frames. ], batch size: 76, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:24:15,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4527990.0, ans=0.09899494936611666 2024-08-19 15:24:15,998 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.20 vs. 
limit=15.0 2024-08-19 15:24:20,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4527990.0, ans=0.1 2024-08-19 15:24:24,211 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.212e-01 2024-08-19 15:24:27,918 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0 2024-08-19 15:24:29,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4528090.0, ans=0.125 2024-08-19 15:24:40,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4528190.0, ans=0.09899494936611666 2024-08-19 15:24:46,474 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 24 from Vox, 33 from AS 2024-08-19 15:24:49,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4528190.0, ans=0.0 2024-08-19 15:24:55,857 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.279e+01 2.474e+01 2.838e+01 8.024e+01, threshold=4.948e+01, percent-clipped=1.0 2024-08-19 15:25:03,237 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 8250, loss[loss=0.106, beats_loss=0.01171, ecapa_loss=0.0001315, whisper_loss=0.09295, over 22951.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01052, ecapa_loss=0.0001406, whisper_loss=0.08979, over 3951594.74 frames. 
], batch size: 90, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:25:04,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4528390.0, ans=0.1 2024-08-19 15:25:14,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4528490.0, ans=0.125 2024-08-19 15:25:19,034 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 from AS 2024-08-19 15:25:22,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4528490.0, ans=0.04949747468305833 2024-08-19 15:25:29,876 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 28 from Vox, 40 from AS 2024-08-19 15:25:32,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4528590.0, ans=0.125 2024-08-19 15:25:35,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4528590.0, ans=0.0 2024-08-19 15:25:50,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4528790.0, ans=0.125 2024-08-19 15:25:53,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4528790.0, ans=0.2 2024-08-19 15:26:03,568 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 8300, loss[loss=0.1086, beats_loss=0.01148, ecapa_loss=0.0001091, whisper_loss=0.09598, over 22835.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01056, ecapa_loss=0.0001399, whisper_loss=0.08991, over 3936514.45 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:26:19,005 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
18 from LS+wenet, 18 from Vox, 28 from AS 2024-08-19 15:26:22,616 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 19 from LS+wenet, 23 from Vox, 36 from AS 2024-08-19 15:26:31,264 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.05 vs. limit=15.0 2024-08-19 15:26:50,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4529290.0, ans=0.2 2024-08-19 15:26:55,593 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.344e+01 2.567e+01 2.934e+01 1.286e+02, threshold=5.133e+01, percent-clipped=2.0 2024-08-19 15:26:58,254 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 23 from Vox, 33 from AS 2024-08-19 15:27:02,883 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 8350, loss[loss=0.09443, beats_loss=0.01119, ecapa_loss=0.0001501, whisper_loss=0.08174, over 17484.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01055, ecapa_loss=0.0001407, whisper_loss=0.08908, over 3922921.48 frames. ], batch size: 71, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:27:21,283 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.74 vs. limit=12.0 2024-08-19 15:27:21,523 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.42 vs. limit=6.0 2024-08-19 15:27:32,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4529590.0, ans=0.125 2024-08-19 15:27:34,270 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
25 from LS+wenet, 24 from Vox, 43 from AS 2024-08-19 15:27:41,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4529690.0, ans=0.0 2024-08-19 15:27:52,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4529790.0, ans=0.125 2024-08-19 15:27:53,203 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 20 from Vox, 23 from AS 2024-08-19 15:27:54,342 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 23 from Vox, 26 from AS 2024-08-19 15:27:59,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4529790.0, ans=0.125 2024-08-19 15:28:02,405 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 8400, loss[loss=0.08995, beats_loss=0.01175, ecapa_loss=0.0001525, whisper_loss=0.07667, over 21540.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01057, ecapa_loss=0.0001406, whisper_loss=0.08901, over 3952018.68 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:28:05,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4529890.0, ans=0.04949747468305833 2024-08-19 15:28:14,812 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
21 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-19 15:28:19,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4529990.0, ans=0.0 2024-08-19 15:28:26,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4530090.0, ans=0.09899494936611666 2024-08-19 15:28:42,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4530190.0, ans=0.1 2024-08-19 15:28:52,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4530290.0, ans=0.1 2024-08-19 15:28:54,962 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.308e+01 2.547e+01 2.800e+01 8.519e+01, threshold=5.094e+01, percent-clipped=2.0 2024-08-19 15:28:55,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4530290.0, ans=0.95 2024-08-19 15:29:01,979 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 8450, loss[loss=0.1005, beats_loss=0.0117, ecapa_loss=0.0001333, whisper_loss=0.08749, over 21284.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01053, ecapa_loss=0.0001408, whisper_loss=0.08925, over 3962386.29 frames. ], batch size: 88, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:29:23,223 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 23 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-19 15:29:25,317 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.95 vs. limit=15.0 2024-08-19 15:29:40,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4530690.0, ans=0.125 2024-08-19 15:29:49,409 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 15:30:00,918 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 8500, loss[loss=0.09486, beats_loss=0.00994, ecapa_loss=0.0001498, whisper_loss=0.08342, over 20588.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01039, ecapa_loss=0.0001411, whisper_loss=0.09023, over 3962888.74 frames. ], batch size: 83, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:30:02,593 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.54 vs. limit=15.0 2024-08-19 15:30:15,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.02 vs. limit=10.0 2024-08-19 15:30:23,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4531090.0, ans=0.125 2024-08-19 15:30:29,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4531090.0, ans=0.0 2024-08-19 15:30:31,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4531090.0, ans=0.04949747468305833 2024-08-19 15:30:41,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4531190.0, ans=0.125 2024-08-19 15:30:47,262 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 27 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-19 15:30:53,227 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.328e+01 2.576e+01 3.014e+01 4.814e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-19 15:31:00,398 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 8550, loss[loss=0.1129, beats_loss=0.009128, ecapa_loss=0.0001472, whisper_loss=0.1023, over 19732.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.000142, whisper_loss=0.09037, over 3938299.72 frames. ], batch size: 79, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:31:03,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4531390.0, ans=0.0 2024-08-19 15:31:18,442 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 24 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-19 15:31:18,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4531490.0, ans=0.1 2024-08-19 15:31:41,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4531690.0, ans=0.125 2024-08-19 15:31:46,454 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-19 15:31:59,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4531890.0, ans=0.125 2024-08-19 15:31:59,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.88 vs. limit=12.0 2024-08-19 15:32:00,551 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 8600, loss[loss=0.09949, beats_loss=0.007919, ecapa_loss=0.0001511, whisper_loss=0.09006, over 13714.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01031, ecapa_loss=0.0001415, whisper_loss=0.09104, over 3907965.01 frames. ], batch size: 55, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:32:00,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4531890.0, ans=0.125 2024-08-19 15:32:36,335 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
28 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-19 15:32:36,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4532190.0, ans=0.125 2024-08-19 15:32:36,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4532190.0, ans=0.0 2024-08-19 15:32:37,611 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-19 15:32:38,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4532190.0, ans=0.0 2024-08-19 15:32:43,814 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 39 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-19 15:32:45,970 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 29 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-19 15:32:51,901 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-19 15:32:52,880 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.343e+01 2.555e+01 2.873e+01 3.984e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-19 15:32:58,374 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.05 vs. limit=6.0 2024-08-19 15:33:00,238 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 8650, loss[loss=0.06245, beats_loss=0.012, ecapa_loss=0.0001522, whisper_loss=0.04892, over 13567.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01034, ecapa_loss=0.0001417, whisper_loss=0.09106, over 3932010.99 frames. 
], batch size: 57, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:33:00,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4532390.0, ans=0.125 2024-08-19 15:33:12,815 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.68 vs. limit=15.0 2024-08-19 15:33:13,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4532490.0, ans=0.125 2024-08-19 15:33:14,300 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-19 15:33:26,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4532590.0, ans=0.0 2024-08-19 15:33:28,774 WARNING [optim.py:496] (0/4) Scaling gradients by 0.054981451481580734, model_norm_threshold=51.102230072021484 2024-08-19 15:33:28,936 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.056e+05, grad_sumsq=1.009e+07, orig_rms_sq=1.047e-02 2024-08-19 15:33:36,696 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.14 vs. limit=15.0 2024-08-19 15:33:42,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4532690.0, ans=0.1 2024-08-19 15:33:55,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4532790.0, ans=0.125 2024-08-19 15:34:00,475 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 8700, loss[loss=0.09305, beats_loss=0.009862, ecapa_loss=0.0001202, whisper_loss=0.08198, over 14455.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001414, whisper_loss=0.09039, over 3876489.09 frames. ], batch size: 55, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:34:02,373 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.58 vs. limit=15.0 2024-08-19 15:34:05,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4532890.0, ans=0.125 2024-08-19 15:34:11,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4532990.0, ans=0.1 2024-08-19 15:34:15,550 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0 2024-08-19 15:34:16,340 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 22 from LS+wenet, 14 from Vox, 50 fro AS 2024-08-19 15:34:17,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4532990.0, ans=0.07 2024-08-19 15:34:22,474 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 14 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-19 15:34:27,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4533090.0, ans=0.0 2024-08-19 15:34:40,636 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
18 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-19 15:34:43,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=4533190.0, ans=15.0 2024-08-19 15:34:53,681 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.310e+01 2.553e+01 2.787e+01 9.294e+02, threshold=5.105e+01, percent-clipped=1.0 2024-08-19 15:34:57,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4533290.0, ans=0.09899494936611666 2024-08-19 15:34:58,625 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-19 15:35:00,801 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 8750, loss[loss=0.1037, beats_loss=0.01059, ecapa_loss=0.0001308, whisper_loss=0.09181, over 19900.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01042, ecapa_loss=0.0001408, whisper_loss=0.08966, over 3845733.89 frames. ], batch size: 78, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:35:02,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4533390.0, ans=0.125 2024-08-19 15:35:12,872 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-19 15:35:13,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4533490.0, ans=0.125 2024-08-19 15:35:14,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=4533490.0, ans=0.2 2024-08-19 15:35:19,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4533490.0, ans=0.1 2024-08-19 15:35:39,436 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
34 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 15:35:46,162 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.98 vs. limit=15.0 2024-08-19 15:35:59,600 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 15:36:00,677 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 8800, loss[loss=0.1, beats_loss=0.01187, ecapa_loss=0.0001263, whisper_loss=0.08687, over 20143.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01054, ecapa_loss=0.0001399, whisper_loss=0.08968, over 3865469.99 frames. ], batch size: 80, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:36:02,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4533890.0, ans=0.0 2024-08-19 15:36:11,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4533990.0, ans=0.2 2024-08-19 15:36:29,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4534090.0, ans=0.125 2024-08-19 15:36:42,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4534190.0, ans=0.2 2024-08-19 15:36:44,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4534190.0, ans=0.125 2024-08-19 15:36:45,346 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
26 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-19 15:36:53,352 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.339e+01 2.639e+01 2.921e+01 5.304e+01, threshold=5.278e+01, percent-clipped=1.0 2024-08-19 15:37:00,585 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 8850, loss[loss=0.1206, beats_loss=0.00687, ecapa_loss=0.0001669, whisper_loss=0.112, over 14524.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01054, ecapa_loss=0.0001392, whisper_loss=0.08972, over 3842827.51 frames. ], batch size: 56, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:37:02,199 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 15:37:05,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4534390.0, ans=0.0 2024-08-19 15:37:15,052 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-19 15:37:21,155 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 23 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-19 15:37:24,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4534590.0, ans=0.0 2024-08-19 15:37:40,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4534690.0, ans=0.125 2024-08-19 15:37:46,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4534690.0, ans=0.125 2024-08-19 15:38:00,679 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 8900, loss[loss=0.1346, beats_loss=0.007391, ecapa_loss=0.000138, whisper_loss=0.1258, over 18848.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01059, ecapa_loss=0.000139, whisper_loss=0.08976, over 3872511.69 frames. 
], batch size: 68, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:38:10,664 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 15:38:10,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4534890.0, ans=0.125 2024-08-19 15:38:10,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4534890.0, ans=0.125 2024-08-19 15:38:14,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4534990.0, ans=0.125 2024-08-19 15:38:36,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4535190.0, ans=0.0 2024-08-19 15:38:36,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4535190.0, ans=10.0 2024-08-19 15:38:37,984 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 15:38:43,605 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-19 15:38:54,824 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.335e+01 2.657e+01 2.947e+01 4.207e+01, threshold=5.314e+01, percent-clipped=0.0 2024-08-19 15:38:56,811 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.64 vs. 
limit=15.0 2024-08-19 15:38:57,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4535290.0, ans=0.125 2024-08-19 15:39:01,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4535390.0, ans=0.125 2024-08-19 15:39:01,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4535390.0, ans=0.125 2024-08-19 15:39:02,159 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 8950, loss[loss=0.1343, beats_loss=0.007952, ecapa_loss=0.0001634, whisper_loss=0.1248, over 23264.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01055, ecapa_loss=0.0001391, whisper_loss=0.0896, over 3884469.17 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:39:04,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4535390.0, ans=0.125 2024-08-19 15:39:10,762 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-19 15:39:18,315 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
28 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-19 15:39:29,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4535590.0, ans=0.125 2024-08-19 15:39:31,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4535590.0, ans=0.125 2024-08-19 15:39:36,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4535590.0, ans=0.0 2024-08-19 15:39:40,170 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.05 vs. limit=15.0 2024-08-19 15:39:42,356 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.763e+01 2024-08-19 15:39:43,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4535690.0, ans=0.1 2024-08-19 15:40:02,149 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 9000, loss[loss=0.08353, beats_loss=0.01111, ecapa_loss=0.000163, whisper_loss=0.07079, over 16882.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01053, ecapa_loss=0.0001398, whisper_loss=0.08936, over 3900361.55 frames. ], batch size: 73, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:40:02,150 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-19 15:40:16,219 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.7612, 2.3421, 2.3443, 1.4299, 0.2812, 2.9451, 2.7192, 0.8671], device='cuda:0') 2024-08-19 15:40:30,363 INFO [train_multi_KD3.py:1149] (0/4) Epoch 31, validation on ASR_libri: loss=0.2531, beats_loss=0, ecapa_loss=0.0005084, whisper_loss=0.248, over 922467.00 frames. 
2024-08-19 15:40:43,481 INFO [train_multi_KD3.py:1149] (0/4) Epoch 31, validation on SV_voxceleb1: loss=0.004046, beats_loss=0, ecapa_loss=0.0004046, whisper_loss=0, over 939242.00 frames. 2024-08-19 15:42:05,880 INFO [train_multi_KD3.py:1149] (0/4) Epoch 31, validation on AT_audioset: loss=0.02311, beats_loss=0.02311, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 15:42:05,885 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32389MB 2024-08-19 15:42:07,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4535890.0, ans=0.125 2024-08-19 15:42:08,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4535890.0, ans=0.125 2024-08-19 15:42:09,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4535890.0, ans=0.125 2024-08-19 15:42:10,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4535890.0, ans=0.1 2024-08-19 15:42:18,838 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.60 vs. 
limit=15.0 2024-08-19 15:42:23,986 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08310459554195404, model_norm_threshold=53.13531494140625 2024-08-19 15:42:24,152 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.161e+04, grad_sumsq=1.575e+04, orig_rms_sq=3.277e+00 2024-08-19 15:42:36,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4536090.0, ans=0.0 2024-08-19 15:42:37,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4536090.0, ans=0.0 2024-08-19 15:42:56,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4536290.0, ans=0.2 2024-08-19 15:42:58,834 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.296e+01 2.631e+01 2.912e+01 6.394e+02, threshold=5.262e+01, percent-clipped=1.0 2024-08-19 15:42:59,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4536290.0, ans=0.2 2024-08-19 15:43:05,969 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 9050, loss[loss=0.1292, beats_loss=0.007877, ecapa_loss=0.000111, whisper_loss=0.1202, over 17590.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.0001409, whisper_loss=0.09012, over 3920204.49 frames. ], batch size: 62, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:43:09,783 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-19 15:43:12,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4536390.0, ans=0.0 2024-08-19 15:43:16,965 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
16 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-19 15:43:21,530 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 15:43:22,743 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-19 15:43:22,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4536490.0, ans=0.125 2024-08-19 15:43:31,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4536590.0, ans=0.09899494936611666 2024-08-19 15:43:36,124 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-19 15:43:36,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4536590.0, ans=0.2 2024-08-19 15:43:38,875 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0 2024-08-19 15:43:39,142 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.25 vs. limit=15.0 2024-08-19 15:43:43,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4536690.0, ans=0.125 2024-08-19 15:43:44,006 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 23 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-19 15:43:45,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4536690.0, ans=0.1 2024-08-19 15:43:51,661 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.97 vs. 
limit=22.5 2024-08-19 15:43:57,390 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.128e-01 2024-08-19 15:44:05,267 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 9100, loss[loss=0.1155, beats_loss=0.009031, ecapa_loss=0.0001291, whisper_loss=0.1052, over 22441.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001416, whisper_loss=0.09005, over 3885765.02 frames. ], batch size: 86, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:44:05,373 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 39 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-19 15:44:09,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4536890.0, ans=0.2 2024-08-19 15:44:10,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4536890.0, ans=0.125 2024-08-19 15:44:12,709 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-19 15:44:24,383 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-19 15:44:30,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4537090.0, ans=0.2 2024-08-19 15:44:35,572 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
23 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-19 15:44:40,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4537190.0, ans=0.1 2024-08-19 15:44:54,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4537290.0, ans=0.125 2024-08-19 15:45:00,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4537290.0, ans=0.125 2024-08-19 15:45:03,564 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.267e+01 2.524e+01 2.813e+01 7.972e+01, threshold=5.047e+01, percent-clipped=1.0 2024-08-19 15:45:12,319 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 9150, loss[loss=0.1023, beats_loss=0.008951, ecapa_loss=0.000173, whisper_loss=0.09166, over 17593.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01036, ecapa_loss=0.0001413, whisper_loss=0.09065, over 3888462.75 frames. ], batch size: 74, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:45:15,371 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 27 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 15:45:28,061 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.56 vs. limit=15.0 2024-08-19 15:45:28,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.11 vs. limit=15.0 2024-08-19 15:45:30,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4537490.0, ans=0.0 2024-08-19 15:45:47,886 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
18 from LS+wenet, 21 from Vox, 20 from AS
2024-08-19 15:45:48,415 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.47 vs. limit=22.5
2024-08-19 15:45:50,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4537590.0, ans=0.05
2024-08-19 15:45:52,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4537690.0, ans=0.0
2024-08-19 15:45:52,244 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.61 vs. limit=12.0
2024-08-19 15:45:57,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4537690.0, ans=0.05
2024-08-19 15:46:11,300 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 15 from Vox, 25 from AS
2024-08-19 15:46:14,472 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 20 from LS+wenet, 26 from Vox, 38 from AS
2024-08-19 15:46:19,733 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 9200, loss[loss=0.1035, beats_loss=0.01103, ecapa_loss=0.0001351, whisper_loss=0.09113, over 22311.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001399, whisper_loss=0.09034, over 3870559.71 frames. ], batch size: 88, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 15:46:29,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4537890.0, ans=0.125
2024-08-19 15:46:47,545 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 25 from LS+wenet, 18 from Vox, 22 from AS
2024-08-19 15:46:48,107 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=15.0
2024-08-19 15:47:02,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4538190.0, ans=0.125
2024-08-19 15:47:11,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4538290.0, ans=0.0
2024-08-19 15:47:16,757 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.333e+01 2.570e+01 2.855e+01 5.209e+01, threshold=5.141e+01, percent-clipped=1.0
2024-08-19 15:47:19,737 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.260e+00
2024-08-19 15:47:24,290 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 9250, loss[loss=0.08549, beats_loss=0.01156, ecapa_loss=0.0001717, whisper_loss=0.07221, over 22171.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01041, ecapa_loss=0.0001409, whisper_loss=0.09066, over 3869601.55 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 15:47:32,786 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.81 vs. limit=15.0
2024-08-19 15:47:33,254 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 21 from LS+wenet, 30 from Vox, 31 from AS
2024-08-19 15:47:43,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4538490.0, ans=0.0
2024-08-19 15:47:55,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4538590.0, ans=0.1
2024-08-19 15:48:11,263 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.53 vs. limit=22.5
2024-08-19 15:48:29,378 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 9300, loss[loss=0.1097, beats_loss=0.009828, ecapa_loss=9.949e-05, whisper_loss=0.09886, over 19070.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01037, ecapa_loss=0.0001416, whisper_loss=0.09053, over 3853547.41 frames. ], batch size: 71, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 15:48:33,182 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 21 from Vox, 28 from AS
2024-08-19 15:48:59,314 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 12 from LS+wenet, 16 from Vox, 28 from AS
2024-08-19 15:49:12,442 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 17 from Vox, 31 from AS
2024-08-19 15:49:22,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4539290.0, ans=0.125
2024-08-19 15:49:25,214 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.419e+01 2.678e+01 2.934e+01 3.690e+01, threshold=5.356e+01, percent-clipped=0.0
2024-08-19 15:49:31,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4539290.0, ans=0.125
2024-08-19 15:49:33,374 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 9350, loss[loss=0.1027, beats_loss=0.01132, ecapa_loss=0.0001365, whisper_loss=0.09001, over 15603.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01041, ecapa_loss=0.0001412, whisper_loss=0.08979, over 3821438.61 frames. ], batch size: 61, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 15:50:09,160 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 17 from LS+wenet, 18 from Vox, 41 from AS
2024-08-19 15:50:20,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4539690.0, ans=0.125
2024-08-19 15:50:28,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4539790.0, ans=0.125
2024-08-19 15:50:28,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4539790.0, ans=0.025
2024-08-19 15:50:35,124 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 9400, loss[loss=0.07986, beats_loss=0.01165, ecapa_loss=0.0001348, whisper_loss=0.06687, over 18284.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01044, ecapa_loss=0.0001411, whisper_loss=0.08918, over 3824843.50 frames. ], batch size: 75, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 15:50:42,194 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.23 vs. limit=12.0
2024-08-19 15:50:56,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=4539990.0, ans=15.0
2024-08-19 15:51:00,812 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 from AS
2024-08-19 15:51:17,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4540190.0, ans=0.125
2024-08-19 15:51:17,546 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.70 vs. limit=15.0
2024-08-19 15:51:34,297 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 28 from Vox, 30 from AS
2024-08-19 15:51:39,318 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.311e+01 2.569e+01 2.722e+01 4.090e+01, threshold=5.139e+01, percent-clipped=0.0
2024-08-19 15:51:42,011 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.07 vs. limit=15.0
2024-08-19 15:51:52,100 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 9450, loss[loss=0.09966, beats_loss=0.01076, ecapa_loss=0.0001276, whisper_loss=0.08762, over 17554.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01048, ecapa_loss=0.0001409, whisper_loss=0.08884, over 3813669.99 frames. ], batch size: 69, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 15:51:54,626 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 20 from Vox, 41 from AS
2024-08-19 15:52:03,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4540390.0, ans=0.1
2024-08-19 15:52:04,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4540390.0, ans=0.125
2024-08-19 15:52:33,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4540590.0, ans=0.125
2024-08-19 15:52:38,841 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.25 vs. limit=15.0
2024-08-19 15:52:42,074 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 22 from LS+wenet, 17 from Vox, 40 from AS
2024-08-19 15:52:49,496 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 20 from Vox, 30 from AS
2024-08-19 15:53:02,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4540790.0, ans=0.125
2024-08-19 15:53:04,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4540790.0, ans=0.125
2024-08-19 15:53:12,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=4540890.0, ans=0.1
2024-08-19 15:53:13,520 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 9500, loss[loss=0.08592, beats_loss=0.0117, ecapa_loss=0.00013, whisper_loss=0.07292, over 15737.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01052, ecapa_loss=0.0001403, whisper_loss=0.08857, over 3823142.78 frames. ], batch size: 65, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 15:53:14,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4540890.0, ans=0.1
2024-08-19 15:53:15,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4540890.0, ans=0.125
2024-08-19 15:53:15,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4540890.0, ans=0.2
2024-08-19 15:53:27,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4540990.0, ans=0.125
2024-08-19 15:53:35,016 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.94 vs. limit=15.0
2024-08-19 15:53:36,251 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.09 vs. limit=12.0
2024-08-19 15:53:37,241 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 28 from Vox, 27 from AS
2024-08-19 15:54:19,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4541190.0, ans=0.0
2024-08-19 15:54:20,183 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.59 vs. limit=22.5
2024-08-19 15:54:28,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4541290.0, ans=0.0
2024-08-19 15:54:37,970 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.373e+01 2.637e+01 2.974e+01 3.781e+01, threshold=5.275e+01, percent-clipped=0.0
2024-08-19 15:54:51,067 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 9550, loss[loss=0.1079, beats_loss=0.008283, ecapa_loss=0.0002099, whisper_loss=0.09748, over 18996.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01042, ecapa_loss=0.0001426, whisper_loss=0.08837, over 3775269.63 frames. ], batch size: 83, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:54:53,596 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.76 vs. limit=15.0
2024-08-19 15:55:42,763 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 39 from LS+wenet, 19 from Vox, 33 from AS
2024-08-19 15:55:47,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4541590.0, ans=0.125
2024-08-19 15:55:49,085 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 26 from Vox, 36 from AS
2024-08-19 15:55:53,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4541690.0, ans=0.2
2024-08-19 15:56:21,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4541790.0, ans=0.2
2024-08-19 15:56:38,075 INFO [train_multi_KD3.py:1116] (0/4) Epoch 31, batch 9600, loss[loss=0.07957, beats_loss=0.01305, ecapa_loss=0.0001276, whisper_loss=0.06524, over 15053.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01046, ecapa_loss=0.0001404, whisper_loss=0.08888, over 3807127.97 frames. ], batch size: 60, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:56:47,322 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 from AS
2024-08-19 15:57:18,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4542090.0, ans=0.125