2024-08-17 12:55:56,773 INFO [train_multi_KD3.py:1187] (2/4) Training started
2024-08-17 12:55:56,773 INFO [train_multi_KD3.py:1197] (2/4) Device: cuda:2
2024-08-17 12:55:56,791 INFO [train_multi_KD3.py:1212] (2/4) Using dtype=torch.bfloat16
2024-08-17 12:55:56,791 INFO [train_multi_KD3.py:1214] (2/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'e400fa3b456faf8afe0ee5bfe572946b4921a3db', 'k2-git-date': 'Sat Jul 15 04:21:50 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.9', 'icefall-git-branch': 'multi_KD_with_wenet', 'icefall-git-sha1': '0d2af1df-clean', 'icefall-git-date': 'Wed Aug 14 17:27:16 2024', 'icefall-path': '/xy/mnt/yangxiaoyu/workspace/icefall_multi_KD', 'k2-path': '/root/anaconda3/lib/python3.9/site-packages/k2/__init__.py', 'lhotse-path': '/root/anaconda3/lib/python3.9/site-packages/lhotse/__init__.py', 'hostname': 'NGK_xiaoyu'}, 'world_size': 4, 'master_port': 13440, 'tensorboard': True, 'num_epochs': 35, 'start_epoch': 1, 'start_batch': 332000, 'exp_dir': PosixPath('multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'stop_early': True, 'use_fp16': False, 'use_bf16': True, 'share_asr': True, 'beats_loss_scale': 1.0, 'ecapa_loss_scale': 10.0, 'whisper_loss_scale': 1.0, 'whisper_cb_loss_scale': 0.01, 'repeat_librispeech': 5, 'repeat_wenetspeech': 0, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'speaker_input_idx': 2, 'whisper_dim': 1280, 'use_task_id': True, 'num_codebooks': 32, 'mvq_kd_layer_idx': -1, 'use_subsampled_output': True, 'delta_t': 6, 'full_libri': True, 'mini_libri': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_librispeech': True, 'use_wenetspeech': False, 'use_audioset': True, 'audioset_subset': 'unbalanced', 'use_voxceleb': True, 'voxceleb_subset': 'vox2', 'use_fma': False, 'fma_subset': 'large', 'manifest_dir': PosixPath('data/fbank_LSVoxAs_with_whisper_large-v3_with_taskID'), 'max_duration': 1500, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 1, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'enable_musan': False, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'large-v3', 'use_mert': False, 'blank_id': 0, 'vocab_size': 500, 'dtype': torch.bfloat16, 'use_amp': True}
2024-08-17 12:55:56,791 INFO [train_multi_KD3.py:1216] (2/4) About to create model
2024-08-17 12:55:57,150 INFO [model_shift.py:142] (2/4) Delta_t: 6 when computing the distillation loss
2024-08-17 12:55:57,155 INFO [train_multi_KD3.py:1220] (2/4) Number of model parameters: 66484678
2024-08-17 12:55:57,157 INFO [checkpoint.py:112] (2/4) Loading checkpoint from multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-332000.pt
2024-08-17 12:55:59,686 INFO [train_multi_KD3.py:1235] (2/4) Using DDP
2024-08-17 12:56:01,427 INFO [train_multi_KD3.py:1247] (2/4) Loading optimizer state dict
2024-08-17 12:56:01,729 INFO [train_multi_KD3.py:1255] (2/4) Loading scheduler state dict
2024-08-17 12:56:01,729 INFO [kd_datamodule.py:690] (2/4) About to get train 960 cuts
2024-08-17 12:56:01,782 INFO [train_multi_KD3.py:1306] (2/4) Getting audioset cuts
2024-08-17 12:56:01,782 INFO [kd_datamodule.py:900] (2/4) About to get the audioset cuts for KD.
2024-08-17 12:56:01,804 INFO [kd_datamodule.py:869] (2/4) About to get the voxceleb cuts.
2024-08-17 12:56:01,808 INFO [kd_datamodule.py:880] (2/4) Adding voxceleb2 cuts.
2024-08-17 12:56:01,815 INFO [train_multi_KD3.py:1320] (2/4) Using mux to combine Librispeech: True, WenetSpeech: False, audioset: True and voxceleb: True
2024-08-17 12:56:09,800 INFO [train_multi_KD3.py:1322] (2/4) Using mux to combine [CutSet(len=1406195) [underlying data type: ], CutSet(len=1904746) [underlying data type: ], CutSet(len=1187704) [underlying data type: ]]
2024-08-17 12:56:09,800 INFO [train_multi_KD3.py:1323] (2/4) Using weights: [1406195, 1904746, 1187704]
2024-08-17 12:56:09,801 INFO [train_multi_KD3.py:1332] (2/4) CutSet(len=4498645) [underlying data type: ]
2024-08-17 12:56:09,801 INFO [kd_datamodule.py:449] (2/4) Disable MUSAN
2024-08-17 12:56:09,801 INFO [kd_datamodule.py:489] (2/4) Disable SpecAugment
2024-08-17 12:56:09,801 INFO [kd_datamodule.py:491] (2/4) About to create train dataset
2024-08-17 12:56:09,807 INFO [kd_datamodule.py:528] (2/4) Using SimpleCutSampler
2024-08-17 12:56:09,809 INFO [kd_datamodule.py:536] (2/4) About to create train dataloader
2024-08-17 12:56:09,809 INFO [kd_datamodule.py:539] (2/4) Loading sampler state dict
2024-08-17 12:57:16,567 INFO [kd_datamodule.py:763] (2/4) About to get dev-clean cuts
2024-08-17 12:57:16,569 INFO [kd_datamodule.py:781] (2/4) About to get dev-other cuts
2024-08-17 12:57:16,575 INFO [kd_datamodule.py:570] (2/4) About to create dev dataset
2024-08-17 12:57:16,828 INFO [kd_datamodule.py:591] (2/4) About to create dev dataloader
2024-08-17 12:57:16,828 INFO [kd_datamodule.py:840] (2/4) About to get the test set of voxceleb1 set.
2024-08-17 12:57:16,829 INFO [kd_datamodule.py:570] (2/4) About to create dev dataset
2024-08-17 12:57:17,079 INFO [kd_datamodule.py:591] (2/4) About to create dev dataloader
2024-08-17 12:57:17,079 INFO [kd_datamodule.py:912] (2/4) About to get the audioset eval cuts.
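The "Using mux" records above show three CutSets (LibriSpeech, AudioSet, VoxCeleb2) interleaved with weights equal to their sizes, and the combined set's length (4498645) is exactly the sum of the three. A minimal pure-Python sketch of that kind of weighted multiplexing follows; `mux` here is a simplified illustration, not lhotse's actual `CutSet.mux` implementation:

```python
import random

def mux(*streams, weights, seed=42):
    # Repeatedly pick a stream with probability proportional to its
    # weight and yield its next item; drop a stream once exhausted.
    # Simplified stand-in for lhotse's CutSet.mux, for illustration only.
    rng = random.Random(seed)
    iters = [iter(s) for s in streams]
    weights = list(weights)
    while iters:
        i = rng.choices(range(len(iters)), weights=weights)[0]
        try:
            yield next(iters[i])
        except StopIteration:
            del iters[i], weights[i]

# The three set sizes from the log, used directly as sampling weights,
# so each source is sampled in proportion to its size.
sizes = [1406195, 1904746, 1187704]
assert sum(sizes) == 4498645  # matches CutSet(len=4498645) in the log

# Tiny demo with toy streams standing in for the real CutSets:
combined = list(mux("abc", "ABC", weights=[3, 3]))
assert sorted(combined) == ["A", "B", "C", "a", "b", "c"]
```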
2024-08-17 12:57:17,083 INFO [kd_datamodule.py:570] (2/4) About to create dev dataset
2024-08-17 12:57:17,577 INFO [kd_datamodule.py:591] (2/4) About to create dev dataloader
2024-08-17 12:57:17,577 INFO [train_multi_KD3.py:1412] (2/4) ['ASR_libri', 'SV_voxceleb1', 'AT_audioset']
2024-08-17 12:57:17,577 INFO [train_multi_KD3.py:1416] (2/4) Loading grad scaler state dict
2024-08-17 12:57:30,037 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 0, loss[loss=0.1183, beats_loss=0.009043, ecapa_loss=0.0001942, whisper_loss=0.1073, over 20700.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.009043, ecapa_loss=0.0001942, whisper_loss=0.1073, over 20700.00 frames. ], batch size: 89, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 12:57:30,037 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss
2024-08-17 12:58:09,332 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on ASR_libri: loss=0.2516, beats_loss=0, ecapa_loss=0.0005218, whisper_loss=0.2464, over 922467.00 frames.
2024-08-17 12:58:23,199 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on SV_voxceleb1: loss=0.004106, beats_loss=0, ecapa_loss=0.0004106, whisper_loss=0, over 939242.00 frames.
2024-08-17 13:00:20,862 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on AT_audioset: loss=0.02324, beats_loss=0.02324, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
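The reported `loss` in these records is numerically consistent with a weighted sum of the three distillation losses using the scales from the config dump (`beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`, `whisper_loss_scale=1.0`). A small sketch of that combination, inferred from the logged values rather than taken from train_multi_KD3.py:

```python
def total_loss(beats_loss, ecapa_loss, whisper_loss,
               beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    # Weighted combination of the three KD losses; the default scales are
    # the *_loss_scale values from the config dump in this log.
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)

# Epoch 23, batch 0: loss=0.1183 with the component losses from the log.
assert abs(total_loss(0.009043, 0.0001942, 0.1073) - 0.1183) < 1e-4
# Epoch 23, batch 50: loss=0.1007.
assert abs(total_loss(0.01134, 0.0001265, 0.08811) - 0.1007) < 1e-4
```

Note that the displayed `ecapa_loss` is the unscaled value; with its 10.0 scale applied, the components sum to the reported total.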
2024-08-17 13:00:20,864 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB
2024-08-17 13:00:24,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3320000.0, ans=0.0
2024-08-17 13:00:44,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3320100.0, ans=0.125
2024-08-17 13:01:04,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3320200.0, ans=0.2
2024-08-17 13:01:08,739 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 31 from LS+wenet, 21 from Vox, 17 from AS
2024-08-17 13:01:12,880 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 24 from Vox, 24 from AS
2024-08-17 13:01:21,914 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 14 from Vox, 34 from AS
2024-08-17 13:01:24,225 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.681e-01
2024-08-17 13:01:29,687 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.35 vs. limit=10.0
2024-08-17 13:01:39,890 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 34 from LS+wenet, 24 from Vox, 29 from AS
2024-08-17 13:01:47,035 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.20 vs. limit=15.0
2024-08-17 13:01:51,824 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 50, loss[loss=0.1007, beats_loss=0.01134, ecapa_loss=0.0001265, whisper_loss=0.08811, over 22790.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01077, ecapa_loss=0.0001396, whisper_loss=0.09156, over 886261.74 frames. ], batch size: 90, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:01:58,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3320500.0, ans=0.0
2024-08-17 13:02:14,876 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 15 from Vox, 27 from AS
2024-08-17 13:02:29,973 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 16 from Vox, 31 from AS
2024-08-17 13:02:44,840 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=15.0
2024-08-17 13:02:48,182 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.300e+01 2.566e+01 2.925e+01 4.524e+01, threshold=5.133e+01, percent-clipped=0.0
2024-08-17 13:02:58,143 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 16 from Vox, 25 from AS
2024-08-17 13:02:58,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3320900.0, ans=0.2
2024-08-17 13:02:59,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3320900.0, ans=0.125
2024-08-17 13:03:02,214 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 22 from Vox, 36 from AS
2024-08-17 13:03:03,052 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.89 vs. limit=5.0
2024-08-17 13:03:06,751 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.009e-01
2024-08-17 13:03:08,938 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 100, loss[loss=0.106, beats_loss=0.009471, ecapa_loss=0.0001603, whisper_loss=0.09495, over 22409.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01065, ecapa_loss=0.0001429, whisper_loss=0.09008, over 1531132.84 frames. ], batch size: 89, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:03:31,515 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 20 from LS+wenet, 24 from Vox, 43 from AS
2024-08-17 13:03:42,070 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.81 vs. limit=22.5
2024-08-17 13:03:48,828 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 19 from LS+wenet, 21 from Vox, 44 from AS
2024-08-17 13:03:53,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3321300.0, ans=0.1
2024-08-17 13:03:54,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3321300.0, ans=0.1
2024-08-17 13:04:04,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3321300.0, ans=0.04949747468305833
2024-08-17 13:04:16,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=3321400.0, ans=15.0
2024-08-17 13:04:20,207 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 24 from LS+wenet, 33 from Vox, 37 from AS
2024-08-17 13:04:24,703 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 150, loss[loss=0.08431, beats_loss=0.01184, ecapa_loss=0.0001558, whisper_loss=0.07091, over 22447.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01072, ecapa_loss=0.0001443, whisper_loss=0.08868, over 2069651.41 frames. ], batch size: 96, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:04:24,853 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 18 from Vox, 37 from AS
2024-08-17 13:04:40,285 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 27 from Vox, 27 from AS
2024-08-17 13:04:51,186 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.64 vs. limit=15.0
2024-08-17 13:04:53,394 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 from AS
2024-08-17 13:05:17,985 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.722e+01 2.279e+01 2.554e+01 2.913e+01 4.090e+01, threshold=5.109e+01, percent-clipped=0.0
2024-08-17 13:05:30,846 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 19 from Vox, 32 from AS
2024-08-17 13:05:34,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3321900.0, ans=0.0
2024-08-17 13:05:38,141 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 200, loss[loss=0.1054, beats_loss=0.01074, ecapa_loss=0.0001404, whisper_loss=0.0933, over 23761.00 frames. ], tot_loss[loss=0.09994, beats_loss=0.01067, ecapa_loss=0.0001443, whisper_loss=0.08782, over 2452639.13 frames. ], batch size: 90, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:05:48,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3322000.0, ans=0.125
2024-08-17 13:05:54,461 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 from AS
2024-08-17 13:05:57,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3322100.0, ans=0.1
2024-08-17 13:06:08,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3322200.0, ans=0.125
2024-08-17 13:06:25,125 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.21 vs. limit=6.0
2024-08-17 13:06:33,194 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 from AS
2024-08-17 13:06:39,445 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 18 from Vox, 19 from AS
2024-08-17 13:06:41,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3322400.0, ans=0.125
2024-08-17 13:06:50,755 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 250, loss[loss=0.09305, beats_loss=0.01226, ecapa_loss=0.0001432, whisper_loss=0.07936, over 21589.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01055, ecapa_loss=0.0001456, whisper_loss=0.0898, over 2786903.89 frames. ], batch size: 87, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:06:52,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3322500.0, ans=0.125
2024-08-17 13:07:19,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3322700.0, ans=0.125
2024-08-17 13:07:22,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3322700.0, ans=0.125
2024-08-17 13:07:22,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3322700.0, ans=0.1
2024-08-17 13:07:31,770 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 23 from Vox, 24 from AS
2024-08-17 13:07:40,462 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.294e+01 2.563e+01 2.958e+01 9.042e+01, threshold=5.127e+01, percent-clipped=1.0
2024-08-17 13:07:41,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3322800.0, ans=10.0
2024-08-17 13:07:57,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3323000.0, ans=0.125
2024-08-17 13:07:58,314 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.99 vs. limit=12.0
2024-08-17 13:07:58,768 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 300, loss[loss=0.1225, beats_loss=0.008698, ecapa_loss=0.0001389, whisper_loss=0.1124, over 22698.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01061, ecapa_loss=0.0001487, whisper_loss=0.08933, over 3022398.47 frames. ], batch size: 84, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:08:08,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3323000.0, ans=0.0
2024-08-17 13:08:09,072 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 27 from Vox, 36 from AS
2024-08-17 13:08:13,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3323100.0, ans=0.125
2024-08-17 13:08:20,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3323100.0, ans=0.125
2024-08-17 13:08:44,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3323300.0, ans=0.125
2024-08-17 13:08:46,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3323300.0, ans=0.95
2024-08-17 13:08:48,076 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0
2024-08-17 13:08:55,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3323400.0, ans=0.125
2024-08-17 13:08:55,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3323400.0, ans=0.0
2024-08-17 13:09:01,194 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.31 vs. limit=15.0
2024-08-17 13:09:07,456 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 350, loss[loss=0.1145, beats_loss=0.009887, ecapa_loss=0.0001196, whisper_loss=0.1035, over 22288.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0106, ecapa_loss=0.0001486, whisper_loss=0.08972, over 3211591.10 frames. ], batch size: 84, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:09:30,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3323600.0, ans=0.125
2024-08-17 13:09:44,142 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 22 from Vox, 38 from AS
2024-08-17 13:09:46,877 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 19 from Vox, 32 from AS
2024-08-17 13:09:56,211 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.195e+01 2.374e+01 2.744e+01 6.242e+01, threshold=4.747e+01, percent-clipped=1.0
2024-08-17 13:10:03,274 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. limit=6.0
2024-08-17 13:10:06,855 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 13 from LS+wenet, 15 from Vox, 36 from AS
2024-08-17 13:10:10,103 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0
2024-08-17 13:10:15,325 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 400, loss[loss=0.104, beats_loss=0.01034, ecapa_loss=0.0001309, whisper_loss=0.09238, over 19180.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.0001477, whisper_loss=0.09029, over 3351582.79 frames. ], batch size: 73, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:10:23,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3324000.0, ans=0.0
2024-08-17 13:10:28,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3324100.0, ans=0.0
2024-08-17 13:10:46,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3324200.0, ans=0.125
2024-08-17 13:10:47,807 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.43 vs. limit=22.5
2024-08-17 13:10:55,966 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-17 13:10:59,186 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-17 13:11:22,440 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 450, loss[loss=0.105, beats_loss=0.01066, ecapa_loss=0.0001217, whisper_loss=0.09314, over 22744.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01058, ecapa_loss=0.0001474, whisper_loss=0.0905, over 3490106.29 frames. ], batch size: 87, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:11:26,458 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.65 vs. limit=22.5
2024-08-17 13:11:36,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3324600.0, ans=0.1
2024-08-17 13:11:40,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3324600.0, ans=0.2
2024-08-17 13:11:43,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3324600.0, ans=0.0
2024-08-17 13:11:46,759 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 26 from Vox, 19 from AS
2024-08-17 13:11:57,077 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.08 vs. limit=22.5
2024-08-17 13:12:10,456 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.395e+01 2.662e+01 2.983e+01 2.736e+02, threshold=5.325e+01, percent-clipped=1.0
2024-08-17 13:12:23,155 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 17 from Vox, 41 from AS
2024-08-17 13:12:30,306 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 500, loss[loss=0.1039, beats_loss=0.009981, ecapa_loss=0.0001738, whisper_loss=0.09221, over 17916.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01066, ecapa_loss=0.0001463, whisper_loss=0.09018, over 3605287.33 frames. ], batch size: 75, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:12:42,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3325000.0, ans=15.0
2024-08-17 13:13:01,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3325200.0, ans=0.125
2024-08-17 13:13:07,529 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 28 from Vox, 31 from AS
2024-08-17 13:13:08,115 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.04 vs. limit=15.0
2024-08-17 13:13:38,884 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 550, loss[loss=0.09417, beats_loss=0.01158, ecapa_loss=0.0001616, whisper_loss=0.08097, over 19763.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01061, ecapa_loss=0.0001477, whisper_loss=0.09049, over 3662387.12 frames. ], batch size: 84, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:13:51,797 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.533e-03
2024-08-17 13:13:58,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3325600.0, ans=0.125
2024-08-17 13:14:26,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3325800.0, ans=0.0
2024-08-17 13:14:34,072 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.469e+01 2.694e+01 3.140e+01 4.946e+01, threshold=5.388e+01, percent-clipped=0.0
2024-08-17 13:14:35,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3325800.0, ans=0.125
2024-08-17 13:14:39,955 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 from AS
2024-08-17 13:14:40,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3325900.0, ans=0.2
2024-08-17 13:14:45,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3325900.0, ans=0.125
2024-08-17 13:14:49,880 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 12 from Vox, 24 from AS
2024-08-17 13:14:53,919 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 600, loss[loss=0.1082, beats_loss=0.00929, ecapa_loss=0.0001519, whisper_loss=0.09736, over 15848.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01047, ecapa_loss=0.0001481, whisper_loss=0.09234, over 3722769.08 frames. ], batch size: 61, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:15:12,066 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 25 from Vox, 31 from AS
2024-08-17 13:15:17,034 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 from AS
2024-08-17 13:15:19,350 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 27 from Vox, 42 from AS
2024-08-17 13:15:28,613 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 from AS
2024-08-17 13:15:29,182 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.250e+00
2024-08-17 13:15:49,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3326300.0, ans=0.125
2024-08-17 13:15:54,851 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 22 from Vox, 31 from AS
2024-08-17 13:16:07,371 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 650, loss[loss=0.08765, beats_loss=0.01132, ecapa_loss=0.0001868, whisper_loss=0.07446, over 20161.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01039, ecapa_loss=0.0001494, whisper_loss=0.09254, over 3770797.21 frames. ], batch size: 88, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:16:28,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3326600.0, ans=0.2
2024-08-17 13:16:42,005 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 13 from LS+wenet, 22 from Vox, 31 from AS
2024-08-17 13:16:43,203 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 26 from LS+wenet, 25 from Vox, 45 from AS
2024-08-17 13:17:00,526 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.302e+01 2.585e+01 2.989e+01 8.771e+01, threshold=5.171e+01, percent-clipped=2.0
2024-08-17 13:17:20,462 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 700, loss[loss=0.09863, beats_loss=0.01371, ecapa_loss=0.0001206, whisper_loss=0.08371, over 22807.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01045, ecapa_loss=0.0001483, whisper_loss=0.09154, over 3748054.79 frames. ], batch size: 92, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:17:33,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3327000.0, ans=0.125
2024-08-17 13:17:34,806 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.203e-03
2024-08-17 13:17:43,454 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0
2024-08-17 13:18:09,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3327300.0, ans=0.125
2024-08-17 13:18:09,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3327300.0, ans=0.2
2024-08-17 13:18:31,707 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 750, loss[loss=0.1027, beats_loss=0.01263, ecapa_loss=0.0001389, whisper_loss=0.08871, over 17536.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01056, ecapa_loss=0.0001485, whisper_loss=0.09077, over 3776657.24 frames. ], batch size: 71, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:19:02,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3327700.0, ans=0.0
2024-08-17 13:19:24,261 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.330e+01 2.505e+01 2.763e+01 4.149e+01, threshold=5.011e+01, percent-clipped=0.0
2024-08-17 13:19:39,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3327900.0, ans=0.125
2024-08-17 13:19:44,777 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 800, loss[loss=0.09509, beats_loss=0.009399, ecapa_loss=0.0001662, whisper_loss=0.08403, over 14726.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01055, ecapa_loss=0.0001501, whisper_loss=0.09141, over 3821054.54 frames. ], batch size: 61, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:20:07,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3328100.0, ans=15.0
2024-08-17 13:20:14,076 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 19 from Vox, 36 from AS
2024-08-17 13:20:15,366 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 18 from Vox, 35 from AS
2024-08-17 13:20:28,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3328300.0, ans=0.0
2024-08-17 13:20:30,927 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 20 from Vox, 36 from AS
2024-08-17 13:20:31,666 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0
2024-08-17 13:20:49,194 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 from AS
2024-08-17 13:20:52,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3328400.0, ans=0.125
2024-08-17 13:20:56,729 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 850, loss[loss=0.09683, beats_loss=0.01194, ecapa_loss=0.0001554, whisper_loss=0.08333, over 18822.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01056, ecapa_loss=0.0001496, whisper_loss=0.09113, over 3827465.71 frames. ], batch size: 79, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:21:00,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3328500.0, ans=0.0
2024-08-17 13:21:23,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3328600.0, ans=0.125
2024-08-17 13:21:28,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3328700.0, ans=0.125
2024-08-17 13:21:30,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3328700.0, ans=0.5
2024-08-17 13:21:36,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3328700.0, ans=0.125
2024-08-17 13:21:45,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3328800.0, ans=0.0
2024-08-17 13:21:49,446 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.262e+01 2.529e+01 2.778e+01 3.755e+01, threshold=5.059e+01, percent-clipped=0.0
2024-08-17 13:21:56,746 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 22 from Vox, 29 from AS
2024-08-17 13:22:08,931 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 900, loss[loss=0.1099, beats_loss=0.01116, ecapa_loss=0.0001026, whisper_loss=0.0977, over 16768.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.00015, whisper_loss=0.09085, over 3839272.04 frames. ], batch size: 64, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-17 13:22:09,645 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 25 from Vox, 30 from AS
2024-08-17 13:22:16,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3329000.0, ans=0.125
2024-08-17 13:22:19,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3329000.0, ans=0.1
2024-08-17 13:22:33,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3329100.0, ans=0.2
2024-08-17 13:22:46,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3329200.0, ans=0.05
2024-08-17 13:22:49,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3329200.0, ans=0.1
2024-08-17 13:22:56,401 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs.
limit=15.0 2024-08-17 13:22:57,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3329300.0, ans=0.125 2024-08-17 13:23:02,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3329300.0, ans=0.125 2024-08-17 13:23:02,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3329300.0, ans=0.125 2024-08-17 13:23:03,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3329300.0, ans=0.125 2024-08-17 13:23:08,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3329400.0, ans=0.2 2024-08-17 13:23:13,805 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.37 vs. limit=15.0 2024-08-17 13:23:15,515 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 13:23:20,464 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 950, loss[loss=0.06701, beats_loss=0.01526, ecapa_loss=0.0001077, whisper_loss=0.05068, over 13111.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01061, ecapa_loss=0.0001485, whisper_loss=0.09027, over 3839206.19 frames. ], batch size: 54, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:23:31,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3329500.0, ans=0.0 2024-08-17 13:23:45,172 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-17 13:23:47,838 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
25 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-17 13:23:54,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3329700.0, ans=0.07 2024-08-17 13:24:12,388 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.272e+01 2.519e+01 2.767e+01 4.304e+01, threshold=5.037e+01, percent-clipped=0.0 2024-08-17 13:24:23,069 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 30 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-17 13:24:23,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3329900.0, ans=0.1 2024-08-17 13:24:28,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3329900.0, ans=0.1 2024-08-17 13:24:32,111 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1000, loss[loss=0.09, beats_loss=0.01253, ecapa_loss=0.0001317, whisper_loss=0.07614, over 16951.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01055, ecapa_loss=0.0001489, whisper_loss=0.09082, over 3877180.80 frames. ], batch size: 69, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:25:20,105 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-17 13:25:29,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3330200.0, ans=0.2 2024-08-17 13:25:29,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3330200.0, ans=0.1 2024-08-17 13:25:45,570 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.97 vs. 
limit=22.5 2024-08-17 13:25:47,174 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.54 vs. limit=15.0 2024-08-17 13:26:07,201 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1050, loss[loss=0.1052, beats_loss=0.01054, ecapa_loss=0.000128, whisper_loss=0.09336, over 16443.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01059, ecapa_loss=0.0001482, whisper_loss=0.09039, over 3857359.45 frames. ], batch size: 62, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:26:11,137 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-17 13:26:15,230 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 23 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-17 13:26:17,823 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-17 13:26:23,951 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 30 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-17 13:26:31,628 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.17 vs. limit=15.0 2024-08-17 13:26:41,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3330700.0, ans=0.1 2024-08-17 13:26:41,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3330700.0, ans=0.125 2024-08-17 13:26:48,620 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 25 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-17 13:26:54,589 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
39 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-17 13:26:58,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3330800.0, ans=0.05 2024-08-17 13:26:58,824 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.292e+01 2.531e+01 2.835e+01 4.393e+01, threshold=5.062e+01, percent-clipped=0.0 2024-08-17 13:27:18,568 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1100, loss[loss=0.0944, beats_loss=0.01044, ecapa_loss=0.0001607, whisper_loss=0.08235, over 19898.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.0001491, whisper_loss=0.09091, over 3853828.02 frames. ], batch size: 81, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:27:43,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3331100.0, ans=0.125 2024-08-17 13:27:44,753 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-17 13:27:52,203 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 11 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-17 13:28:03,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3331300.0, ans=0.125 2024-08-17 13:28:08,786 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 26 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-17 13:28:26,766 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-17 13:28:32,231 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1150, loss[loss=0.0914, beats_loss=0.01227, ecapa_loss=0.0001484, whisper_loss=0.07765, over 22231.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01055, ecapa_loss=0.0001487, whisper_loss=0.09094, over 3863047.16 frames. 
], batch size: 90, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:28:32,716 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 26 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-17 13:28:37,215 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-17 13:28:42,963 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-17 13:28:54,945 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-17 13:28:59,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3331600.0, ans=0.125 2024-08-17 13:28:59,908 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 22 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-17 13:29:05,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3331700.0, ans=0.125 2024-08-17 13:29:09,359 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.490e+01 2024-08-17 13:29:10,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3331700.0, ans=0.125 2024-08-17 13:29:14,608 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 16 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-17 13:29:16,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3331800.0, ans=0.2 2024-08-17 13:29:16,639 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.38 vs. 
limit=15.0 2024-08-17 13:29:26,566 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.418e+01 2.645e+01 2.995e+01 9.791e+01, threshold=5.290e+01, percent-clipped=1.0 2024-08-17 13:29:28,937 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.84 vs. limit=15.0 2024-08-17 13:29:46,893 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1200, loss[loss=0.104, beats_loss=0.009769, ecapa_loss=0.0001567, whisper_loss=0.0927, over 21957.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01053, ecapa_loss=0.0001505, whisper_loss=0.0908, over 3863320.58 frames. ], batch size: 91, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:30:22,872 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-17 13:30:44,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3332300.0, ans=0.0 2024-08-17 13:30:44,951 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08344534784555435, model_norm_threshold=52.90373992919922 2024-08-17 13:30:45,117 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.conv_module2.depthwise_conv.causal_conv.weight with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.267e+04, grad_sumsq=4.529e+05, orig_rms_sq=1.825e-01 2024-08-17 13:30:46,485 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 33 from Vox, 32 fro AS 2024-08-17 13:30:51,262 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.13 vs. limit=15.0 2024-08-17 13:30:56,263 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.10 vs. 
limit=15.0 2024-08-17 13:31:01,514 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1250, loss[loss=0.1065, beats_loss=0.00962, ecapa_loss=0.000188, whisper_loss=0.09502, over 22091.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001514, whisper_loss=0.09035, over 3837657.05 frames. ], batch size: 91, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:31:02,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3332500.0, ans=0.04949747468305833 2024-08-17 13:31:07,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3332500.0, ans=0.125 2024-08-17 13:31:15,383 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 14 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-17 13:31:17,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3332600.0, ans=0.0 2024-08-17 13:31:19,116 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.91 vs. limit=15.0 2024-08-17 13:31:22,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3332600.0, ans=0.025 2024-08-17 13:31:26,314 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.81 vs. 
limit=12.0 2024-08-17 13:31:27,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3332600.0, ans=0.125 2024-08-17 13:31:31,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3332700.0, ans=0.0 2024-08-17 13:31:37,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3332700.0, ans=0.05 2024-08-17 13:31:41,177 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-17 13:31:44,520 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 25 from Vox, 20 fro AS 2024-08-17 13:31:50,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3332800.0, ans=0.2 2024-08-17 13:31:54,690 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.269e+01 2.597e+01 2.986e+01 6.340e+02, threshold=5.193e+01, percent-clipped=3.0 2024-08-17 13:31:56,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3332800.0, ans=0.0 2024-08-17 13:31:59,312 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 20 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-17 13:32:08,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3332900.0, ans=0.0 2024-08-17 13:32:15,205 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1300, loss[loss=0.1273, beats_loss=0.008516, ecapa_loss=0.0001321, whisper_loss=0.1175, over 19420.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01052, ecapa_loss=0.0001508, whisper_loss=0.09006, over 3850693.08 frames. ], batch size: 72, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:32:17,902 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-17 13:32:28,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3333000.0, ans=0.125 2024-08-17 13:32:34,425 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-17 13:32:52,650 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-17 13:33:00,970 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-17 13:33:16,386 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2024-08-17 13:33:33,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3333500.0, ans=0.0 2024-08-17 13:33:34,373 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1350, loss[loss=0.08382, beats_loss=0.01131, ecapa_loss=0.0001327, whisper_loss=0.07118, over 17625.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0106, ecapa_loss=0.0001502, whisper_loss=0.08988, over 3870669.21 frames. ], batch size: 68, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:33:36,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3333500.0, ans=0.1 2024-08-17 13:33:52,495 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
31 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-17 13:34:09,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3333700.0, ans=0.125 2024-08-17 13:34:12,725 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 13:34:24,725 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.323e+01 2.578e+01 2.886e+01 4.482e+01, threshold=5.156e+01, percent-clipped=0.0 2024-08-17 13:34:45,545 WARNING [optim.py:496] (2/4) Scaling gradients by 0.028498075902462006, model_norm_threshold=51.5612907409668 2024-08-17 13:34:45,711 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.470e+05, grad_sumsq=7.470e+05, orig_rms_sq=1.000e+00 2024-08-17 13:34:45,743 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1400, loss[loss=0.08552, beats_loss=0.01039, ecapa_loss=0.0001808, whisper_loss=0.07332, over 17773.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01055, ecapa_loss=0.0001502, whisper_loss=0.08955, over 3830333.33 frames. ], batch size: 76, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:34:49,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3334000.0, ans=0.0 2024-08-17 13:34:49,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3334000.0, ans=0.125 2024-08-17 13:35:01,096 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-17 13:35:06,878 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
25 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-17 13:35:09,309 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-08-17 13:35:16,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3334200.0, ans=0.125 2024-08-17 13:35:23,211 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-17 13:35:24,984 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.64 vs. limit=10.0 2024-08-17 13:35:26,035 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-17 13:35:45,875 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 17 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-17 13:35:57,240 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1450, loss[loss=0.08902, beats_loss=0.01051, ecapa_loss=0.0002033, whisper_loss=0.07648, over 17243.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01055, ecapa_loss=0.0001512, whisper_loss=0.09026, over 3866339.20 frames. ], batch size: 74, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:36:17,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3334600.0, ans=0.5 2024-08-17 13:36:31,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3334700.0, ans=0.125 2024-08-17 13:36:32,595 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-17 13:36:33,922 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-17 13:36:35,318 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-17 13:36:37,652 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0 2024-08-17 13:36:40,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3334800.0, ans=0.125 2024-08-17 13:36:49,419 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.311e+01 2.535e+01 2.723e+01 1.809e+03, threshold=5.071e+01, percent-clipped=1.0 2024-08-17 13:36:49,775 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-17 13:36:56,024 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0 2024-08-17 13:37:10,440 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1500, loss[loss=0.1162, beats_loss=0.01059, ecapa_loss=0.0001585, whisper_loss=0.104, over 22597.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01062, ecapa_loss=0.0001509, whisper_loss=0.0905, over 3882324.61 frames. ], batch size: 94, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:37:11,726 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.22 vs. 
limit=15.0 2024-08-17 13:37:45,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3335200.0, ans=0.2 2024-08-17 13:37:51,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3335200.0, ans=0.2 2024-08-17 13:38:26,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3335500.0, ans=0.0 2024-08-17 13:38:26,845 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1550, loss[loss=0.09462, beats_loss=0.01133, ecapa_loss=0.000154, whisper_loss=0.08176, over 17087.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01065, ecapa_loss=0.0001497, whisper_loss=0.09085, over 3869154.25 frames. ], batch size: 66, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:38:36,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3335500.0, ans=0.035 2024-08-17 13:38:36,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3335500.0, ans=0.1 2024-08-17 13:38:51,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3335600.0, ans=0.125 2024-08-17 13:38:51,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3335600.0, ans=0.125 2024-08-17 13:39:00,533 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-17 13:39:20,480 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.357e+01 2.589e+01 2.903e+01 1.401e+02, threshold=5.177e+01, percent-clipped=4.0 2024-08-17 13:39:23,362 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
30 from LS+wenet, 31 from Vox, 20 fro AS 2024-08-17 13:39:35,006 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-17 13:39:40,440 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1600, loss[loss=0.09579, beats_loss=0.01159, ecapa_loss=0.0001534, whisper_loss=0.08267, over 20330.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001498, whisper_loss=0.09048, over 3821401.67 frames. ], batch size: 81, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:39:42,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3336000.0, ans=0.125 2024-08-17 13:39:47,001 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 28 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-17 13:39:52,848 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-17 13:39:56,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3336100.0, ans=0.125 2024-08-17 13:40:09,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3336100.0, ans=0.0 2024-08-17 13:40:23,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3336200.0, ans=0.0 2024-08-17 13:40:28,103 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.79 vs. limit=12.0 2024-08-17 13:40:39,628 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-17 13:40:47,186 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-17 13:40:55,410 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1650, loss[loss=0.1018, beats_loss=0.009519, ecapa_loss=0.0001673, whisper_loss=0.09057, over 18697.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.00015, whisper_loss=0.09087, over 3823610.16 frames. ], batch size: 76, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:40:59,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3336500.0, ans=0.0 2024-08-17 13:41:01,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3336500.0, ans=0.125 2024-08-17 13:41:06,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3336500.0, ans=0.05 2024-08-17 13:41:12,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3336600.0, ans=0.125 2024-08-17 13:41:27,603 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-17 13:41:30,439 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-08-17 13:41:31,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3336700.0, ans=0.0 2024-08-17 13:41:50,390 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.288e+01 2.508e+01 2.798e+01 4.258e+01, threshold=5.017e+01, percent-clipped=0.0 2024-08-17 13:42:11,745 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1700, loss[loss=0.09112, beats_loss=0.01074, ecapa_loss=0.0001478, whisper_loss=0.0789, over 16804.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01065, ecapa_loss=0.0001491, whisper_loss=0.09101, over 3851787.86 frames. 
], batch size: 68, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:42:13,419 WARNING [optim.py:496] (2/4) Scaling gradients by 0.06336042284965515, model_norm_threshold=50.16535949707031 2024-08-17 13:42:13,582 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.458e+04, grad_sumsq=5.458e+04, orig_rms_sq=1.000e+00 2024-08-17 13:42:35,785 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 20 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-17 13:43:10,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3337400.0, ans=0.125 2024-08-17 13:43:26,296 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1750, loss[loss=0.09941, beats_loss=0.008331, ecapa_loss=0.0001913, whisper_loss=0.08916, over 16668.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01067, ecapa_loss=0.0001489, whisper_loss=0.09112, over 3845711.53 frames. ], batch size: 66, lr: 2.67e-03, grad_scale: 1.152921504606847e+18 2024-08-17 13:43:26,466 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-17 13:43:26,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3337500.0, ans=0.125 2024-08-17 13:43:51,321 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 9 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-17 13:44:22,181 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.382e+01 2.704e+01 3.004e+01 7.917e+02, threshold=5.409e+01, percent-clipped=2.0 2024-08-17 13:44:24,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3337800.0, ans=0.05 2024-08-17 13:44:27,701 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
33 from LS+wenet, 10 from Vox, 40 fro AS 2024-08-17 13:44:42,666 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0 2024-08-17 13:44:43,191 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1800, loss[loss=0.1104, beats_loss=0.009796, ecapa_loss=0.0001406, whisper_loss=0.09917, over 15931.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01062, ecapa_loss=0.0001476, whisper_loss=0.09175, over 3866383.46 frames. ], batch size: 63, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:44:47,510 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-17 13:44:50,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3338000.0, ans=0.1 2024-08-17 13:45:29,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3338300.0, ans=0.125 2024-08-17 13:45:35,065 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.31 vs. limit=10.0 2024-08-17 13:45:37,131 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.87 vs. limit=22.5 2024-08-17 13:45:38,091 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 32 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-17 13:45:43,264 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.99 vs. 
limit=12.0 2024-08-17 13:45:54,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3338400.0, ans=0.125 2024-08-17 13:45:58,097 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1850, loss[loss=0.1147, beats_loss=0.008178, ecapa_loss=0.0001714, whisper_loss=0.1048, over 18662.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01059, ecapa_loss=0.0001475, whisper_loss=0.09152, over 3865888.78 frames. ], batch size: 75, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:45:58,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3338500.0, ans=0.125 2024-08-17 13:46:23,653 WARNING [optim.py:496] (2/4) Scaling gradients by 0.028131451457738876, model_norm_threshold=54.08749008178711 2024-08-17 13:46:23,818 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.28, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.044e+06, grad_sumsq=1.044e+06, orig_rms_sq=1.000e+00 2024-08-17 13:46:30,656 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.02 vs. limit=15.0 2024-08-17 13:46:31,335 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
26 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-17 13:46:53,322 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.709e+01 2.302e+01 2.591e+01 3.149e+01 1.923e+03, threshold=5.181e+01, percent-clipped=2.0 2024-08-17 13:46:55,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3338800.0, ans=0.1 2024-08-17 13:47:11,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3338900.0, ans=0.125 2024-08-17 13:47:13,464 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1900, loss[loss=0.1151, beats_loss=0.008379, ecapa_loss=0.0001754, whisper_loss=0.1049, over 15593.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01056, ecapa_loss=0.0001477, whisper_loss=0.09198, over 3862447.12 frames. ], batch size: 60, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:47:58,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3339200.0, ans=0.125 2024-08-17 13:48:05,357 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-17 13:48:21,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3339400.0, ans=0.0 2024-08-17 13:48:23,642 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.98 vs. limit=22.5 2024-08-17 13:48:26,497 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 39 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-17 13:48:27,784 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-17 13:48:30,766 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
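The `Clipping_scale=2.0, grad-norm quartiles ...` lines and the `Scaling gradients by ...` warnings are mutually consistent: each logged threshold is (to rounding) twice the logged median grad-norm quartile, and each warning's scale factor times the implied gradient norm recovers the threshold (e.g. 54.087 / 0.028131 ≈ 1.923e+03, the max quartile logged just above). A sketch of that relationship, under the assumption that the threshold is `clipping_scale` times the median of recently observed norms (function names are ours, not the `optim.py` API):

```python
import statistics

def clipping_threshold(recent_grad_norms, clipping_scale=2.0):
    """Clipping threshold as clipping_scale times the median of recent
    gradient norms -- consistent with the logged quartile lines, where
    e.g. median 2.591e+01 pairs with threshold=5.181e+01."""
    return clipping_scale * statistics.median(recent_grad_norms)

def grad_scale_factor(grad_norm, threshold):
    """Gradients are scaled down by threshold/norm only when the norm
    exceeds the threshold; otherwise they pass through unchanged."""
    return min(1.0, threshold / grad_norm)

# The quartiles logged above (min, q1, median, q3, max):
norms = [17.09, 23.02, 25.91, 31.49, 1923.0]
print(round(clipping_threshold(norms), 2))            # 51.82
print(round(grad_scale_factor(1922.9, 54.08749), 4))  # 0.0281
```

This is a sketch of the arithmetic implied by the log, not a claim about the exact windowing or statistics the training code uses internally.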
25 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-17 13:48:32,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3339400.0, ans=0.0 2024-08-17 13:48:35,085 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1950, loss[loss=0.1373, beats_loss=0.008068, ecapa_loss=0.0001689, whisper_loss=0.1275, over 22541.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01046, ecapa_loss=0.0001493, whisper_loss=0.09269, over 3902648.11 frames. ], batch size: 88, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:48:38,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3339500.0, ans=0.125 2024-08-17 13:48:46,250 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 28 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-17 13:48:59,224 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-08-17 13:49:03,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3339600.0, ans=0.1 2024-08-17 13:49:33,214 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-17 13:49:36,194 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.407e+01 2.684e+01 3.018e+01 1.325e+02, threshold=5.369e+01, percent-clipped=1.0 2024-08-17 13:49:46,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3339900.0, ans=0.0 2024-08-17 13:49:56,224 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2000, loss[loss=0.08673, beats_loss=0.01026, ecapa_loss=0.0001432, whisper_loss=0.07503, over 23301.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01044, ecapa_loss=0.0001483, whisper_loss=0.09196, over 3876164.87 frames. 
], batch size: 94, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:50:12,605 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.29 vs. limit=15.0 2024-08-17 13:51:03,206 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.94 vs. limit=15.0 2024-08-17 13:51:16,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3340500.0, ans=0.125 2024-08-17 13:51:17,314 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2050, loss[loss=0.08708, beats_loss=0.01129, ecapa_loss=0.000149, whisper_loss=0.0743, over 18053.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01049, ecapa_loss=0.0001473, whisper_loss=0.09135, over 3876770.00 frames. ], batch size: 74, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:52:12,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3340700.0, ans=0.125 2024-08-17 13:52:20,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3340700.0, ans=0.125 2024-08-17 13:52:33,379 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.14 vs. limit=22.5 2024-08-17 13:52:42,253 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.326e+01 2.537e+01 2.750e+01 3.562e+02, threshold=5.074e+01, percent-clipped=2.0 2024-08-17 13:53:03,928 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2100, loss[loss=0.07953, beats_loss=0.009079, ecapa_loss=0.000164, whisper_loss=0.06881, over 15758.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01048, ecapa_loss=0.0001486, whisper_loss=0.09108, over 3882595.40 frames. ], batch size: 63, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:53:04,517 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-17 13:53:12,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3341000.0, ans=0.0 2024-08-17 13:53:19,705 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-17 13:53:49,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3341300.0, ans=0.0 2024-08-17 13:53:56,191 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.71 vs. limit=10.0 2024-08-17 13:53:59,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3341300.0, ans=0.1 2024-08-17 13:54:21,252 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2150, loss[loss=0.0961, beats_loss=0.01119, ecapa_loss=0.0001213, whisper_loss=0.08369, over 16373.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01056, ecapa_loss=0.0001484, whisper_loss=0.09095, over 3897408.02 frames. ], batch size: 64, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:54:24,293 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-17 13:54:27,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3341500.0, ans=0.1 2024-08-17 13:54:42,138 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 29 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-17 13:54:46,493 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
22 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-17 13:55:12,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3341800.0, ans=0.125 2024-08-17 13:55:16,218 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.587e+01 2.285e+01 2.479e+01 2.772e+01 4.505e+01, threshold=4.957e+01, percent-clipped=0.0 2024-08-17 13:55:26,504 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-17 13:55:35,072 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2200, loss[loss=0.09488, beats_loss=0.01052, ecapa_loss=0.0001491, whisper_loss=0.08288, over 21200.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01054, ecapa_loss=0.0001494, whisper_loss=0.09143, over 3900832.90 frames. ], batch size: 88, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:55:44,026 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-17 13:55:44,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3342000.0, ans=0.0 2024-08-17 13:55:49,011 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 25 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-17 13:55:54,519 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 21 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-17 13:56:06,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3342200.0, ans=0.0 2024-08-17 13:56:19,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3342300.0, ans=0.125 2024-08-17 13:56:31,017 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
17 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-17 13:56:37,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3342400.0, ans=0.125 2024-08-17 13:56:37,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3342400.0, ans=0.125 2024-08-17 13:56:47,739 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2250, loss[loss=0.08681, beats_loss=0.01099, ecapa_loss=0.0001662, whisper_loss=0.07416, over 19534.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01059, ecapa_loss=0.0001496, whisper_loss=0.09094, over 3895502.27 frames. ], batch size: 76, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:56:57,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3342500.0, ans=0.09899494936611666 2024-08-17 13:57:00,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3342600.0, ans=0.5 2024-08-17 13:57:04,725 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.88 vs. 
limit=15.0 2024-08-17 13:57:07,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3342600.0, ans=0.0 2024-08-17 13:57:32,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3342800.0, ans=0.125 2024-08-17 13:57:38,161 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.294e+01 2.494e+01 2.818e+01 3.738e+02, threshold=4.988e+01, percent-clipped=1.0 2024-08-17 13:57:44,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3342900.0, ans=0.125 2024-08-17 13:57:57,077 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2300, loss[loss=0.09292, beats_loss=0.01018, ecapa_loss=0.0001586, whisper_loss=0.08116, over 14825.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001496, whisper_loss=0.09055, over 3892542.77 frames. ], batch size: 60, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:58:04,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3343000.0, ans=0.1 2024-08-17 13:58:07,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3343000.0, ans=0.015 2024-08-17 13:58:07,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3343000.0, ans=0.125 2024-08-17 13:58:14,045 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.39 vs. 
limit=22.5 2024-08-17 13:58:15,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3343100.0, ans=0.2 2024-08-17 13:58:21,967 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.61 vs. limit=15.0 2024-08-17 13:58:30,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3343200.0, ans=0.0 2024-08-17 13:58:31,194 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 13 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-17 13:58:33,884 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.05 vs. limit=15.0 2024-08-17 13:59:04,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3343400.0, ans=0.2 2024-08-17 13:59:08,548 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2350, loss[loss=0.07834, beats_loss=0.01268, ecapa_loss=0.0001496, whisper_loss=0.06416, over 16272.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01055, ecapa_loss=0.00015, whisper_loss=0.09115, over 3917358.83 frames. ], batch size: 69, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:59:12,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3343500.0, ans=0.0 2024-08-17 13:59:29,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3343600.0, ans=0.125 2024-08-17 13:59:41,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3343700.0, ans=0.125 2024-08-17 13:59:53,754 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
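The many `ScheduledFloat: name=..., batch_count=..., ans=...` entries report hyperparameters (skip rates, balancer probabilities, `dropout_p`, bypass scales) whose values are scheduled as a function of `batch_count`. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between breakpoints with clamping at the ends (an assumption about the mechanism, not the `scaling.py` implementation):

```python
def scheduled_float(batch_count, schedule):
    """Piecewise-linear schedule over batch_count. `schedule` is a sorted
    list of (batch_count, value) breakpoints; values are clamped outside
    the breakpoint range. Illustrative sketch only."""
    if batch_count <= schedule[0][0]:
        return schedule[0][1]
    if batch_count >= schedule[-1][0]:
        return schedule[-1][1]
    for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
        if x0 <= batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

# A hypothetical dropout that decays from 0.3 to 0.1 over 20k batches,
# then holds flat -- the kind of value the log reports as `ans=`:
print(round(scheduled_float(10000.0, [(0.0, 0.3), (20000.0, 0.1)]), 2))  # 0.2
```

At a `batch_count` of 3.34M, as in the entries above, most such schedules have long since reached their final clamped value, which is why the same `ans=` numbers (0.0, 0.125, 0.1, ...) repeat throughout this section.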
22 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-17 14:00:00,891 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.280e+01 2.527e+01 2.757e+01 5.363e+01, threshold=5.054e+01, percent-clipped=1.0 2024-08-17 14:00:07,904 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.91 vs. limit=22.5 2024-08-17 14:00:14,046 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.23 vs. limit=22.5 2024-08-17 14:00:19,286 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2400, loss[loss=0.09048, beats_loss=0.012, ecapa_loss=0.0001147, whisper_loss=0.07734, over 16693.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01055, ecapa_loss=0.0001493, whisper_loss=0.09069, over 3895172.74 frames. ], batch size: 64, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:00:29,429 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 19 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-17 14:00:29,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3344000.0, ans=0.2 2024-08-17 14:00:34,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3344100.0, ans=0.0 2024-08-17 14:00:36,979 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
14 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-17 14:00:45,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3344200.0, ans=0.1 2024-08-17 14:01:14,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3344400.0, ans=0.0 2024-08-17 14:01:21,852 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.45 vs. limit=10.0 2024-08-17 14:01:25,538 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2450, loss[loss=0.1052, beats_loss=0.00971, ecapa_loss=0.0001461, whisper_loss=0.09405, over 16966.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01058, ecapa_loss=0.000149, whisper_loss=0.09028, over 3888656.54 frames. ], batch size: 66, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:01:25,700 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-17 14:01:36,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3344500.0, ans=0.0 2024-08-17 14:01:37,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3344600.0, ans=0.0 2024-08-17 14:02:16,797 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.328e+01 2.680e+01 2.976e+01 4.831e+01, threshold=5.360e+01, percent-clipped=0.0 2024-08-17 14:02:34,772 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.53 vs. limit=15.0 2024-08-17 14:02:35,421 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2500, loss[loss=0.1023, beats_loss=0.01133, ecapa_loss=0.0001613, whisper_loss=0.08931, over 21777.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01054, ecapa_loss=0.0001496, whisper_loss=0.09104, over 3892145.90 frames. ], batch size: 89, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:02:37,579 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.70 vs. limit=22.5 2024-08-17 14:02:43,508 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=15.0 2024-08-17 14:02:50,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3345100.0, ans=0.125 2024-08-17 14:02:51,390 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-17 14:02:55,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3345100.0, ans=0.125 2024-08-17 14:03:15,316 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-17 14:03:31,461 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-17 14:03:31,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3345400.0, ans=0.125 2024-08-17 14:03:43,019 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2550, loss[loss=0.1039, beats_loss=0.01088, ecapa_loss=0.0001434, whisper_loss=0.09156, over 15762.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0105, ecapa_loss=0.0001497, whisper_loss=0.09144, over 3900981.90 frames. 
], batch size: 63, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:03:48,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3345500.0, ans=0.125 2024-08-17 14:03:51,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3345500.0, ans=0.125 2024-08-17 14:03:59,687 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-17 14:04:00,879 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 29 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-17 14:04:04,234 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-17 14:04:18,447 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-17 14:04:35,531 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.361e+01 2.576e+01 2.935e+01 4.539e+01, threshold=5.152e+01, percent-clipped=0.0 2024-08-17 14:04:54,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3346000.0, ans=0.09899494936611666 2024-08-17 14:04:54,887 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2600, loss[loss=0.07401, beats_loss=0.01336, ecapa_loss=0.0001505, whisper_loss=0.05914, over 21424.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01065, ecapa_loss=0.0001487, whisper_loss=0.0906, over 3908880.26 frames. ], batch size: 91, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:05:01,775 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-17 14:05:22,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3346200.0, ans=0.125 2024-08-17 14:05:26,314 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
20 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-17 14:05:29,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3346200.0, ans=0.125 2024-08-17 14:05:29,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3346200.0, ans=0.5 2024-08-17 14:05:44,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3346300.0, ans=0.125 2024-08-17 14:05:52,968 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 36 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-17 14:05:55,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3346400.0, ans=0.0 2024-08-17 14:05:59,867 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 14:06:02,625 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2650, loss[loss=0.1001, beats_loss=0.01037, ecapa_loss=0.0001684, whisper_loss=0.08807, over 19976.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.0001494, whisper_loss=0.09107, over 3876503.04 frames. ], batch size: 81, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:06:05,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3346500.0, ans=0.1 2024-08-17 14:06:26,338 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
22 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-17 14:06:52,931 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.671e+01 2.232e+01 2.460e+01 2.794e+01 3.969e+01, threshold=4.921e+01, percent-clipped=0.0 2024-08-17 14:07:13,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3347000.0, ans=0.0 2024-08-17 14:07:13,879 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2700, loss[loss=0.1099, beats_loss=0.01076, ecapa_loss=0.00015, whisper_loss=0.09764, over 15311.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01055, ecapa_loss=0.0001487, whisper_loss=0.09083, over 3856523.48 frames. ], batch size: 63, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:07:29,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3347100.0, ans=0.1 2024-08-17 14:07:30,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3347100.0, ans=0.0 2024-08-17 14:08:06,809 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-17 14:08:20,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3347400.0, ans=0.0 2024-08-17 14:08:30,358 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2750, loss[loss=0.1173, beats_loss=0.009916, ecapa_loss=0.0001475, whisper_loss=0.1059, over 24072.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01057, ecapa_loss=0.0001475, whisper_loss=0.09106, over 3900882.79 frames. 
], batch size: 94, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:08:32,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3347500.0, ans=0.05 2024-08-17 14:08:34,150 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2024-08-17 14:08:39,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3347500.0, ans=0.0 2024-08-17 14:08:42,819 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.90 vs. limit=15.0 2024-08-17 14:08:59,241 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 31 from LS+wenet, 10 from Vox, 36 fro AS 2024-08-17 14:09:01,691 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-17 14:09:21,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3347800.0, ans=0.125 2024-08-17 14:09:23,890 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.299e+01 2.509e+01 2.759e+01 4.042e+01, threshold=5.018e+01, percent-clipped=0.0 2024-08-17 14:09:26,247 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.14 vs. limit=15.0 2024-08-17 14:09:39,903 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-17 14:09:42,981 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2800, loss[loss=0.09404, beats_loss=0.01163, ecapa_loss=0.0001489, whisper_loss=0.08092, over 23289.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01063, ecapa_loss=0.0001478, whisper_loss=0.09041, over 3892772.64 frames. 
], batch size: 98, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:10:04,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3348100.0, ans=0.125 2024-08-17 14:10:09,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3348100.0, ans=0.0 2024-08-17 14:10:17,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3348200.0, ans=0.0 2024-08-17 14:10:18,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3348200.0, ans=0.125 2024-08-17 14:10:21,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3348200.0, ans=0.125 2024-08-17 14:10:26,923 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-17 14:10:27,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3348300.0, ans=0.1 2024-08-17 14:10:28,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3348300.0, ans=0.125 2024-08-17 14:10:38,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3348300.0, ans=0.1 2024-08-17 14:10:58,378 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2850, loss[loss=0.1088, beats_loss=0.009473, ecapa_loss=0.0001399, whisper_loss=0.09792, over 22471.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01059, ecapa_loss=0.0001486, whisper_loss=0.09114, over 3901517.30 frames. 
], batch size: 89, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:11:05,952 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=15.0 2024-08-17 14:11:13,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3348600.0, ans=0.125 2024-08-17 14:11:23,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3348600.0, ans=0.2 2024-08-17 14:11:29,308 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.13 vs. limit=12.0 2024-08-17 14:11:38,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3348700.0, ans=0.2 2024-08-17 14:11:48,952 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-17 14:11:56,341 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.404e+01 2.584e+01 2.839e+01 1.572e+02, threshold=5.168e+01, percent-clipped=1.0 2024-08-17 14:11:57,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3348800.0, ans=0.125 2024-08-17 14:12:09,933 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.32 vs. limit=15.0 2024-08-17 14:12:17,620 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2900, loss[loss=0.1033, beats_loss=0.01001, ecapa_loss=0.0001624, whisper_loss=0.09163, over 18070.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01063, ecapa_loss=0.0001487, whisper_loss=0.09048, over 3902470.88 frames. 
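The `Whitening: name=..., metric=X vs. limit=Y` entries compare an anisotropy measure of a module's feature covariance against a limit; in every entry in this section the metric stays under its limit. One plausible such measure (a hypothetical sketch of ours, not the `scaling.py` implementation) is `d * ||C||_F^2 / trace(C)^2`, which equals 1.0 for perfectly white features and grows with eigenvalue spread:

```python
def whitening_metric(cov):
    """Hypothetical anisotropy measure for a feature covariance matrix
    `cov` (list of rows): d * ||C||_F^2 / trace(C)^2. Equals 1.0 when C
    is proportional to the identity (white features) and grows as one
    direction dominates; a penalty would apply only above the limit
    (e.g. the logged metric=12.34 vs. limit=15.0 means no penalty)."""
    d = len(cov)
    fro_sq = sum(v * v for row in cov for v in row)
    trace = sum(cov[i][i] for i in range(d))
    return d * fro_sq / (trace * trace)

white = [[2.0, 0.0], [0.0, 2.0]]    # isotropic covariance
skewed = [[10.0, 0.0], [0.0, 0.1]]  # one dominant direction
print(whitening_metric(white))   # 1.0
print(round(whitening_metric(skewed), 2))
```

Under this reading, the logged `num_groups`/`num_channels` fields would control which channel blocks the covariance is computed over; that detail is inferred from the names, not from the log itself.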
], batch size: 73, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:12:30,829 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-17 14:12:39,350 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-17 14:13:06,745 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 30 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-17 14:13:06,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3349300.0, ans=0.125 2024-08-17 14:13:06,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3349300.0, ans=0.025 2024-08-17 14:13:13,631 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-17 14:13:15,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3349400.0, ans=0.125 2024-08-17 14:13:26,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3349400.0, ans=0.0 2024-08-17 14:13:28,193 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2950, loss[loss=0.08806, beats_loss=0.009768, ecapa_loss=0.0001588, whisper_loss=0.0767, over 13239.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01061, ecapa_loss=0.0001477, whisper_loss=0.09074, over 3861637.90 frames. ], batch size: 54, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:13:32,568 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-17 14:13:33,707 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-17 14:13:37,023 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.04 vs. 
limit=10.0 2024-08-17 14:13:49,827 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.24 vs. limit=15.0 2024-08-17 14:13:50,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3349600.0, ans=0.125 2024-08-17 14:13:52,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3349600.0, ans=0.0 2024-08-17 14:14:02,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3349700.0, ans=0.125 2024-08-17 14:14:15,799 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.281e+01 2.587e+01 2.940e+01 5.139e+01, threshold=5.174e+01, percent-clipped=0.0 2024-08-17 14:14:25,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3349900.0, ans=0.1 2024-08-17 14:14:28,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=3349900.0, ans=0.02 2024-08-17 14:14:32,936 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3000, loss[loss=0.07531, beats_loss=0.01139, ecapa_loss=0.0001691, whisper_loss=0.06223, over 13257.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01063, ecapa_loss=0.000148, whisper_loss=0.09005, over 3857172.23 frames. ], batch size: 53, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:14:32,936 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-17 14:15:10,824 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on ASR_libri: loss=0.251, beats_loss=0, ecapa_loss=0.0005243, whisper_loss=0.2458, over 922467.00 frames. 
2024-08-17 14:15:29,033 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on SV_voxceleb1: loss=0.004133, beats_loss=0, ecapa_loss=0.0004133, whisper_loss=0, over 939242.00 frames. 2024-08-17 14:16:19,302 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.3386, 2.0115, 1.7001, 1.7722], device='cuda:2') 2024-08-17 14:16:39,827 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1708, 1.8790, 1.8614, 1.8322], device='cuda:2') 2024-08-17 14:17:17,977 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on AT_audioset: loss=0.02323, beats_loss=0.02323, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-17 14:17:17,980 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-17 14:17:18,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3350000.0, ans=0.1 2024-08-17 14:17:19,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3350000.0, ans=0.125 2024-08-17 14:17:55,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3350200.0, ans=0.0 2024-08-17 14:18:15,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3350400.0, ans=0.2 2024-08-17 14:18:16,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3350400.0, ans=0.125 2024-08-17 14:18:26,894 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3050, loss[loss=0.102, beats_loss=0.009925, ecapa_loss=0.0001448, whisper_loss=0.09058, over 23168.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01067, ecapa_loss=0.0001484, whisper_loss=0.09065, over 3878241.01 frames. 
], batch size: 93, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:18:28,737 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 14:18:33,370 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.77 vs. limit=22.5 2024-08-17 14:18:34,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3350500.0, ans=0.125 2024-08-17 14:18:37,844 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2024-08-17 14:18:44,124 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 14:18:45,766 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.66 vs. limit=22.5 2024-08-17 14:18:52,437 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0 2024-08-17 14:19:01,253 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
20 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-17 14:19:17,886 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.330e+01 2.532e+01 2.837e+01 4.705e+01, threshold=5.064e+01, percent-clipped=0.0 2024-08-17 14:19:21,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3350900.0, ans=0.125 2024-08-17 14:19:33,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3351000.0, ans=0.125 2024-08-17 14:19:33,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3351000.0, ans=0.0 2024-08-17 14:19:34,186 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3100, loss[loss=0.08886, beats_loss=0.00902, ecapa_loss=0.000154, whisper_loss=0.0783, over 15925.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.0001489, whisper_loss=0.09089, over 3873332.65 frames. ], batch size: 63, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:19:34,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3351000.0, ans=0.125 2024-08-17 14:20:19,801 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-17 14:20:24,844 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-17 14:20:25,446 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.76 vs. 
limit=15.0 2024-08-17 14:20:27,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3351400.0, ans=0.0 2024-08-17 14:20:30,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3351400.0, ans=0.125 2024-08-17 14:20:37,354 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3150, loss[loss=0.08513, beats_loss=0.01174, ecapa_loss=0.0001272, whisper_loss=0.07211, over 20290.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01057, ecapa_loss=0.0001484, whisper_loss=0.091, over 3875648.11 frames. ], batch size: 80, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:20:38,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3351500.0, ans=0.125 2024-08-17 14:20:45,914 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 35 from Vox, 35 fro AS 2024-08-17 14:20:47,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3351500.0, ans=0.1 2024-08-17 14:20:49,598 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-17 14:20:53,990 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.96 vs. 
limit=15.0 2024-08-17 14:21:18,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3351800.0, ans=0.2 2024-08-17 14:21:23,117 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.388e+01 2.692e+01 3.115e+01 4.630e+01, threshold=5.383e+01, percent-clipped=0.0 2024-08-17 14:21:23,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3351800.0, ans=0.0 2024-08-17 14:21:27,153 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-17 14:21:28,480 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-17 14:21:39,581 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3200, loss[loss=0.1105, beats_loss=0.008588, ecapa_loss=0.0001408, whisper_loss=0.1005, over 22313.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01065, ecapa_loss=0.0001485, whisper_loss=0.09057, over 3887720.67 frames. ], batch size: 88, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:21:45,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3352000.0, ans=0.125 2024-08-17 14:21:48,504 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-17 14:21:48,999 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.83 vs. 
limit=15.0 2024-08-17 14:21:57,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3352100.0, ans=0.1 2024-08-17 14:22:00,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3352100.0, ans=0.0 2024-08-17 14:22:01,037 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.56 vs. limit=22.5 2024-08-17 14:22:12,909 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-17 14:22:15,299 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-17 14:22:28,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3352400.0, ans=0.0 2024-08-17 14:22:34,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3352400.0, ans=0.125 2024-08-17 14:22:41,331 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3250, loss[loss=0.1228, beats_loss=0.008107, ecapa_loss=0.000173, whisper_loss=0.1129, over 22683.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0106, ecapa_loss=0.0001501, whisper_loss=0.09121, over 3930860.93 frames. ], batch size: 90, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:22:41,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3352500.0, ans=0.95 2024-08-17 14:22:52,470 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-17 14:22:52,917 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.79 vs. 
limit=15.0 2024-08-17 14:22:57,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3352600.0, ans=0.125 2024-08-17 14:23:00,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3352600.0, ans=0.05 2024-08-17 14:23:23,041 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0 2024-08-17 14:23:27,095 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.470e+01 2.720e+01 3.088e+01 1.006e+02, threshold=5.440e+01, percent-clipped=1.0 2024-08-17 14:23:28,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3352800.0, ans=0.1 2024-08-17 14:23:30,574 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0725674256682396, model_norm_threshold=54.39925765991211 2024-08-17 14:23:30,735 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.19, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.092e+05, grad_sumsq=1.076e+07, orig_rms_sq=1.015e-02 2024-08-17 14:23:32,336 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 23 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-17 14:23:35,364 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2024-08-17 14:23:39,876 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.86 vs. limit=15.0 2024-08-17 14:23:43,234 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.00 vs. 
limit=15.0 2024-08-17 14:23:43,719 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3300, loss[loss=0.08047, beats_loss=0.01021, ecapa_loss=0.000126, whisper_loss=0.069, over 16475.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01059, ecapa_loss=0.0001493, whisper_loss=0.09116, over 3961757.80 frames. ], batch size: 63, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:23:44,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3353000.0, ans=0.125 2024-08-17 14:24:09,988 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.29 vs. limit=12.0 2024-08-17 14:24:11,151 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 14:24:12,075 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-17 14:24:33,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3353400.0, ans=0.2 2024-08-17 14:24:34,431 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-17 14:24:41,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3353400.0, ans=0.2 2024-08-17 14:24:45,387 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3350, loss[loss=0.07626, beats_loss=0.01174, ecapa_loss=0.0001486, whisper_loss=0.06304, over 17672.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01055, ecapa_loss=0.0001492, whisper_loss=0.09135, over 3934246.16 frames. 
], batch size: 74, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:24:49,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3353500.0, ans=0.0 2024-08-17 14:25:10,506 WARNING [optim.py:496] (2/4) Scaling gradients by 0.057717882096767426, model_norm_threshold=54.39925765991211 2024-08-17 14:25:10,679 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.125e+05, grad_sumsq=1.125e+05, orig_rms_sq=1.000e+00 2024-08-17 14:25:23,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3353800.0, ans=0.125 2024-08-17 14:25:31,357 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.365e+01 2.660e+01 2.915e+01 9.425e+02, threshold=5.320e+01, percent-clipped=5.0 2024-08-17 14:25:36,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3353900.0, ans=0.125 2024-08-17 14:25:37,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3353900.0, ans=0.0 2024-08-17 14:25:39,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3353900.0, ans=0.125 2024-08-17 14:25:42,601 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-17 14:25:44,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3353900.0, ans=0.1 2024-08-17 14:25:45,289 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
25 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-17 14:25:47,686 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3400, loss[loss=0.09203, beats_loss=0.01014, ecapa_loss=0.0001586, whisper_loss=0.08031, over 22898.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01054, ecapa_loss=0.0001494, whisper_loss=0.09154, over 3945153.49 frames. ], batch size: 95, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:25:50,954 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-08-17 14:25:58,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3354000.0, ans=0.2 2024-08-17 14:25:58,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3354000.0, ans=0.125 2024-08-17 14:26:01,315 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-17 14:26:03,236 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.05 vs. limit=15.0 2024-08-17 14:26:08,575 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
36 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-17 14:26:10,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3354100.0, ans=0.125 2024-08-17 14:26:16,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3354200.0, ans=0.125 2024-08-17 14:26:20,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=3354200.0, ans=12.0 2024-08-17 14:26:22,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=3354200.0, ans=0.2 2024-08-17 14:26:29,879 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 39 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-17 14:26:45,855 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.57 vs. limit=15.0 2024-08-17 14:26:50,035 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3450, loss[loss=0.1004, beats_loss=0.01266, ecapa_loss=7.95e-05, whisper_loss=0.08699, over 15453.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01053, ecapa_loss=0.0001494, whisper_loss=0.09205, over 3944654.54 frames. ], batch size: 55, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:26:54,136 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-17 14:27:00,941 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-17 14:27:05,368 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.75 vs. limit=22.5 2024-08-17 14:27:22,973 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
24 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-17 14:27:25,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3354700.0, ans=0.125 2024-08-17 14:27:27,023 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.30 vs. limit=15.0 2024-08-17 14:27:37,060 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.241e+01 2.529e+01 2.817e+01 6.166e+01, threshold=5.057e+01, percent-clipped=1.0 2024-08-17 14:27:37,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3354800.0, ans=0.0 2024-08-17 14:27:42,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3354900.0, ans=0.2 2024-08-17 14:27:48,687 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 14:27:52,951 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3500, loss[loss=0.09939, beats_loss=0.01069, ecapa_loss=0.0001363, whisper_loss=0.08733, over 14606.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01054, ecapa_loss=0.0001495, whisper_loss=0.09122, over 3926508.78 frames. ], batch size: 58, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:27:53,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3355000.0, ans=0.125 2024-08-17 14:28:04,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3355100.0, ans=0.125 2024-08-17 14:28:15,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3355100.0, ans=0.1 2024-08-17 14:28:15,903 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
25 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-17 14:28:24,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3355200.0, ans=0.09899494936611666 2024-08-17 14:28:35,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3355300.0, ans=0.125 2024-08-17 14:28:38,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3355300.0, ans=0.125 2024-08-17 14:28:53,167 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3550, loss[loss=0.1057, beats_loss=0.01233, ecapa_loss=0.0001486, whisper_loss=0.09193, over 19253.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.0001493, whisper_loss=0.09077, over 3922083.42 frames. ], batch size: 77, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:28:53,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3355500.0, ans=0.125 2024-08-17 14:28:55,772 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 15 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-17 14:29:06,518 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-17 14:29:08,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3355600.0, ans=0.125 2024-08-17 14:29:10,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3355600.0, ans=0.0 2024-08-17 14:29:12,368 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.38 vs. limit=15.0 2024-08-17 14:29:27,737 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
22 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-17 14:29:36,836 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.83 vs. limit=10.0 2024-08-17 14:29:38,580 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.236e+01 2.510e+01 2.747e+01 4.772e+01, threshold=5.021e+01, percent-clipped=0.0 2024-08-17 14:29:48,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3355900.0, ans=0.0 2024-08-17 14:29:52,052 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-17 14:29:54,606 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3600, loss[loss=0.08382, beats_loss=0.01514, ecapa_loss=0.0001108, whisper_loss=0.06757, over 23264.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01073, ecapa_loss=0.0001478, whisper_loss=0.09001, over 3896146.83 frames. ], batch size: 93, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:30:12,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3356100.0, ans=0.125 2024-08-17 14:30:18,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3356200.0, ans=0.0 2024-08-17 14:30:20,439 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-17 14:30:23,809 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-17 14:30:36,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3356300.0, ans=0.125 2024-08-17 14:30:39,830 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
20 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-17 14:30:47,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3356400.0, ans=0.125 2024-08-17 14:30:55,001 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-17 14:30:56,143 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3650, loss[loss=0.1053, beats_loss=0.01169, ecapa_loss=0.000149, whisper_loss=0.0921, over 22131.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01072, ecapa_loss=0.0001499, whisper_loss=0.08925, over 3899849.36 frames. ], batch size: 92, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:30:59,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3356500.0, ans=0.1 2024-08-17 14:31:06,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3356500.0, ans=0.0 2024-08-17 14:31:07,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3356600.0, ans=0.0 2024-08-17 14:31:18,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3356600.0, ans=0.125 2024-08-17 14:31:30,322 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 35 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-17 14:31:30,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3356700.0, ans=0.2 2024-08-17 14:31:33,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3356800.0, ans=0.0 2024-08-17 14:31:37,556 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
30 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-17 14:31:40,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3356800.0, ans=0.125 2024-08-17 14:31:40,881 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.623e+01 2.172e+01 2.480e+01 2.940e+01 4.712e+01, threshold=4.960e+01, percent-clipped=0.0 2024-08-17 14:31:43,384 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-17 14:31:45,996 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-17 14:31:50,661 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 20 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-17 14:31:53,229 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 31 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-17 14:31:54,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3356900.0, ans=10.0 2024-08-17 14:31:56,903 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3700, loss[loss=0.09352, beats_loss=0.009917, ecapa_loss=0.0001394, whisper_loss=0.08221, over 17131.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01062, ecapa_loss=0.0001493, whisper_loss=0.08959, over 3898091.24 frames. ], batch size: 66, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:32:16,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3357100.0, ans=0.125 2024-08-17 14:32:35,151 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
22 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-17 14:32:39,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3357300.0, ans=0.2 2024-08-17 14:32:40,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3357300.0, ans=0.125 2024-08-17 14:32:52,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3357400.0, ans=0.0 2024-08-17 14:32:57,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3357500.0, ans=0.125 2024-08-17 14:32:58,716 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3750, loss[loss=0.1097, beats_loss=0.01173, ecapa_loss=0.0001436, whisper_loss=0.09655, over 18400.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0106, ecapa_loss=0.0001478, whisper_loss=0.08987, over 3859705.45 frames. ], batch size: 75, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:33:01,797 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0 2024-08-17 14:33:28,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3357700.0, ans=0.125 2024-08-17 14:33:35,874 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-17 14:33:42,298 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
14 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-17 14:33:44,595 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.285e+01 2.536e+01 2.853e+01 4.848e+01, threshold=5.071e+01, percent-clipped=0.0 2024-08-17 14:33:53,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3357900.0, ans=0.015 2024-08-17 14:33:59,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3358000.0, ans=0.125 2024-08-17 14:34:00,721 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3800, loss[loss=0.09682, beats_loss=0.01054, ecapa_loss=0.000162, whisper_loss=0.08466, over 20442.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01069, ecapa_loss=0.0001462, whisper_loss=0.08979, over 3869563.15 frames. ], batch size: 86, lr: 2.67e-03, grad_scale: 1.152921504606847e+18 2024-08-17 14:34:02,645 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.76 vs. limit=10.0 2024-08-17 14:34:10,732 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.686e-03 2024-08-17 14:34:20,339 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-17 14:35:00,167 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-17 14:35:02,359 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3850, loss[loss=0.1035, beats_loss=0.01036, ecapa_loss=0.0001445, whisper_loss=0.09171, over 22537.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01067, ecapa_loss=0.000147, whisper_loss=0.0893, over 3876722.54 frames. 
], batch size: 92, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:35:02,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3358500.0, ans=0.125 2024-08-17 14:35:20,522 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.66 vs. limit=15.0 2024-08-17 14:35:27,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3358700.0, ans=0.05 2024-08-17 14:35:36,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3358700.0, ans=0.2 2024-08-17 14:35:40,191 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.01 vs. limit=6.0 2024-08-17 14:35:44,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3358800.0, ans=0.1 2024-08-17 14:35:44,780 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.74 vs. limit=15.0 2024-08-17 14:35:48,074 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 26 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-17 14:35:49,068 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.270e+01 2.483e+01 2.770e+01 3.754e+01, threshold=4.966e+01, percent-clipped=0.0 2024-08-17 14:36:03,806 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3900, loss[loss=0.1175, beats_loss=0.008554, ecapa_loss=0.0001603, whisper_loss=0.1074, over 23644.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001473, whisper_loss=0.09051, over 3881765.38 frames. 
], batch size: 94, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:36:06,434 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 37 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-17 14:36:15,237 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 33 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-17 14:36:15,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3359100.0, ans=0.1 2024-08-17 14:36:20,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3359100.0, ans=0.2 2024-08-17 14:36:22,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3359100.0, ans=0.0 2024-08-17 14:36:31,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=3359200.0, ans=0.1 2024-08-17 14:36:44,653 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-17 14:36:54,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3359400.0, ans=0.0 2024-08-17 14:36:57,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3359400.0, ans=0.0 2024-08-17 14:37:06,360 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3950, loss[loss=0.1096, beats_loss=0.01046, ecapa_loss=0.000171, whisper_loss=0.09745, over 22824.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01063, ecapa_loss=0.000147, whisper_loss=0.09005, over 3895494.07 frames. 
], batch size: 94, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:37:10,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3359500.0, ans=0.1 2024-08-17 14:37:12,032 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-17 14:37:17,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3359500.0, ans=0.125 2024-08-17 14:37:31,591 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-17 14:37:34,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3359700.0, ans=0.125 2024-08-17 14:37:35,911 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-17 14:37:38,956 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-17 14:37:40,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3359700.0, ans=0.125 2024-08-17 14:37:56,022 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 22 from LS+wenet, 23 from Vox, 12 fro AS 2024-08-17 14:37:58,589 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.357e+01 2.553e+01 2.882e+01 5.685e+01, threshold=5.106e+01, percent-clipped=1.0 2024-08-17 14:38:03,456 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-17 14:38:03,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3359900.0, ans=0.125 2024-08-17 14:38:11,331 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0 2024-08-17 14:38:13,689 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-17 14:38:17,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3360000.0, ans=0.125 2024-08-17 14:38:18,717 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4000, loss[loss=0.099, beats_loss=0.0124, ecapa_loss=0.000124, whisper_loss=0.08535, over 21859.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001473, whisper_loss=0.0905, over 3906203.63 frames. ], batch size: 91, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:38:19,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3360000.0, ans=0.1 2024-08-17 14:38:21,292 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 33 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-17 14:38:37,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3360100.0, ans=0.2 2024-08-17 14:38:43,739 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-17 14:38:45,211 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
16 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-17 14:38:45,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3360100.0, ans=0.125 2024-08-17 14:39:10,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3360300.0, ans=0.0 2024-08-17 14:39:11,701 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-17 14:39:17,467 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.04 vs. limit=10.0 2024-08-17 14:39:26,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3360400.0, ans=0.1 2024-08-17 14:39:34,779 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4050, loss[loss=0.09381, beats_loss=0.01141, ecapa_loss=0.0001411, whisper_loss=0.08099, over 16267.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01062, ecapa_loss=0.0001479, whisper_loss=0.09013, over 3903790.89 frames. ], batch size: 65, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:39:47,938 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-17 14:40:00,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3360600.0, ans=0.125 2024-08-17 14:40:09,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3360700.0, ans=0.0 2024-08-17 14:40:11,274 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 24 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-17 14:40:17,289 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
26 from LS+wenet, 32 from Vox, 32 fro AS 2024-08-17 14:40:29,952 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+01 2.266e+01 2.528e+01 2.779e+01 4.224e+01, threshold=5.056e+01, percent-clipped=0.0 2024-08-17 14:40:35,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3360900.0, ans=0.125 2024-08-17 14:40:42,078 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-17 14:40:47,272 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4100, loss[loss=0.08595, beats_loss=0.01219, ecapa_loss=0.0001573, whisper_loss=0.07218, over 21520.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01065, ecapa_loss=0.000148, whisper_loss=0.0899, over 3940700.24 frames. ], batch size: 93, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:40:55,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3361000.0, ans=0.125 2024-08-17 14:41:05,580 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=15.0 2024-08-17 14:41:07,646 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-17 14:41:09,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3361100.0, ans=0.0 2024-08-17 14:41:26,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3361200.0, ans=0.1 2024-08-17 14:41:33,031 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
23 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-17 14:41:39,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3361300.0, ans=0.0 2024-08-17 14:42:02,454 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4150, loss[loss=0.1054, beats_loss=0.01136, ecapa_loss=0.0001459, whisper_loss=0.09261, over 17122.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01064, ecapa_loss=0.0001487, whisper_loss=0.09015, over 3926825.93 frames. ], batch size: 69, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:42:22,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3361600.0, ans=0.125 2024-08-17 14:42:56,128 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.031e+00 2024-08-17 14:42:56,842 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.341e+01 2.562e+01 2.816e+01 4.019e+01, threshold=5.124e+01, percent-clipped=0.0 2024-08-17 14:43:11,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3361900.0, ans=0.125 2024-08-17 14:43:13,750 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4200, loss[loss=0.09898, beats_loss=0.01156, ecapa_loss=0.0001477, whisper_loss=0.08594, over 22527.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01055, ecapa_loss=0.0001502, whisper_loss=0.09038, over 3904958.11 frames. 
], batch size: 91, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:43:17,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3362000.0, ans=0.0 2024-08-17 14:43:27,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3362100.0, ans=0.125 2024-08-17 14:43:53,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3362200.0, ans=0.2 2024-08-17 14:43:55,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3362200.0, ans=0.0 2024-08-17 14:43:57,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3362300.0, ans=0.125 2024-08-17 14:44:02,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3362300.0, ans=0.0 2024-08-17 14:44:09,960 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 25 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-17 14:44:17,473 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.52 vs. limit=15.0 2024-08-17 14:44:26,494 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4250, loss[loss=0.09237, beats_loss=0.01143, ecapa_loss=0.0001647, whisper_loss=0.07929, over 19866.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01058, ecapa_loss=0.0001496, whisper_loss=0.08989, over 3877683.01 frames. 
], batch size: 85, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:44:27,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3362500.0, ans=0.125 2024-08-17 14:44:30,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3362500.0, ans=0.0 2024-08-17 14:44:40,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3362600.0, ans=0.0 2024-08-17 14:44:41,571 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-17 14:44:49,186 WARNING [optim.py:496] (2/4) Scaling gradients by 0.06460781395435333, model_norm_threshold=51.23520278930664 2024-08-17 14:44:49,351 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.0.norm.log_scale with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.238e+04, grad_sumsq=7.238e+04, orig_rms_sq=1.000e+00 2024-08-17 14:44:51,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3362600.0, ans=0.0 2024-08-17 14:44:59,245 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.31 vs. limit=6.0 2024-08-17 14:45:13,697 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.72 vs. limit=15.0 2024-08-17 14:45:20,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3362800.0, ans=15.0 2024-08-17 14:45:21,084 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
22 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-17 14:45:23,607 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.395e+01 2.661e+01 3.176e+01 7.930e+02, threshold=5.321e+01, percent-clipped=4.0 2024-08-17 14:45:25,180 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-17 14:45:32,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3362900.0, ans=0.1 2024-08-17 14:45:32,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3362900.0, ans=0.0 2024-08-17 14:45:41,399 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4300, loss[loss=0.09926, beats_loss=0.009782, ecapa_loss=0.0001406, whisper_loss=0.08807, over 20998.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01057, ecapa_loss=0.0001498, whisper_loss=0.09001, over 3910927.34 frames. ], batch size: 80, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:46:00,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3363100.0, ans=0.125 2024-08-17 14:46:04,408 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
27 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-17 14:46:12,231 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.147e+01 2024-08-17 14:46:12,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3363200.0, ans=0.1 2024-08-17 14:46:15,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3363200.0, ans=0.2 2024-08-17 14:46:38,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3363300.0, ans=0.125 2024-08-17 14:46:39,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3363300.0, ans=0.2 2024-08-17 14:46:44,438 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 28 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-17 14:46:57,292 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4350, loss[loss=0.09072, beats_loss=0.01138, ecapa_loss=0.0001371, whisper_loss=0.07798, over 17789.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01063, ecapa_loss=0.0001501, whisper_loss=0.08891, over 3890907.86 frames. ], batch size: 73, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:47:04,800 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.53 vs. limit=15.0 2024-08-17 14:47:30,697 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
27 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-17 14:47:38,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3363700.0, ans=0.1 2024-08-17 14:47:53,260 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.243e+01 2.531e+01 2.839e+01 4.488e+01, threshold=5.063e+01, percent-clipped=0.0 2024-08-17 14:47:57,984 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 29 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-17 14:48:02,462 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-17 14:48:10,577 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4400, loss[loss=0.09998, beats_loss=0.01096, ecapa_loss=0.0001334, whisper_loss=0.08768, over 22885.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01055, ecapa_loss=0.0001505, whisper_loss=0.08968, over 3885756.72 frames. ], batch size: 89, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:48:11,048 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=8.209e-02 2024-08-17 14:48:12,903 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-17 14:48:14,008 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.986e+01 2024-08-17 14:48:16,106 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 24 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-17 14:48:45,063 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-17 14:48:46,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=3364200.0, ans=0.1 2024-08-17 14:48:55,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3364300.0, ans=0.125 2024-08-17 14:49:21,710 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4450, loss[loss=0.08247, beats_loss=0.01039, ecapa_loss=0.0001323, whisper_loss=0.07075, over 20130.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01051, ecapa_loss=0.0001491, whisper_loss=0.08979, over 3901974.79 frames. ], batch size: 82, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:49:38,778 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 24 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-17 14:49:54,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3364700.0, ans=0.125 2024-08-17 14:50:08,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3364800.0, ans=0.0 2024-08-17 14:50:16,738 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.265e+01 2.477e+01 2.746e+01 3.870e+01, threshold=4.955e+01, percent-clipped=0.0 2024-08-17 14:50:33,271 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4500, loss[loss=0.07803, beats_loss=0.0113, ecapa_loss=0.0001687, whisper_loss=0.06504, over 20541.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001478, whisper_loss=0.08992, over 3906572.64 frames. ], batch size: 88, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:50:34,670 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
17 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-17 14:50:45,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3365100.0, ans=0.2 2024-08-17 14:50:47,110 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.78 vs. limit=10.0 2024-08-17 14:50:48,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3365100.0, ans=0.125 2024-08-17 14:50:58,465 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.98 vs. limit=15.0 2024-08-17 14:50:59,625 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.177e-02 2024-08-17 14:51:01,447 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.88 vs. limit=22.5 2024-08-17 14:51:03,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3365200.0, ans=0.125 2024-08-17 14:51:09,069 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 27 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-17 14:51:13,095 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
29 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-17 14:51:13,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3365300.0, ans=0.125 2024-08-17 14:51:13,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3365300.0, ans=0.125 2024-08-17 14:51:21,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3365300.0, ans=0.125 2024-08-17 14:51:36,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3365400.0, ans=0.125 2024-08-17 14:51:42,255 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4550, loss[loss=0.07995, beats_loss=0.01047, ecapa_loss=0.0001898, whisper_loss=0.06757, over 18876.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01061, ecapa_loss=0.0001474, whisper_loss=0.09002, over 3915116.73 frames. ], batch size: 79, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:51:42,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=3365500.0, ans=0.2 2024-08-17 14:51:49,291 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
30 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-17 14:52:02,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3365600.0, ans=0.125 2024-08-17 14:52:03,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3365600.0, ans=0.125 2024-08-17 14:52:22,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3365800.0, ans=0.0 2024-08-17 14:52:31,900 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.242e+01 2.494e+01 2.758e+01 4.249e+01, threshold=4.988e+01, percent-clipped=0.0 2024-08-17 14:52:47,111 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4600, loss[loss=0.09232, beats_loss=0.01126, ecapa_loss=0.0001664, whisper_loss=0.0794, over 23019.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.0001474, whisper_loss=0.09026, over 3928293.72 frames. ], batch size: 97, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:52:55,667 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 37 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-17 14:52:59,454 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-17 14:53:10,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3366200.0, ans=0.125 2024-08-17 14:53:16,294 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.17 vs. 
limit=15.0 2024-08-17 14:53:16,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3366200.0, ans=0.0 2024-08-17 14:53:19,583 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.41 vs. limit=6.0 2024-08-17 14:53:34,859 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-17 14:53:44,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3366400.0, ans=0.125 2024-08-17 14:53:48,227 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4650, loss[loss=0.1078, beats_loss=0.01035, ecapa_loss=0.0001415, whisper_loss=0.09607, over 22089.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01053, ecapa_loss=0.0001475, whisper_loss=0.09058, over 3908523.13 frames. ], batch size: 87, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:53:55,946 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-17 14:53:57,272 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-17 14:54:01,597 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.17 vs. limit=15.0 2024-08-17 14:54:02,247 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-17 14:54:03,470 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
19 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-17 14:54:05,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3366600.0, ans=0.125 2024-08-17 14:54:07,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3366600.0, ans=0.2 2024-08-17 14:54:17,346 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 16 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-17 14:54:27,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3366800.0, ans=0.0 2024-08-17 14:54:35,759 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.359e+01 2.591e+01 2.983e+01 4.791e+01, threshold=5.183e+01, percent-clipped=0.0 2024-08-17 14:54:51,077 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4700, loss[loss=0.08829, beats_loss=0.01294, ecapa_loss=0.0001509, whisper_loss=0.07383, over 19728.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.0001472, whisper_loss=0.09052, over 3905979.23 frames. ], batch size: 82, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:54:51,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3367000.0, ans=0.125 2024-08-17 14:54:56,368 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-17 14:55:22,343 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
25 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-17 14:55:23,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3367200.0, ans=0.125 2024-08-17 14:55:28,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3367300.0, ans=0.1 2024-08-17 14:55:37,703 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-17 14:55:42,200 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-17 14:55:53,158 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4750, loss[loss=0.08848, beats_loss=0.01449, ecapa_loss=0.0001273, whisper_loss=0.07271, over 15094.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.000147, whisper_loss=0.09033, over 3901226.33 frames. ], batch size: 61, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:55:53,967 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.57 vs. limit=12.0 2024-08-17 14:56:04,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3367600.0, ans=0.125 2024-08-17 14:56:09,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3367600.0, ans=0.0 2024-08-17 14:56:16,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3367600.0, ans=0.0 2024-08-17 14:56:21,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3367700.0, ans=0.125 2024-08-17 14:56:25,651 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
20 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-17 14:56:40,095 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.331e+01 2.583e+01 3.020e+01 4.522e+01, threshold=5.166e+01, percent-clipped=0.0 2024-08-17 14:56:52,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3367900.0, ans=0.1 2024-08-17 14:56:53,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3367900.0, ans=0.2 2024-08-17 14:56:55,674 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4800, loss[loss=0.07594, beats_loss=0.0109, ecapa_loss=0.0001364, whisper_loss=0.06367, over 21580.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001474, whisper_loss=0.09043, over 3912863.12 frames. ], batch size: 86, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:57:03,303 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-17 14:57:14,607 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09035546332597733, model_norm_threshold=51.66227722167969 2024-08-17 14:57:14,768 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.0.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.290e+04, grad_sumsq=4.290e+04, orig_rms_sq=1.000e+00 2024-08-17 14:57:16,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3368100.0, ans=0.0 2024-08-17 14:57:18,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3368100.0, ans=0.0 2024-08-17 14:57:23,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3368200.0, ans=0.2 2024-08-17 14:57:23,779 INFO [scaling.py:214] (2/4) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3368200.0, ans=0.0 2024-08-17 14:57:26,003 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 27 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-17 14:57:27,246 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-17 14:57:29,892 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-17 14:57:33,618 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-17 14:57:57,567 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4850, loss[loss=0.08886, beats_loss=0.0116, ecapa_loss=0.0001204, whisper_loss=0.07606, over 19321.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0104, ecapa_loss=0.0001472, whisper_loss=0.09091, over 3897760.61 frames. ], batch size: 76, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:57:58,390 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.21 vs. limit=22.5 2024-08-17 14:58:00,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3368500.0, ans=0.0 2024-08-17 14:58:00,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3368500.0, ans=0.0 2024-08-17 14:58:01,347 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-17 14:58:02,429 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
27 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-17 14:58:11,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3368600.0, ans=0.125 2024-08-17 14:58:29,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3368700.0, ans=0.09899494936611666 2024-08-17 14:58:44,398 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.617e+01 2.372e+01 2.629e+01 2.886e+01 5.718e+02, threshold=5.259e+01, percent-clipped=2.0 2024-08-17 14:58:47,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3368900.0, ans=0.2 2024-08-17 14:58:48,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3368900.0, ans=0.0 2024-08-17 14:58:49,613 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.071e-01 2024-08-17 14:58:58,568 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4900, loss[loss=0.09828, beats_loss=0.009647, ecapa_loss=0.0001279, whisper_loss=0.08735, over 18140.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01036, ecapa_loss=0.0001482, whisper_loss=0.09128, over 3890954.46 frames. 
], batch size: 70, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:59:01,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3369000.0, ans=0.125 2024-08-17 14:59:01,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3369000.0, ans=0.125 2024-08-17 14:59:07,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3369000.0, ans=0.015 2024-08-17 14:59:07,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3369000.0, ans=0.1 2024-08-17 14:59:28,451 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-17 14:59:40,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3369300.0, ans=0.1 2024-08-17 14:59:54,403 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 30 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-17 15:00:00,767 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4950, loss[loss=0.103, beats_loss=0.01223, ecapa_loss=0.0001442, whisper_loss=0.08929, over 22544.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01046, ecapa_loss=0.0001467, whisper_loss=0.09075, over 3918324.41 frames. ], batch size: 90, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:00:10,860 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 23 from LS+wenet, 34 from Vox, 29 fro AS 2024-08-17 15:00:13,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3369600.0, ans=0.125 2024-08-17 15:00:30,376 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-17 15:00:33,877 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
14 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-17 15:00:35,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3369700.0, ans=10.0 2024-08-17 15:00:40,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3369800.0, ans=0.0 2024-08-17 15:00:44,247 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.732e+01 2024-08-17 15:00:47,635 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.355e+01 2.551e+01 2.775e+01 1.010e+02, threshold=5.101e+01, percent-clipped=1.0 2024-08-17 15:00:49,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3369900.0, ans=0.2 2024-08-17 15:00:57,056 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.96 vs. limit=22.5 2024-08-17 15:00:57,685 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-17 15:00:59,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3369900.0, ans=0.0 2024-08-17 15:01:02,479 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5000, loss[loss=0.1068, beats_loss=0.01109, ecapa_loss=0.0001128, whisper_loss=0.09462, over 23475.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01051, ecapa_loss=0.0001463, whisper_loss=0.09059, over 3922823.85 frames. ], batch size: 90, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:01:07,864 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
26 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-17 15:01:19,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3370100.0, ans=0.125 2024-08-17 15:01:41,250 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 20 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-17 15:01:55,037 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-17 15:02:04,847 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5050, loss[loss=0.1074, beats_loss=0.01239, ecapa_loss=0.0001106, whisper_loss=0.0939, over 23065.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01042, ecapa_loss=0.0001481, whisper_loss=0.09081, over 3938358.89 frames. ], batch size: 89, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:02:06,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3370500.0, ans=0.1 2024-08-17 15:02:42,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3370800.0, ans=0.1 2024-08-17 15:02:52,221 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.426e+01 2.645e+01 3.149e+01 5.792e+01, threshold=5.289e+01, percent-clipped=1.0 2024-08-17 15:02:52,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3370800.0, ans=0.0 2024-08-17 15:02:59,293 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 13 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-17 15:03:06,658 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5100, loss[loss=0.09691, beats_loss=0.011, ecapa_loss=0.0001318, whisper_loss=0.08459, over 17261.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001479, whisper_loss=0.09032, over 3931699.66 frames. 
], batch size: 64, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:03:22,457 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 15 from Vox, 48 fro AS 2024-08-17 15:03:33,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3371200.0, ans=0.1 2024-08-17 15:03:54,225 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-17 15:04:03,298 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 18 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-17 15:04:04,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3371400.0, ans=0.025 2024-08-17 15:04:08,148 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5150, loss[loss=0.07857, beats_loss=0.01539, ecapa_loss=0.0001098, whisper_loss=0.06209, over 21663.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01052, ecapa_loss=0.0001476, whisper_loss=0.08996, over 3903961.98 frames. ], batch size: 88, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:04:09,930 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=9.459e+00 2024-08-17 15:04:37,308 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=18.94 vs. limit=15.0 2024-08-17 15:04:51,284 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.49 vs. 
limit=10.0 2024-08-17 15:04:55,474 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.283e+01 2.471e+01 2.711e+01 4.570e+01, threshold=4.941e+01, percent-clipped=0.0 2024-08-17 15:05:10,549 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5200, loss[loss=0.1062, beats_loss=0.009831, ecapa_loss=0.0001125, whisper_loss=0.0952, over 16334.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0106, ecapa_loss=0.0001473, whisper_loss=0.08991, over 3897360.29 frames. ], batch size: 60, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:05:15,194 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2024-08-17 15:05:24,486 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-17 15:05:27,347 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 15:05:32,079 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 26 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-17 15:06:01,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3372400.0, ans=0.125 2024-08-17 15:06:04,512 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-17 15:06:05,685 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-17 15:06:09,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3372400.0, ans=0.1 2024-08-17 15:06:12,941 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5250, loss[loss=0.107, beats_loss=0.01117, ecapa_loss=0.0001092, whisper_loss=0.09476, over 18256.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01056, ecapa_loss=0.0001474, whisper_loss=0.09052, over 3876019.60 frames. 
], batch size: 70, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:06:19,597 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-17 15:06:22,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3372500.0, ans=0.125 2024-08-17 15:06:24,630 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 34 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-17 15:06:24,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3372600.0, ans=0.125 2024-08-17 15:06:26,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3372600.0, ans=0.0 2024-08-17 15:06:32,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3372600.0, ans=0.125 2024-08-17 15:06:46,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3372700.0, ans=0.125 2024-08-17 15:06:58,651 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-17 15:07:01,132 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.404e+01 2.586e+01 2.813e+01 5.298e+01, threshold=5.172e+01, percent-clipped=1.0 2024-08-17 15:07:01,311 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
29 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-17 15:07:05,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3372900.0, ans=0.1 2024-08-17 15:07:06,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3372900.0, ans=0.0 2024-08-17 15:07:10,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3372900.0, ans=0.125 2024-08-17 15:07:15,868 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5300, loss[loss=0.1066, beats_loss=0.009444, ecapa_loss=0.0001584, whisper_loss=0.09561, over 13647.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01065, ecapa_loss=0.0001475, whisper_loss=0.09007, over 3884403.06 frames. ], batch size: 54, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:07:17,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3373000.0, ans=0.2 2024-08-17 15:07:21,086 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-17 15:07:30,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3373100.0, ans=0.125 2024-08-17 15:07:33,499 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 15:07:34,509 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-17 15:07:39,654 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-17 15:07:44,432 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-17 15:07:47,233 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
36 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-17 15:08:12,730 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-17 15:08:17,727 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5350, loss[loss=0.1135, beats_loss=0.00825, ecapa_loss=0.0001391, whisper_loss=0.1038, over 14835.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.0001468, whisper_loss=0.09034, over 3877123.32 frames. ], batch size: 54, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:08:19,080 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 12 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-17 15:08:22,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3373500.0, ans=0.07 2024-08-17 15:08:25,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3373500.0, ans=0.125 2024-08-17 15:08:32,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3373600.0, ans=0.125 2024-08-17 15:08:50,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3373700.0, ans=0.1 2024-08-17 15:08:54,091 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-17 15:08:55,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3373800.0, ans=0.025 2024-08-17 15:09:05,089 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.366e+01 2.581e+01 2.898e+01 3.375e+02, threshold=5.162e+01, percent-clipped=2.0 2024-08-17 15:09:20,102 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5400, loss[loss=0.1144, beats_loss=0.008048, ecapa_loss=0.0001505, whisper_loss=0.1048, over 20519.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01049, ecapa_loss=0.0001478, whisper_loss=0.09071, over 3871192.08 frames. ], batch size: 81, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:10:11,221 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.112e+01 2024-08-17 15:10:16,688 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-17 15:10:21,379 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5450, loss[loss=0.1128, beats_loss=0.009512, ecapa_loss=0.000155, whisper_loss=0.1017, over 23915.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01052, ecapa_loss=0.0001482, whisper_loss=0.09131, over 3884056.70 frames. ], batch size: 93, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:10:23,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3374500.0, ans=0.1 2024-08-17 15:10:26,252 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-17 15:10:30,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3374500.0, ans=0.0 2024-08-17 15:10:34,849 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.77 vs. limit=22.5 2024-08-17 15:10:37,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3374600.0, ans=0.0 2024-08-17 15:10:38,958 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-17 15:10:45,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3374700.0, ans=0.125 2024-08-17 15:11:02,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3374800.0, ans=0.0 2024-08-17 15:11:08,519 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.416e+01 2.763e+01 3.086e+01 2.790e+02, threshold=5.526e+01, percent-clipped=2.0 2024-08-17 15:11:11,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3374900.0, ans=0.025 2024-08-17 15:11:13,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3374900.0, ans=0.125 2024-08-17 15:11:23,280 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5500, loss[loss=0.07538, beats_loss=0.01376, ecapa_loss=0.0001282, whisper_loss=0.06033, over 22126.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001486, whisper_loss=0.09041, over 3878251.85 frames. 
], batch size: 93, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:11:26,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3375000.0, ans=0.125 2024-08-17 15:11:30,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3375000.0, ans=0.125 2024-08-17 15:11:45,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3375100.0, ans=0.0 2024-08-17 15:11:59,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3375300.0, ans=0.125 2024-08-17 15:12:04,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3375300.0, ans=0.125 2024-08-17 15:12:14,726 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-17 15:12:22,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3375400.0, ans=0.04949747468305833 2024-08-17 15:12:26,008 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5550, loss[loss=0.1096, beats_loss=0.009744, ecapa_loss=0.0001472, whisper_loss=0.09843, over 17898.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01061, ecapa_loss=0.0001472, whisper_loss=0.09008, over 3856250.86 frames. ], batch size: 71, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:12:27,353 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 12 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-17 15:12:48,108 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.62 vs. 
limit=6.0 2024-08-17 15:12:48,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3375600.0, ans=0.0 2024-08-17 15:12:50,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3375700.0, ans=0.0 2024-08-17 15:12:54,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3375700.0, ans=0.0 2024-08-17 15:13:02,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3375800.0, ans=0.0 2024-08-17 15:13:06,763 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.19 vs. limit=6.0 2024-08-17 15:13:06,912 WARNING [optim.py:496] (2/4) Scaling gradients by 0.024026568979024887, model_norm_threshold=55.25676727294922 2024-08-17 15:13:07,075 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.047e+05, grad_sumsq=1.492e+05, orig_rms_sq=3.383e+00 2024-08-17 15:13:08,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3375800.0, ans=0.125 2024-08-17 15:13:11,674 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.72 vs. limit=15.0 2024-08-17 15:13:13,360 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.279e+01 2.545e+01 2.862e+01 2.300e+03, threshold=5.090e+01, percent-clipped=1.0 2024-08-17 15:13:28,517 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5600, loss[loss=0.08983, beats_loss=0.01044, ecapa_loss=0.0001859, whisper_loss=0.07753, over 21176.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01065, ecapa_loss=0.0001472, whisper_loss=0.09016, over 3844023.09 frames. ], batch size: 89, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:13:30,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3376000.0, ans=0.2 2024-08-17 15:13:31,148 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-17 15:13:36,111 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-17 15:13:36,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3376000.0, ans=0.0 2024-08-17 15:13:42,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3376100.0, ans=0.2 2024-08-17 15:14:28,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3376400.0, ans=0.0 2024-08-17 15:14:30,525 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5650, loss[loss=0.09393, beats_loss=0.0135, ecapa_loss=0.0001284, whisper_loss=0.07915, over 21887.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01066, ecapa_loss=0.0001475, whisper_loss=0.09008, over 3871617.72 frames. ], batch size: 88, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:14:34,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3376500.0, ans=0.0 2024-08-17 15:14:34,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3376500.0, ans=0.125 2024-08-17 15:15:05,392 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-17 15:15:07,838 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-17 15:15:17,996 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.711e+01 2.354e+01 2.556e+01 3.090e+01 4.828e+01, threshold=5.111e+01, percent-clipped=0.0 2024-08-17 15:15:23,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3376900.0, ans=0.0 2024-08-17 15:15:27,675 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 33 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-17 15:15:31,395 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 37 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-17 15:15:32,707 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5700, loss[loss=0.1173, beats_loss=0.008878, ecapa_loss=0.0001308, whisper_loss=0.1071, over 24209.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01063, ecapa_loss=0.0001468, whisper_loss=0.09016, over 3874666.99 frames. ], batch size: 94, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:15:39,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3377000.0, ans=0.125 2024-08-17 15:15:41,261 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-17 15:15:46,522 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.70 vs. limit=15.0 2024-08-17 15:15:50,490 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.376e-02 2024-08-17 15:15:54,612 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.68 vs. limit=15.0 2024-08-17 15:15:56,510 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
19 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-17 15:16:10,264 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-17 15:16:17,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3377300.0, ans=0.125 2024-08-17 15:16:25,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3377400.0, ans=0.0 2024-08-17 15:16:35,464 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5750, loss[loss=0.08287, beats_loss=0.01259, ecapa_loss=0.0001927, whisper_loss=0.06835, over 19142.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.0001481, whisper_loss=0.09061, over 3867821.33 frames. ], batch size: 85, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:16:46,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3377600.0, ans=0.125 2024-08-17 15:16:48,922 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-17 15:16:57,948 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 36 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-17 15:17:02,189 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. 
limit=6.0 2024-08-17 15:17:05,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3377700.0, ans=0.2 2024-08-17 15:17:09,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3377700.0, ans=10.0 2024-08-17 15:17:22,651 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.322e+01 2.527e+01 2.770e+01 4.049e+01, threshold=5.054e+01, percent-clipped=0.0 2024-08-17 15:17:37,655 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5800, loss[loss=0.07793, beats_loss=0.01403, ecapa_loss=0.0001344, whisper_loss=0.06256, over 18142.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01056, ecapa_loss=0.0001483, whisper_loss=0.09075, over 3921816.31 frames. ], batch size: 75, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:17:38,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3378000.0, ans=10.0 2024-08-17 15:17:40,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3378000.0, ans=0.125 2024-08-17 15:17:45,339 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-17 15:17:47,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3378000.0, ans=0.125 2024-08-17 15:17:53,069 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-17 15:17:54,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3378100.0, ans=0.125 2024-08-17 15:18:06,655 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
20 from LS+wenet, 17 from Vox, 33 from AS 2024-08-17 15:18:19,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3378300.0, ans=0.0 2024-08-17 15:18:20,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3378300.0, ans=0.2 2024-08-17 15:18:21,578 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 17 from LS+wenet, 28 from Vox, 47 from AS 2024-08-17 15:18:27,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3378400.0, ans=0.125 2024-08-17 15:18:30,428 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 18 from Vox, 33 from AS 2024-08-17 15:18:40,456 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5850, loss[loss=0.1067, beats_loss=0.01135, ecapa_loss=0.0001539, whisper_loss=0.09385, over 19158.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01055, ecapa_loss=0.0001477, whisper_loss=0.09109, over 3912454.94 frames. ], batch size: 79, lr: 2.66e-03, grad_scale: 1.152921504606847e+18 2024-08-17 15:18:51,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3378500.0, ans=0.125 2024-08-17 15:19:05,907 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 25 from LS+wenet, 21 from Vox, 48 from AS 2024-08-17 15:19:12,522 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 from AS 2024-08-17 15:19:19,641 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
16 from LS+wenet, 12 from Vox, 26 from AS 2024-08-17 15:19:28,424 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.390e+01 2.604e+01 2.871e+01 4.556e+01, threshold=5.208e+01, percent-clipped=0.0 2024-08-17 15:19:34,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3378900.0, ans=0.0 2024-08-17 15:19:37,711 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 20 from Vox, 30 from AS 2024-08-17 15:19:39,060 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 from AS 2024-08-17 15:19:43,541 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5900, loss[loss=0.09937, beats_loss=0.01208, ecapa_loss=0.0001189, whisper_loss=0.0861, over 22665.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0106, ecapa_loss=0.0001465, whisper_loss=0.09091, over 3928954.62 frames. ], batch size: 88, lr: 2.66e-03, grad_scale: 1.152921504606847e+18 2024-08-17 15:19:43,651 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 from AS 2024-08-17 15:19:45,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3379000.0, ans=0.1 2024-08-17 15:19:55,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3379100.0, ans=0.125 2024-08-17 15:20:00,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3379100.0, ans=0.0 2024-08-17 15:20:29,727 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 19 from Vox, 47 from AS 2024-08-17 15:20:30,395 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.69 vs. 
limit=15.0 2024-08-17 15:20:45,989 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5950, loss[loss=0.1058, beats_loss=0.01029, ecapa_loss=0.000142, whisper_loss=0.09414, over 16081.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.0001473, whisper_loss=0.09053, over 3932651.67 frames. ], batch size: 62, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:20:52,193 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 30 from LS+wenet, 22 from Vox, 29 from AS 2024-08-17 15:21:14,546 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.62 vs. limit=15.0 2024-08-17 15:21:23,042 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 31 from LS+wenet, 18 from Vox, 37 from AS 2024-08-17 15:21:30,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3379800.0, ans=0.2 2024-08-17 15:21:35,231 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.244e+01 2.483e+01 2.977e+01 4.324e+01, threshold=4.966e+01, percent-clipped=0.0 2024-08-17 15:21:49,208 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6000, loss[loss=0.103, beats_loss=0.009808, ecapa_loss=0.0001366, whisper_loss=0.09184, over 15483.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01046, ecapa_loss=0.0001476, whisper_loss=0.09072, over 3878534.68 frames. ], batch size: 61, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:21:49,208 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-17 15:22:23,010 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on ASR_libri: loss=0.2521, beats_loss=0, ecapa_loss=0.0005335, whisper_loss=0.2467, over 922467.00 frames. 2024-08-17 15:22:37,684 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on SV_voxceleb1: loss=0.004145, beats_loss=0, ecapa_loss=0.0004145, whisper_loss=0, over 939242.00 frames. 
2024-08-17 15:24:12,942 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on AT_audioset: loss=0.02339, beats_loss=0.02339, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-17 15:24:12,945 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-17 15:24:27,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3380100.0, ans=0.07 2024-08-17 15:24:33,982 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.99 vs. limit=10.0 2024-08-17 15:24:36,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3380100.0, ans=0.0 2024-08-17 15:24:45,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3380200.0, ans=0.2 2024-08-17 15:25:06,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3380400.0, ans=0.125 2024-08-17 15:25:10,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3380400.0, ans=0.125 2024-08-17 15:25:18,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3380500.0, ans=0.1 2024-08-17 15:25:19,735 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6050, loss[loss=0.1427, beats_loss=0.009105, ecapa_loss=0.0001292, whisper_loss=0.1323, over 19266.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01049, ecapa_loss=0.000148, whisper_loss=0.09014, over 3862200.44 frames. 
], batch size: 71, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:25:24,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3380500.0, ans=0.125 2024-08-17 15:25:40,976 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.36 vs. limit=15.0 2024-08-17 15:25:46,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3380700.0, ans=0.1 2024-08-17 15:25:52,388 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 12 from Vox, 29 from AS 2024-08-17 15:26:02,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3380800.0, ans=0.125 2024-08-17 15:26:13,054 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.302e+01 2.473e+01 2.788e+01 3.926e+01, threshold=4.946e+01, percent-clipped=0.0 2024-08-17 15:26:17,357 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 12 from Vox, 40 from AS 2024-08-17 15:26:28,373 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6100, loss[loss=0.104, beats_loss=0.01034, ecapa_loss=0.0001388, whisper_loss=0.09232, over 21480.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01056, ecapa_loss=0.0001478, whisper_loss=0.09036, over 3878923.60 frames. ], batch size: 84, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:26:43,719 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.54 vs. limit=15.0 2024-08-17 15:26:55,264 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
35 from LS+wenet, 21 from Vox, 38 from AS 2024-08-17 15:26:56,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3381200.0, ans=0.1 2024-08-17 15:27:01,679 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.18 vs. limit=15.0 2024-08-17 15:27:03,503 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 21 from Vox, 32 from AS 2024-08-17 15:27:07,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3381300.0, ans=0.0 2024-08-17 15:27:21,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3381400.0, ans=0.0 2024-08-17 15:27:26,184 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 18 from Vox, 27 from AS 2024-08-17 15:27:29,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3381400.0, ans=0.125 2024-08-17 15:27:36,383 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.47 vs. limit=15.0 2024-08-17 15:27:36,744 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6150, loss[loss=0.1011, beats_loss=0.01162, ecapa_loss=0.0001506, whisper_loss=0.088, over 23286.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01054, ecapa_loss=0.000147, whisper_loss=0.09064, over 3872074.35 frames. ], batch size: 93, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:27:37,717 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
25 from LS+wenet, 26 from Vox, 28 from AS 2024-08-17 15:27:49,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3381500.0, ans=0.125 2024-08-17 15:27:56,585 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 26 from Vox, 32 from AS 2024-08-17 15:28:00,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3381600.0, ans=0.1 2024-08-17 15:28:11,798 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.55 vs. limit=10.0 2024-08-17 15:28:16,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3381700.0, ans=0.025 2024-08-17 15:28:18,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3381700.0, ans=0.125 2024-08-17 15:28:42,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3381800.0, ans=0.2 2024-08-17 15:28:45,124 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.233e+01 2.467e+01 2.726e+01 4.750e+01, threshold=4.934e+01, percent-clipped=0.0 2024-08-17 15:28:48,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3381900.0, ans=0.2 2024-08-17 15:29:04,949 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6200, loss[loss=0.09496, beats_loss=0.01072, ecapa_loss=0.0001519, whisper_loss=0.08272, over 14730.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.0001484, whisper_loss=0.09008, over 3861971.86 frames. 
], batch size: 58, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:30:04,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3382300.0, ans=0.0 2024-08-17 15:30:24,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3382400.0, ans=0.125 2024-08-17 15:30:24,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3382400.0, ans=0.0 2024-08-17 15:30:37,476 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6250, loss[loss=0.09546, beats_loss=0.01004, ecapa_loss=0.0001586, whisper_loss=0.08383, over 13630.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01047, ecapa_loss=0.0001486, whisper_loss=0.08976, over 3839565.17 frames. ], batch size: 58, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:31:01,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3382600.0, ans=0.0 2024-08-17 15:31:05,309 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3382600.0, ans=0.125 2024-08-17 15:31:07,875 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.85 vs. limit=15.0 2024-08-17 15:31:25,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3382700.0, ans=0.0 2024-08-17 15:31:48,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3382800.0, ans=0.0 2024-08-17 15:31:49,606 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.59 vs. 
limit=22.5 2024-08-17 15:31:53,970 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.680e+01 2.369e+01 2.606e+01 2.979e+01 4.813e+02, threshold=5.211e+01, percent-clipped=2.0 2024-08-17 15:32:07,999 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 37 from LS+wenet, 14 from Vox, 22 from AS 2024-08-17 15:32:16,282 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6300, loss[loss=0.1086, beats_loss=0.01137, ecapa_loss=0.0001499, whisper_loss=0.09573, over 20868.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001497, whisper_loss=0.09006, over 3864936.39 frames. ], batch size: 81, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:32:24,771 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 25 from Vox, 30 from AS 2024-08-17 15:32:56,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3383200.0, ans=0.125 2024-08-17 15:33:11,084 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 22 from Vox, 43 from AS 2024-08-17 15:33:18,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3383300.0, ans=0.0 2024-08-17 15:33:21,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3383300.0, ans=0.0 2024-08-17 15:33:27,210 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.92 vs. limit=12.0 2024-08-17 15:33:28,211 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 from AS 2024-08-17 15:33:50,724 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.81 vs. 
limit=15.0 2024-08-17 15:33:51,352 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6350, loss[loss=0.09304, beats_loss=0.009791, ecapa_loss=0.0001732, whisper_loss=0.08152, over 20943.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01045, ecapa_loss=0.0001512, whisper_loss=0.08996, over 3868463.82 frames. ], batch size: 89, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:33:53,956 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 15:34:10,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3383600.0, ans=0.1 2024-08-17 15:34:16,276 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.24 vs. limit=15.0 2024-08-17 15:34:16,834 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 from AS 2024-08-17 15:34:51,026 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.290e+01 2.555e+01 2.760e+01 4.440e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-17 15:34:56,023 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.95 vs. limit=22.5 2024-08-17 15:35:04,035 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 29 from LS+wenet, 25 from Vox, 25 from AS 2024-08-17 15:35:07,416 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6400, loss[loss=0.1001, beats_loss=0.01056, ecapa_loss=0.0001747, whisper_loss=0.08784, over 15924.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01044, ecapa_loss=0.0001496, whisper_loss=0.09033, over 3867575.10 frames. ], batch size: 65, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:35:32,008 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
25 from LS+wenet, 25 from Vox, 23 from AS 2024-08-17 15:35:37,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3384100.0, ans=0.0 2024-08-17 15:36:12,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3384300.0, ans=0.0 2024-08-17 15:36:19,683 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 15 from Vox, 28 from AS 2024-08-17 15:36:30,993 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6450, loss[loss=0.1031, beats_loss=0.009742, ecapa_loss=0.0001419, whisper_loss=0.0919, over 20837.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01038, ecapa_loss=0.0001503, whisper_loss=0.09055, over 3866988.32 frames. ], batch size: 82, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:37:08,819 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.13 vs. limit=22.5 2024-08-17 15:37:35,802 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 28 from LS+wenet, 20 from Vox, 37 from AS 2024-08-17 15:37:36,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3384800.0, ans=0.0 2024-08-17 15:37:37,323 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.419e+01 2.570e+01 2.770e+01 6.732e+01, threshold=5.141e+01, percent-clipped=1.0 2024-08-17 15:37:38,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3384900.0, ans=0.125 2024-08-17 15:37:53,987 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 21 from Vox, 35 from AS 2024-08-17 15:37:56,847 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6500, loss[loss=0.1069, beats_loss=0.009866, ecapa_loss=0.0001554, whisper_loss=0.09552, over 22382.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01046, ecapa_loss=0.0001489, whisper_loss=0.09082, over 3862082.66 frames. ], batch size: 91, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:38:00,424 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 19 from Vox, 28 from AS 2024-08-17 15:38:03,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3385000.0, ans=0.2 2024-08-17 15:38:15,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3385100.0, ans=0.125 2024-08-17 15:38:16,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3385100.0, ans=0.0 2024-08-17 15:38:40,143 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.54 vs. limit=15.0 2024-08-17 15:38:41,597 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.72 vs. limit=15.0 2024-08-17 15:39:01,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3385300.0, ans=0.125 2024-08-17 15:39:04,602 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 24 from Vox, 27 from AS 2024-08-17 15:39:10,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3385400.0, ans=0.125 2024-08-17 15:39:23,350 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6550, loss[loss=0.1059, beats_loss=0.009495, ecapa_loss=0.0001471, whisper_loss=0.09492, over 21206.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001489, whisper_loss=0.09064, over 3853454.10 frames. 
], batch size: 84, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:39:31,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3385500.0, ans=0.07 2024-08-17 15:39:37,110 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 17 from Vox, 39 from AS 2024-08-17 15:39:51,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3385600.0, ans=0.1 2024-08-17 15:40:00,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3385700.0, ans=0.125 2024-08-17 15:40:02,921 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 18 from Vox, 38 from AS 2024-08-17 15:40:19,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3385800.0, ans=0.1 2024-08-17 15:40:27,175 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.341e+01 2.572e+01 2.842e+01 4.504e+01, threshold=5.144e+01, percent-clipped=0.0 2024-08-17 15:40:29,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3385900.0, ans=0.125 2024-08-17 15:40:47,881 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6600, loss[loss=0.1075, beats_loss=0.01112, ecapa_loss=0.0001623, whisper_loss=0.09478, over 22560.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0106, ecapa_loss=0.0001484, whisper_loss=0.09054, over 3864402.81 frames. ], batch size: 93, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:40:57,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3386000.0, ans=0.125 2024-08-17 15:41:20,920 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
17 from LS+wenet, 27 from Vox, 27 from AS 2024-08-17 15:41:22,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3386100.0, ans=0.1 2024-08-17 15:41:30,741 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 from AS 2024-08-17 15:41:57,748 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 25 from LS+wenet, 19 from Vox, 17 from AS 2024-08-17 15:42:25,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3386500.0, ans=0.1 2024-08-17 15:42:26,170 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=19.98 vs. limit=22.5 2024-08-17 15:42:26,520 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6650, loss[loss=0.1025, beats_loss=0.009216, ecapa_loss=0.0002103, whisper_loss=0.09123, over 13165.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.0001491, whisper_loss=0.09069, over 3808647.05 frames. ], batch size: 58, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:42:35,755 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 from AS 2024-08-17 15:42:39,727 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 24 from Vox, 31 from AS 2024-08-17 15:42:41,649 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 13 from Vox, 28 from AS 2024-08-17 15:43:06,775 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.30 vs. 
limit=10.0 2024-08-17 15:43:16,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3386700.0, ans=0.1 2024-08-17 15:43:47,257 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.597e+01 2.309e+01 2.525e+01 2.864e+01 3.753e+01, threshold=5.050e+01, percent-clipped=0.0 2024-08-17 15:43:48,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3386900.0, ans=0.1 2024-08-17 15:43:49,229 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 31 from LS+wenet, 22 from Vox, 26 from AS 2024-08-17 15:44:04,257 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6700, loss[loss=0.1065, beats_loss=0.01009, ecapa_loss=0.0001688, whisper_loss=0.09474, over 19025.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01061, ecapa_loss=0.0001486, whisper_loss=0.08975, over 3804296.05 frames. ], batch size: 79, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:44:12,588 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 from AS 2024-08-17 15:44:14,690 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 from AS 2024-08-17 15:44:32,868 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 24 from Vox, 28 from AS 2024-08-17 15:44:47,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=3387200.0, ans=0.02 2024-08-17 15:44:50,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3387200.0, ans=0.1 2024-08-17 15:45:11,833 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
11 from LS+wenet, 15 from Vox, 34 from AS 2024-08-17 15:45:23,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3387400.0, ans=0.1 2024-08-17 15:45:34,934 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6750, loss[loss=0.1072, beats_loss=0.01252, ecapa_loss=0.0001545, whisper_loss=0.09317, over 20119.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01066, ecapa_loss=0.0001477, whisper_loss=0.09014, over 3863549.85 frames. ], batch size: 82, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:46:24,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3387700.0, ans=0.125 2024-08-17 15:46:25,819 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 16 from Vox, 48 from AS 2024-08-17 15:46:42,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3387800.0, ans=0.1 2024-08-17 15:46:46,800 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 24 from Vox, 35 from AS 2024-08-17 15:46:52,987 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 20 from Vox, 47 from AS 2024-08-17 15:46:54,504 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.393e+01 2.641e+01 2.928e+01 4.492e+01, threshold=5.282e+01, percent-clipped=0.0 2024-08-17 15:47:03,947 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 22 from Vox, 40 from AS 2024-08-17 15:47:09,755 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.87 vs. limit=22.5 2024-08-17 15:47:14,560 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6800, loss[loss=0.07869, beats_loss=0.01259, ecapa_loss=0.0001342, whisper_loss=0.06476, over 15364.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.0001484, whisper_loss=0.09024, over 3880559.72 frames. ], batch size: 61, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:47:24,572 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 28 from LS+wenet, 21 from Vox, 24 from AS 2024-08-17 15:47:31,342 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 19 from Vox, 37 from AS 2024-08-17 15:47:37,448 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 14 from Vox, 31 from AS 2024-08-17 15:47:54,713 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 21 from Vox, 20 from AS 2024-08-17 15:48:18,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3388300.0, ans=0.1 2024-08-17 15:48:31,000 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 29 from LS+wenet, 23 from Vox, 43 from AS 2024-08-17 15:48:40,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3388400.0, ans=0.125 2024-08-17 15:48:53,432 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6850, loss[loss=0.09448, beats_loss=0.01008, ecapa_loss=0.0001646, whisper_loss=0.08275, over 21654.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01064, ecapa_loss=0.0001485, whisper_loss=0.08962, over 3881316.01 frames. ], batch size: 90, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:49:03,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3388500.0, ans=0.0 2024-08-17 15:49:15,722 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.21 vs. limit=6.0 2024-08-17 15:49:39,263 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
27 from LS+wenet, 15 from Vox, 24 from AS 2024-08-17 15:49:41,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3388700.0, ans=0.1 2024-08-17 15:49:56,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=3388800.0, ans=15.0 2024-08-17 15:49:57,221 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 29 from LS+wenet, 20 from Vox, 13 from AS 2024-08-17 15:50:02,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3388800.0, ans=0.1 2024-08-17 15:50:02,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3388800.0, ans=0.09899494936611666 2024-08-17 15:50:06,447 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.332e+01 2.534e+01 2.809e+01 1.632e+02, threshold=5.068e+01, percent-clipped=1.0 2024-08-17 15:50:26,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3389000.0, ans=0.125 2024-08-17 15:50:27,107 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6900, loss[loss=0.09429, beats_loss=0.01039, ecapa_loss=0.0001441, whisper_loss=0.08246, over 20887.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001488, whisper_loss=0.09006, over 3872311.75 frames. ], batch size: 84, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:50:29,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3389000.0, ans=0.125 2024-08-17 15:50:35,391 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
26 from LS+wenet, 13 from Vox, 25 from AS 2024-08-17 15:51:05,612 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-08-17 15:51:13,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3389200.0, ans=0.125 2024-08-17 15:51:22,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3389200.0, ans=0.1 2024-08-17 15:52:08,036 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6950, loss[loss=0.07987, beats_loss=0.009927, ecapa_loss=0.0001474, whisper_loss=0.06847, over 17499.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01052, ecapa_loss=0.0001479, whisper_loss=0.0907, over 3896663.79 frames. ], batch size: 71, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:52:19,306 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.52 vs. limit=15.0 2024-08-17 15:52:22,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3389500.0, ans=0.1 2024-08-17 15:52:31,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3389600.0, ans=0.125 2024-08-17 15:52:36,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3389600.0, ans=0.125 2024-08-17 15:52:40,851 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
21 from LS+wenet, 15 from Vox, 35 from AS 2024-08-17 15:52:41,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3389600.0, ans=0.0 2024-08-17 15:52:41,877 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.60 vs. limit=6.0 2024-08-17 15:52:50,543 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 20 from Vox, 25 from AS 2024-08-17 15:52:57,847 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 24 from Vox, 44 from AS 2024-08-17 15:53:02,328 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.85 vs. limit=15.0 2024-08-17 15:53:08,887 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 21 from LS+wenet, 24 from Vox, 35 from AS 2024-08-17 15:53:24,871 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.373e+01 2.584e+01 2.825e+01 4.780e+01, threshold=5.168e+01, percent-clipped=0.0 2024-08-17 15:53:30,169 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 20 from Vox, 21 from AS 2024-08-17 15:53:34,417 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.17 vs. limit=10.0 2024-08-17 15:53:35,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3389900.0, ans=0.1 2024-08-17 15:53:41,239 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7000, loss[loss=0.107, beats_loss=0.01108, ecapa_loss=0.0001695, whisper_loss=0.09421, over 22275.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01052, ecapa_loss=0.0001485, whisper_loss=0.09071, over 3898554.21 frames. 
], batch size: 92, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:53:50,033 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 17 from Vox, 41 from AS 2024-08-17 15:53:59,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3390100.0, ans=0.0 2024-08-17 15:54:05,648 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 14 from Vox, 26 from AS 2024-08-17 15:54:11,228 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.85 vs. limit=15.0 2024-08-17 15:54:26,047 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 15 from Vox, 30 from AS 2024-08-17 15:54:30,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3390300.0, ans=0.125 2024-08-17 15:54:33,862 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 19 from Vox, 37 from AS 2024-08-17 15:55:00,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3390400.0, ans=0.1 2024-08-17 15:55:06,675 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7050, loss[loss=0.0723, beats_loss=0.01189, ecapa_loss=0.0001483, whisper_loss=0.05893, over 18454.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01053, ecapa_loss=0.0001478, whisper_loss=0.0909, over 3876318.25 frames. 
], batch size: 76, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:55:09,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3390500.0, ans=0.0 2024-08-17 15:55:13,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3390500.0, ans=0.0 2024-08-17 15:55:26,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3390600.0, ans=0.05 2024-08-17 15:56:16,184 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.336e+01 2.552e+01 2.787e+01 4.135e+01, threshold=5.105e+01, percent-clipped=0.0 2024-08-17 15:56:18,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3390900.0, ans=0.125 2024-08-17 15:56:20,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3390900.0, ans=0.1 2024-08-17 15:56:32,497 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7100, loss[loss=0.1081, beats_loss=0.008764, ecapa_loss=0.0002049, whisper_loss=0.0973, over 16133.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01063, ecapa_loss=0.0001471, whisper_loss=0.08995, over 3832108.01 frames. ], batch size: 68, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:56:33,226 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.47 vs. 
limit=10.0 2024-08-17 15:56:38,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3391000.0, ans=0.0 2024-08-17 15:56:46,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3391000.0, ans=0.0 2024-08-17 15:57:11,035 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 28 from Vox, 30 from AS 2024-08-17 15:57:18,731 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 12 from LS+wenet, 16 from Vox, 28 from AS 2024-08-17 15:57:21,948 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 34 from Vox, 31 from AS 2024-08-17 15:57:22,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3391200.0, ans=0.2 2024-08-17 15:57:23,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3391300.0, ans=10.0 2024-08-17 15:57:28,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3391300.0, ans=0.125 2024-08-17 15:57:31,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3391300.0, ans=0.09899494936611666 2024-08-17 15:57:40,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3391300.0, ans=0.07 2024-08-17 15:57:49,185 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
20 from LS+wenet, 17 from Vox, 40 from AS 2024-08-17 15:57:49,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3391400.0, ans=0.125 2024-08-17 15:57:59,304 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7150, loss[loss=0.1111, beats_loss=0.008358, ecapa_loss=0.000151, whisper_loss=0.1013, over 15422.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01064, ecapa_loss=0.000149, whisper_loss=0.08944, over 3816474.98 frames. ], batch size: 59, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:58:08,522 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=15.0 2024-08-17 15:58:22,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3391600.0, ans=0.125 2024-08-17 15:58:29,849 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.74 vs. limit=12.0 2024-08-17 15:58:38,471 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.23 vs. limit=15.0 2024-08-17 15:58:41,303 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=15.0 2024-08-17 15:59:01,022 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
22 from LS+wenet, 23 from Vox, 41 from AS 2024-08-17 15:59:04,147 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.298e+01 2.545e+01 2.788e+01 4.387e+02, threshold=5.090e+01, percent-clipped=2.0 2024-08-17 15:59:04,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3391900.0, ans=0.125 2024-08-17 15:59:10,742 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 28 from LS+wenet, 22 from Vox, 24 from AS 2024-08-17 15:59:12,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3391900.0, ans=0.2 2024-08-17 15:59:20,140 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7200, loss[loss=0.1013, beats_loss=0.01266, ecapa_loss=0.0001117, whisper_loss=0.0875, over 24395.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0106, ecapa_loss=0.0001502, whisper_loss=0.0894, over 3848356.33 frames. ], batch size: 94, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:59:36,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3392100.0, ans=0.125 2024-08-17 16:00:21,045 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 27 from Vox, 24 from AS 2024-08-17 16:00:25,758 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 24 from Vox, 43 from AS 2024-08-17 16:00:30,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3392400.0, ans=0.0 2024-08-17 16:00:33,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3392400.0, ans=0.125 2024-08-17 16:00:40,437 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
29 from LS+wenet, 21 from Vox, 37 from AS 2024-08-17 16:00:41,775 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7250, loss[loss=0.1078, beats_loss=0.009156, ecapa_loss=0.0001311, whisper_loss=0.09729, over 22437.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01053, ecapa_loss=0.0001493, whisper_loss=0.09011, over 3879465.87 frames. ], batch size: 87, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:00:46,506 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.72 vs. limit=6.0 2024-08-17 16:00:58,281 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 34 from LS+wenet, 23 from Vox, 27 from AS 2024-08-17 16:01:00,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3392600.0, ans=0.1 2024-08-17 16:01:02,461 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 from AS 2024-08-17 16:01:02,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3392600.0, ans=0.125 2024-08-17 16:01:25,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3392700.0, ans=0.1 2024-08-17 16:01:34,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3392800.0, ans=0.125 2024-08-17 16:01:41,420 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. 
limit=6.0 2024-08-17 16:01:47,475 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.715e+01 2.301e+01 2.611e+01 2.944e+01 3.835e+01, threshold=5.222e+01, percent-clipped=0.0 2024-08-17 16:01:49,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3392900.0, ans=0.125 2024-08-17 16:01:51,076 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 23 from Vox, 44 from AS 2024-08-17 16:02:03,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3393000.0, ans=0.125 2024-08-17 16:02:05,198 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7300, loss[loss=0.1168, beats_loss=0.009736, ecapa_loss=0.0001755, whisper_loss=0.1053, over 22038.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.0001484, whisper_loss=0.0905, over 3862697.40 frames. ], batch size: 89, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:02:24,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3393100.0, ans=0.125 2024-08-17 16:02:29,264 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.88 vs. limit=15.0 2024-08-17 16:02:37,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3393200.0, ans=0.2 2024-08-17 16:02:45,890 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
30 from LS+wenet, 15 from Vox, 41 from AS 2024-08-17 16:02:52,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3393200.0, ans=0.0 2024-08-17 16:03:14,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3393400.0, ans=0.125 2024-08-17 16:03:26,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3393500.0, ans=0.0 2024-08-17 16:03:27,563 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7350, loss[loss=0.114, beats_loss=0.009755, ecapa_loss=0.000165, whisper_loss=0.1026, over 16424.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.0001468, whisper_loss=0.09079, over 3869284.92 frames. ], batch size: 65, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:03:51,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3393600.0, ans=0.125 2024-08-17 16:03:54,825 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.87 vs. limit=15.0 2024-08-17 16:04:00,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3393700.0, ans=0.0 2024-08-17 16:04:02,918 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 20 from Vox, 24 from AS 2024-08-17 16:04:03,498 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0 2024-08-17 16:04:20,549 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
27 from LS+wenet, 22 from Vox, 34 from AS 2024-08-17 16:04:24,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3393800.0, ans=0.2 2024-08-17 16:04:31,283 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.343e+01 2.608e+01 3.088e+01 3.317e+02, threshold=5.216e+01, percent-clipped=4.0 2024-08-17 16:04:43,233 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 from AS 2024-08-17 16:04:46,833 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7400, loss[loss=0.1083, beats_loss=0.01036, ecapa_loss=0.0001441, whisper_loss=0.09645, over 22492.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01065, ecapa_loss=0.0001474, whisper_loss=0.09084, over 3875486.60 frames. ], batch size: 88, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:04:46,970 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 20 from LS+wenet, 31 from Vox, 43 from AS 2024-08-17 16:04:50,470 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.13 vs. limit=15.0 2024-08-17 16:05:02,120 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.41 vs. limit=6.0 2024-08-17 16:05:12,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3394100.0, ans=0.0 2024-08-17 16:05:21,554 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 20 from Vox, 30 from AS 2024-08-17 16:05:30,341 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 from AS 2024-08-17 16:05:35,077 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.92 vs. 
limit=15.0 2024-08-17 16:06:12,539 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7450, loss[loss=0.09127, beats_loss=0.01148, ecapa_loss=0.0001608, whisper_loss=0.07817, over 14247.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0106, ecapa_loss=0.0001466, whisper_loss=0.09104, over 3879929.55 frames. ], batch size: 60, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:06:23,232 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 37 from LS+wenet, 19 from Vox, 33 from AS 2024-08-17 16:06:39,108 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 17 from Vox, 38 from AS 2024-08-17 16:06:40,040 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.10 vs. limit=15.0 2024-08-17 16:06:51,256 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.134e+00 2024-08-17 16:06:52,631 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 29 from Vox, 37 from AS 2024-08-17 16:06:55,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3394700.0, ans=0.07 2024-08-17 16:06:56,783 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 23 from Vox, 32 from AS 2024-08-17 16:06:59,041 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
27 from LS+wenet, 19 from Vox, 44 from AS 2024-08-17 16:07:19,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3394800.0, ans=0.2 2024-08-17 16:07:22,048 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.262e+01 2.481e+01 2.703e+01 3.740e+01, threshold=4.961e+01, percent-clipped=0.0 2024-08-17 16:07:24,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3394900.0, ans=0.0 2024-08-17 16:07:32,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3394900.0, ans=0.125 2024-08-17 16:07:38,539 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7500, loss[loss=0.1105, beats_loss=0.01018, ecapa_loss=0.0001925, whisper_loss=0.09838, over 21776.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01057, ecapa_loss=0.0001471, whisper_loss=0.09128, over 3893182.31 frames. ], batch size: 93, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:07:48,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3395000.0, ans=0.1 2024-08-17 16:07:51,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3395000.0, ans=0.125 2024-08-17 16:08:03,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3395100.0, ans=0.1 2024-08-17 16:08:14,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3395200.0, ans=0.0 2024-08-17 16:08:18,546 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
41 from LS+wenet, 21 from Vox, 29 from AS 2024-08-17 16:08:22,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3395200.0, ans=0.125 2024-08-17 16:08:28,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3395300.0, ans=0.1 2024-08-17 16:08:41,806 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 33 from Vox, 25 from AS 2024-08-17 16:08:42,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3395300.0, ans=0.125 2024-08-17 16:09:04,620 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7550, loss[loss=0.1212, beats_loss=0.008085, ecapa_loss=0.0001535, whisper_loss=0.1116, over 22972.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01051, ecapa_loss=0.0001473, whisper_loss=0.09228, over 3938908.16 frames. ], batch size: 90, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:09:19,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3395500.0, ans=0.0 2024-08-17 16:09:19,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3395500.0, ans=10.0 2024-08-17 16:09:37,402 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 27 from LS+wenet, 18 from Vox, 26 from AS 2024-08-17 16:09:37,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3395700.0, ans=0.125 2024-08-17 16:09:46,642 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
20 from LS+wenet, 18 from Vox, 23 from AS 2024-08-17 16:09:46,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3395700.0, ans=0.125 2024-08-17 16:09:51,485 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 23 from Vox, 34 from AS 2024-08-17 16:09:53,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3395800.0, ans=0.0 2024-08-17 16:10:09,641 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.390e+01 2.660e+01 3.033e+01 1.573e+02, threshold=5.320e+01, percent-clipped=3.0 2024-08-17 16:10:24,658 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7600, loss[loss=0.1153, beats_loss=0.009196, ecapa_loss=0.0001401, whisper_loss=0.1047, over 23859.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01056, ecapa_loss=0.0001471, whisper_loss=0.0916, over 3955688.16 frames. ], batch size: 91, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:10:28,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3396000.0, ans=0.2 2024-08-17 16:10:46,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3396100.0, ans=0.2 2024-08-17 16:10:55,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3396200.0, ans=0.1 2024-08-17 16:10:58,770 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
19 from LS+wenet, 16 from Vox, 33 from AS 2024-08-17 16:11:00,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3396200.0, ans=0.09899494936611666 2024-08-17 16:11:08,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3396200.0, ans=0.125 2024-08-17 16:11:12,581 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 21 from LS+wenet, 25 from Vox, 45 from AS 2024-08-17 16:11:30,639 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 19 from Vox, 27 from AS 2024-08-17 16:11:40,162 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 21 from Vox, 27 from AS 2024-08-17 16:11:41,212 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7650, loss[loss=0.1127, beats_loss=0.01043, ecapa_loss=0.0001692, whisper_loss=0.1006, over 17592.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0105, ecapa_loss=0.0001472, whisper_loss=0.0919, over 3954061.77 frames. ], batch size: 73, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:11:52,395 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
27 from LS+wenet, 21 from Vox, 40 from AS 2024-08-17 16:11:52,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3396500.0, ans=0.125 2024-08-17 16:12:02,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3396600.0, ans=0.125 2024-08-17 16:12:04,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3396600.0, ans=0.125 2024-08-17 16:12:19,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3396700.0, ans=0.07 2024-08-17 16:12:41,291 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.270e+01 2.469e+01 2.738e+01 5.063e+01, threshold=4.938e+01, percent-clipped=0.0 2024-08-17 16:12:43,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3396900.0, ans=0.0 2024-08-17 16:12:56,592 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7700, loss[loss=0.07267, beats_loss=0.01165, ecapa_loss=0.000149, whisper_loss=0.05953, over 15394.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01052, ecapa_loss=0.0001483, whisper_loss=0.0913, over 3944572.09 frames. ], batch size: 62, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:13:09,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3397000.0, ans=0.1 2024-08-17 16:13:10,800 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0 2024-08-17 16:13:17,306 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
25 from LS+wenet, 24 from Vox, 44 from AS 2024-08-17 16:13:17,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3397100.0, ans=0.0 2024-08-17 16:13:26,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3397200.0, ans=0.0 2024-08-17 16:13:27,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3397200.0, ans=0.125 2024-08-17 16:13:30,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3397200.0, ans=0.125 2024-08-17 16:13:31,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=3397200.0, ans=10.0 2024-08-17 16:13:43,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3397300.0, ans=0.125 2024-08-17 16:13:44,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3397300.0, ans=0.125 2024-08-17 16:13:46,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3397300.0, ans=0.125 2024-08-17 16:13:52,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3397300.0, ans=0.125 2024-08-17 16:13:56,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3397400.0, ans=0.1 2024-08-17 16:14:02,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3397400.0, ans=0.125 2024-08-17 16:14:03,545 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
20 from LS+wenet, 21 from Vox, 32 from AS 2024-08-17 16:14:09,396 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7750, loss[loss=0.1088, beats_loss=0.008999, ecapa_loss=0.0001809, whisper_loss=0.09797, over 21916.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01048, ecapa_loss=0.0001494, whisper_loss=0.09099, over 3945821.85 frames. ], batch size: 89, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:14:27,142 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 17 from Vox, 23 from AS 2024-08-17 16:14:30,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3397600.0, ans=0.125 2024-08-17 16:14:30,918 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.73 vs. limit=15.0 2024-08-17 16:14:37,140 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 from AS 2024-08-17 16:14:42,997 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 23 from Vox, 32 from AS 2024-08-17 16:14:45,743 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 41 from LS+wenet, 25 from Vox, 25 from AS 2024-08-17 16:14:46,190 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2024-08-17 16:14:47,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3397700.0, ans=0.125 2024-08-17 16:14:49,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3397700.0, ans=0.95 2024-08-17 16:14:55,918 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
28 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-17 16:15:02,536 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 16 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-17 16:15:06,090 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.283e+01 2.591e+01 2.932e+01 1.157e+02, threshold=5.183e+01, percent-clipped=2.0 2024-08-17 16:15:20,526 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7800, loss[loss=0.08646, beats_loss=0.01029, ecapa_loss=0.0001377, whisper_loss=0.0748, over 15106.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01046, ecapa_loss=0.0001478, whisper_loss=0.09133, over 3928771.04 frames. ], batch size: 56, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:15:20,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3398000.0, ans=0.1 2024-08-17 16:15:25,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3398000.0, ans=0.125 2024-08-17 16:15:26,614 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 16:15:40,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3398100.0, ans=0.125 2024-08-17 16:15:41,256 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-17 16:15:56,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3398200.0, ans=0.0 2024-08-17 16:15:59,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3398200.0, ans=0.0 2024-08-17 16:16:33,287 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7850, loss[loss=0.1053, beats_loss=0.01027, ecapa_loss=0.0001429, whisper_loss=0.09359, over 14594.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01047, ecapa_loss=0.0001487, whisper_loss=0.09165, over 3896571.87 frames. ], batch size: 55, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:16:43,475 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-17 16:16:51,362 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-17 16:17:15,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3398700.0, ans=0.5 2024-08-17 16:17:26,765 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 10 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-17 16:17:36,817 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 23 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-17 16:17:49,832 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.263e+01 2.535e+01 2.863e+01 4.043e+01, threshold=5.070e+01, percent-clipped=0.0 2024-08-17 16:18:11,868 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 18 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-17 16:18:13,655 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7900, loss[loss=0.08073, beats_loss=0.01356, ecapa_loss=0.0001436, whisper_loss=0.06574, over 19990.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01053, ecapa_loss=0.0001481, whisper_loss=0.09101, over 3913889.30 frames. ], batch size: 81, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:18:37,890 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-17 16:18:40,334 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.55 vs. 
limit=15.0 2024-08-17 16:18:45,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3399100.0, ans=0.125 2024-08-17 16:18:53,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3399100.0, ans=0.0 2024-08-17 16:18:56,132 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.68 vs. limit=15.0 2024-08-17 16:18:58,517 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-17 16:19:02,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3399200.0, ans=0.125 2024-08-17 16:19:34,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3399300.0, ans=0.0 2024-08-17 16:19:35,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3399300.0, ans=0.0 2024-08-17 16:19:55,885 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7950, loss[loss=0.1083, beats_loss=0.008813, ecapa_loss=0.0001573, whisper_loss=0.09791, over 23323.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01057, ecapa_loss=0.0001483, whisper_loss=0.09079, over 3910268.14 frames. ], batch size: 91, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:20:01,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3399500.0, ans=0.125 2024-08-17 16:20:04,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3399500.0, ans=0.1 2024-08-17 16:20:20,426 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
16 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-17 16:20:26,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3399600.0, ans=0.125 2024-08-17 16:20:50,089 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 13 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-17 16:21:09,963 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.02 vs. limit=22.5 2024-08-17 16:21:13,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3399800.0, ans=0.0 2024-08-17 16:21:20,338 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.432e+01 2.654e+01 2.954e+01 3.124e+02, threshold=5.307e+01, percent-clipped=2.0 2024-08-17 16:21:28,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3399900.0, ans=0.0 2024-08-17 16:21:30,355 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-17 16:21:39,159 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-17 16:21:44,313 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8000, loss[loss=0.0952, beats_loss=0.01252, ecapa_loss=0.0001179, whisper_loss=0.0815, over 19904.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01055, ecapa_loss=0.0001475, whisper_loss=0.09113, over 3910681.41 frames. ], batch size: 79, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:21:49,760 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
17 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-17 16:21:49,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3400000.0, ans=0.2 2024-08-17 16:21:52,716 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=15.0 2024-08-17 16:22:09,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3400100.0, ans=0.2 2024-08-17 16:22:54,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3400300.0, ans=0.125 2024-08-17 16:23:13,962 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-17 16:23:23,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3400400.0, ans=0.125 2024-08-17 16:23:24,375 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-17 16:23:30,265 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8050, loss[loss=0.08057, beats_loss=0.01324, ecapa_loss=0.0001499, whisper_loss=0.06583, over 22493.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01054, ecapa_loss=0.0001475, whisper_loss=0.09037, over 3867164.14 frames. ], batch size: 96, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:23:40,784 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 14 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-17 16:24:00,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3400600.0, ans=0.0 2024-08-17 16:24:10,760 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
28 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-17 16:24:17,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3400700.0, ans=0.125 2024-08-17 16:24:22,069 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-17 16:24:24,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3400700.0, ans=0.2 2024-08-17 16:24:24,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3400700.0, ans=0.125 2024-08-17 16:24:40,494 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.40 vs. limit=15.0 2024-08-17 16:24:52,758 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.357e+01 2.561e+01 2.862e+01 4.337e+01, threshold=5.122e+01, percent-clipped=0.0 2024-08-17 16:24:52,977 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 13 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-17 16:25:02,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3400900.0, ans=0.125 2024-08-17 16:25:08,457 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8100, loss[loss=0.1107, beats_loss=0.009065, ecapa_loss=0.0001447, whisper_loss=0.1002, over 15685.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01063, ecapa_loss=0.0001471, whisper_loss=0.09015, over 3889692.21 frames. ], batch size: 59, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:25:11,536 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
22 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-17 16:25:12,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3401000.0, ans=0.1 2024-08-17 16:25:17,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3401000.0, ans=0.125 2024-08-17 16:25:23,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3401100.0, ans=0.1 2024-08-17 16:25:30,808 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-17 16:25:32,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3401100.0, ans=0.2 2024-08-17 16:25:47,389 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-17 16:25:53,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3401300.0, ans=0.125 2024-08-17 16:25:53,908 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-17 16:26:01,764 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-17 16:26:03,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3401400.0, ans=0.125 2024-08-17 16:26:06,936 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-17 16:26:09,620 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 16 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-17 16:26:13,424 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8150, loss[loss=0.1168, beats_loss=0.009578, ecapa_loss=0.000136, whisper_loss=0.1058, over 23070.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01064, ecapa_loss=0.0001461, whisper_loss=0.08977, over 3902085.01 frames. ], batch size: 90, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:27:00,069 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-17 16:27:05,510 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.351e+01 2.682e+01 3.225e+01 8.305e+01, threshold=5.364e+01, percent-clipped=1.0 2024-08-17 16:27:09,527 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 27 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-17 16:27:11,998 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 23 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-17 16:27:15,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3401900.0, ans=0.1 2024-08-17 16:27:18,566 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8200, loss[loss=0.1088, beats_loss=0.009557, ecapa_loss=0.0001472, whisper_loss=0.09782, over 22050.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01063, ecapa_loss=0.0001465, whisper_loss=0.08977, over 3876468.16 frames. ], batch size: 87, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:27:21,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3402000.0, ans=0.125 2024-08-17 16:27:45,085 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
25 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-17 16:27:45,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3402200.0, ans=0.125 2024-08-17 16:27:45,423 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.642e+00 2024-08-17 16:27:46,762 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=15.0 2024-08-17 16:27:48,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3402200.0, ans=0.0 2024-08-17 16:28:04,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3402300.0, ans=0.0 2024-08-17 16:28:04,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3402300.0, ans=0.1 2024-08-17 16:28:16,379 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-17 16:28:23,688 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8250, loss[loss=0.0809, beats_loss=0.01071, ecapa_loss=0.0001893, whisper_loss=0.0683, over 20217.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01056, ecapa_loss=0.0001478, whisper_loss=0.0899, over 3896597.60 frames. ], batch size: 92, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:28:36,601 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
21 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-17 16:28:40,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3402600.0, ans=0.5 2024-08-17 16:28:47,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3402600.0, ans=0.125 2024-08-17 16:28:48,660 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-17 16:28:51,511 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-17 16:28:58,942 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-17 16:28:59,402 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2024-08-17 16:29:08,773 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-17 16:29:11,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3402800.0, ans=0.0 2024-08-17 16:29:16,324 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.415e+01 2.647e+01 2.985e+01 4.296e+01, threshold=5.294e+01, percent-clipped=0.0 2024-08-17 16:29:29,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3403000.0, ans=0.0 2024-08-17 16:29:29,961 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8300, loss[loss=0.1335, beats_loss=0.008845, ecapa_loss=0.0001601, whisper_loss=0.123, over 22630.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01059, ecapa_loss=0.0001475, whisper_loss=0.08994, over 3921724.70 frames. 
], batch size: 91, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:29:31,549 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-17 16:29:37,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3403000.0, ans=0.125 2024-08-17 16:29:38,856 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.50 vs. limit=15.0 2024-08-17 16:29:41,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=3403000.0, ans=15.0 2024-08-17 16:29:45,580 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 16:29:47,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3403100.0, ans=0.125 2024-08-17 16:30:01,227 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-17 16:30:13,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3403300.0, ans=0.2 2024-08-17 16:30:35,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3403500.0, ans=0.0 2024-08-17 16:30:36,003 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8350, loss[loss=0.1007, beats_loss=0.009373, ecapa_loss=0.0001756, whisper_loss=0.08953, over 19781.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0106, ecapa_loss=0.0001481, whisper_loss=0.09063, over 3942055.65 frames. 
], batch size: 79, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:30:43,884 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 16:30:53,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3403600.0, ans=0.2 2024-08-17 16:30:55,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3403600.0, ans=0.0 2024-08-17 16:31:00,358 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-17 16:31:12,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3403700.0, ans=0.1 2024-08-17 16:31:13,230 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-17 16:31:27,434 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.334e+01 2.617e+01 2.973e+01 3.819e+01, threshold=5.233e+01, percent-clipped=0.0 2024-08-17 16:31:28,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3403900.0, ans=0.1 2024-08-17 16:31:40,547 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8400, loss[loss=0.08978, beats_loss=0.01463, ecapa_loss=0.0001013, whisper_loss=0.07413, over 23584.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.0001478, whisper_loss=0.09106, over 3941038.29 frames. ], batch size: 92, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:31:46,832 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-17 16:31:54,986 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-17 16:31:58,865 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-17 16:32:14,859 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-17 16:32:24,469 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.92 vs. limit=22.5 2024-08-17 16:32:35,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3404400.0, ans=0.0 2024-08-17 16:32:45,857 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8450, loss[loss=0.1134, beats_loss=0.01271, ecapa_loss=0.0001319, whisper_loss=0.09932, over 21906.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01064, ecapa_loss=0.0001487, whisper_loss=0.09106, over 3954333.80 frames. ], batch size: 86, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:32:50,080 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 25 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-17 16:32:51,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3404500.0, ans=0.125 2024-08-17 16:32:55,240 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 31 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-17 16:33:05,986 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 31 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-17 16:33:12,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3404700.0, ans=0.1 2024-08-17 16:33:14,014 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-17 16:33:14,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3404700.0, ans=0.125 2024-08-17 16:33:18,963 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
16 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-17 16:33:23,058 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 22 from LS+wenet, 28 from Vox, 44 fro AS 2024-08-17 16:33:38,310 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.287e+01 2.644e+01 3.078e+01 2.118e+02, threshold=5.288e+01, percent-clipped=3.0 2024-08-17 16:33:42,767 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.37 vs. limit=10.0 2024-08-17 16:33:43,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3404900.0, ans=0.0 2024-08-17 16:33:49,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3404900.0, ans=0.1 2024-08-17 16:33:52,467 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8500, loss[loss=0.113, beats_loss=0.009523, ecapa_loss=0.000182, whisper_loss=0.1016, over 21630.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01064, ecapa_loss=0.0001476, whisper_loss=0.09116, over 3969422.66 frames. ], batch size: 90, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:33:54,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3405000.0, ans=0.125 2024-08-17 16:33:56,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3405000.0, ans=0.125 2024-08-17 16:34:36,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3405300.0, ans=0.0 2024-08-17 16:34:58,485 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8550, loss[loss=0.1094, beats_loss=0.0108, ecapa_loss=0.0001379, whisper_loss=0.09722, over 23401.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.0106, ecapa_loss=0.0001487, whisper_loss=0.09164, over 3984250.17 frames. ], batch size: 91, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:35:04,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3405500.0, ans=0.09899494936611666 2024-08-17 16:35:19,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3405600.0, ans=0.125 2024-08-17 16:35:28,715 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-17 16:35:43,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3405800.0, ans=0.5 2024-08-17 16:35:51,228 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.281e+01 2.591e+01 2.805e+01 5.234e+01, threshold=5.182e+01, percent-clipped=0.0 2024-08-17 16:36:04,654 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8600, loss[loss=0.09379, beats_loss=0.009237, ecapa_loss=0.0001707, whisper_loss=0.08285, over 16527.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01058, ecapa_loss=0.0001486, whisper_loss=0.09132, over 3964005.54 frames. ], batch size: 67, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:36:55,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3406300.0, ans=0.0 2024-08-17 16:36:57,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3406300.0, ans=0.2 2024-08-17 16:37:12,895 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8650, loss[loss=0.09537, beats_loss=0.01077, ecapa_loss=0.0001108, whisper_loss=0.0835, over 20217.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.0001475, whisper_loss=0.09081, over 3911007.60 frames. 
], batch size: 77, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:37:18,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3406500.0, ans=0.1 2024-08-17 16:37:19,929 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-17 16:37:26,416 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-17 16:37:40,189 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-17 16:37:56,648 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0 2024-08-17 16:37:57,708 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0 2024-08-17 16:38:06,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3406900.0, ans=0.2 2024-08-17 16:38:06,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3406900.0, ans=0.04949747468305833 2024-08-17 16:38:06,984 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.354e+01 2.688e+01 3.025e+01 2.265e+02, threshold=5.375e+01, percent-clipped=1.0 2024-08-17 16:38:10,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3406900.0, ans=0.0 2024-08-17 16:38:21,515 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8700, loss[loss=0.1218, beats_loss=0.008108, ecapa_loss=0.0001474, whisper_loss=0.1123, over 18410.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01058, ecapa_loss=0.0001475, whisper_loss=0.091, over 3873673.60 frames. 
], batch size: 69, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:38:34,968 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 16 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-17 16:38:42,402 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.30 vs. limit=15.0 2024-08-17 16:38:43,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3407100.0, ans=0.125 2024-08-17 16:38:47,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3407100.0, ans=0.0 2024-08-17 16:39:03,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3407300.0, ans=0.125 2024-08-17 16:39:05,060 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.63 vs. limit=6.0 2024-08-17 16:39:12,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3407300.0, ans=0.09899494936611666 2024-08-17 16:39:18,371 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-17 16:39:23,425 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.42 vs. 
limit=15.0 2024-08-17 16:39:25,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3407400.0, ans=0.2 2024-08-17 16:39:30,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3407500.0, ans=0.0 2024-08-17 16:39:32,026 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8750, loss[loss=0.1154, beats_loss=0.01061, ecapa_loss=0.0001665, whisper_loss=0.1031, over 22005.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01056, ecapa_loss=0.0001485, whisper_loss=0.09073, over 3892111.39 frames. ], batch size: 92, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:39:42,748 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-17 16:39:46,980 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-17 16:39:56,633 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-17 16:39:59,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3407700.0, ans=0.0 2024-08-17 16:40:14,819 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
19 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-17 16:40:19,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3407800.0, ans=0.125 2024-08-17 16:40:23,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3407800.0, ans=0.125 2024-08-17 16:40:27,092 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.350e+01 2.581e+01 3.005e+01 1.666e+02, threshold=5.162e+01, percent-clipped=1.0 2024-08-17 16:40:29,381 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.88 vs. limit=22.5 2024-08-17 16:40:40,506 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8800, loss[loss=0.0872, beats_loss=0.01174, ecapa_loss=0.0001652, whisper_loss=0.07381, over 20864.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01061, ecapa_loss=0.0001484, whisper_loss=0.09056, over 3889899.76 frames. ], batch size: 85, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:40:46,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3408000.0, ans=0.125 2024-08-17 16:40:59,325 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-17 16:41:00,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3408100.0, ans=0.1 2024-08-17 16:41:06,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3408200.0, ans=0.2 2024-08-17 16:41:21,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3408300.0, ans=0.125 2024-08-17 16:41:30,771 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
29 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-17 16:41:42,284 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.02 vs. limit=15.0 2024-08-17 16:41:45,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3408400.0, ans=0.0 2024-08-17 16:41:48,361 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8850, loss[loss=0.09251, beats_loss=0.01001, ecapa_loss=0.0001345, whisper_loss=0.08116, over 20362.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01071, ecapa_loss=0.0001478, whisper_loss=0.08977, over 3870109.09 frames. ], batch size: 82, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:41:51,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3408500.0, ans=0.125 2024-08-17 16:41:58,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3408500.0, ans=0.0 2024-08-17 16:42:07,825 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 22 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-17 16:42:38,563 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-17 16:42:40,264 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
25 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-17 16:42:44,296 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.312e+01 2.500e+01 2.835e+01 4.395e+02, threshold=5.001e+01, percent-clipped=3.0 2024-08-17 16:42:47,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3408900.0, ans=0.07 2024-08-17 16:42:51,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3408900.0, ans=0.0 2024-08-17 16:42:59,099 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8900, loss[loss=0.1016, beats_loss=0.009127, ecapa_loss=0.0002074, whisper_loss=0.0904, over 22324.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01068, ecapa_loss=0.0001479, whisper_loss=0.09027, over 3897720.01 frames. ], batch size: 95, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:43:02,389 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-17 16:43:04,364 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0 2024-08-17 16:43:10,405 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-17 16:43:25,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3409100.0, ans=0.0 2024-08-17 16:43:26,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3409200.0, ans=0.125 2024-08-17 16:43:32,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3409200.0, ans=0.125 2024-08-17 16:43:39,195 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-17 16:43:42,523 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.01 vs. limit=15.0 2024-08-17 16:44:07,937 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8950, loss[loss=0.1057, beats_loss=0.01233, ecapa_loss=0.0001484, whisper_loss=0.09184, over 22491.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01057, ecapa_loss=0.0001493, whisper_loss=0.09033, over 3891356.30 frames. ], batch size: 92, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:44:13,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3409500.0, ans=0.125 2024-08-17 16:44:17,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3409500.0, ans=0.125 2024-08-17 16:44:42,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3409700.0, ans=0.125 2024-08-17 16:44:53,487 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.65 vs. limit=15.0 2024-08-17 16:44:56,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3409800.0, ans=0.125 2024-08-17 16:45:00,446 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.019e+01 2.330e+01 2.600e+01 2.951e+01 4.837e+01, threshold=5.199e+01, percent-clipped=0.0 2024-08-17 16:45:01,325 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.99 vs. 
limit=15.0 2024-08-17 16:45:13,425 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.75 vs. limit=22.5 2024-08-17 16:45:13,740 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9000, loss[loss=0.09863, beats_loss=0.01067, ecapa_loss=0.0001781, whisper_loss=0.08618, over 21983.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01048, ecapa_loss=0.0001501, whisper_loss=0.09078, over 3892812.92 frames. ], batch size: 93, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:45:13,740 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-17 16:45:48,989 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on ASR_libri: loss=0.2519, beats_loss=0, ecapa_loss=0.0005245, whisper_loss=0.2466, over 922467.00 frames. 2024-08-17 16:46:06,627 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on SV_voxceleb1: loss=0.004189, beats_loss=0, ecapa_loss=0.0004189, whisper_loss=0, over 939242.00 frames. 2024-08-17 16:47:38,426 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.7596, 4.0348, 4.5880, 4.7143], device='cuda:2') 2024-08-17 16:47:47,291 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on AT_audioset: loss=0.02324, beats_loss=0.02324, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-17 16:47:47,295 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-17 16:48:09,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3410100.0, ans=0.0 2024-08-17 16:48:19,784 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-17 16:48:32,775 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.45 vs. 
limit=22.5 2024-08-17 16:48:43,967 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-17 16:48:53,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3410500.0, ans=0.125 2024-08-17 16:48:54,346 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9050, loss[loss=0.07868, beats_loss=0.01072, ecapa_loss=0.0001418, whisper_loss=0.06654, over 18973.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01048, ecapa_loss=0.0001491, whisper_loss=0.09093, over 3885072.25 frames. ], batch size: 78, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:48:55,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3410500.0, ans=0.0 2024-08-17 16:49:07,627 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-17 16:49:13,267 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-17 16:49:21,177 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-17 16:49:47,780 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.651e+01 2.352e+01 2.621e+01 2.985e+01 9.819e+01, threshold=5.241e+01, percent-clipped=2.0 2024-08-17 16:49:50,324 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-17 16:49:51,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3410900.0, ans=0.0 2024-08-17 16:49:59,362 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.32 vs. 
limit=22.5 2024-08-17 16:50:01,034 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9100, loss[loss=0.1141, beats_loss=0.01065, ecapa_loss=0.0001578, whisper_loss=0.1018, over 23076.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01045, ecapa_loss=0.0001495, whisper_loss=0.09062, over 3865810.68 frames. ], batch size: 94, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:50:01,143 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-17 16:50:01,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3411000.0, ans=0.125 2024-08-17 16:50:31,604 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-17 16:50:38,632 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-17 16:50:39,825 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-17 16:50:56,524 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 29 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-17 16:51:04,681 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2024-08-17 16:51:08,619 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9150, loss[loss=0.09004, beats_loss=0.01066, ecapa_loss=0.0001558, whisper_loss=0.07782, over 19929.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01042, ecapa_loss=0.0001489, whisper_loss=0.09064, over 3871011.70 frames. ], batch size: 78, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:51:13,956 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.32 vs. 
limit=15.0 2024-08-17 16:51:14,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3411500.0, ans=0.0 2024-08-17 16:51:20,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3411500.0, ans=0.0 2024-08-17 16:51:28,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3411600.0, ans=0.95 2024-08-17 16:51:41,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3411700.0, ans=0.1 2024-08-17 16:52:04,833 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.286e+01 2.611e+01 2.888e+01 4.834e+01, threshold=5.222e+01, percent-clipped=0.0 2024-08-17 16:52:07,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3411900.0, ans=0.2 2024-08-17 16:52:10,123 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-17 16:52:12,865 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 12 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-17 16:52:18,866 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9200, loss[loss=0.09769, beats_loss=0.01024, ecapa_loss=0.0001357, whisper_loss=0.08609, over 18462.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01048, ecapa_loss=0.0001486, whisper_loss=0.0897, over 3840296.33 frames. 
], batch size: 72, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:52:22,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3412000.0, ans=0.0 2024-08-17 16:52:53,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3412200.0, ans=0.0 2024-08-17 16:53:02,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3412300.0, ans=0.0 2024-08-17 16:53:18,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3412400.0, ans=0.1 2024-08-17 16:53:22,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3412400.0, ans=0.125 2024-08-17 16:53:23,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3412400.0, ans=0.0 2024-08-17 16:53:25,694 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=22.5 2024-08-17 16:53:28,882 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9250, loss[loss=0.1044, beats_loss=0.01097, ecapa_loss=0.0001728, whisper_loss=0.09172, over 14435.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01053, ecapa_loss=0.0001493, whisper_loss=0.08979, over 3835762.85 frames. ], batch size: 58, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:53:34,011 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
25 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-17 16:53:40,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3412500.0, ans=0.0 2024-08-17 16:54:00,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3412700.0, ans=0.05 2024-08-17 16:54:17,837 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-17 16:54:23,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3412900.0, ans=0.125 2024-08-17 16:54:24,315 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.401e+01 2.711e+01 2.916e+01 4.438e+01, threshold=5.422e+01, percent-clipped=0.0 2024-08-17 16:54:30,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3412900.0, ans=0.07 2024-08-17 16:54:39,003 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9300, loss[loss=0.1126, beats_loss=0.01122, ecapa_loss=0.0001364, whisper_loss=0.1, over 22335.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0106, ecapa_loss=0.0001483, whisper_loss=0.0898, over 3865687.29 frames. ], batch size: 89, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:54:43,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3413000.0, ans=0.125 2024-08-17 16:54:47,274 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 37 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-17 16:54:57,816 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.82 vs. 
limit=10.0 2024-08-17 16:54:58,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3413100.0, ans=10.0 2024-08-17 16:55:15,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3413200.0, ans=0.125 2024-08-17 16:55:18,114 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.45 vs. limit=10.0 2024-08-17 16:55:35,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3413400.0, ans=0.1 2024-08-17 16:55:48,062 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9350, loss[loss=0.08031, beats_loss=0.0157, ecapa_loss=0.0001162, whisper_loss=0.06345, over 20006.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01059, ecapa_loss=0.0001486, whisper_loss=0.08948, over 3842954.35 frames. ], batch size: 82, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:56:00,558 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-17 16:56:02,569 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.90 vs. limit=22.5 2024-08-17 16:56:04,530 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-17 16:56:16,146 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.95 vs. limit=22.5 2024-08-17 16:56:38,869 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
20 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-17 16:56:42,164 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.341e+01 2.647e+01 3.045e+01 1.659e+02, threshold=5.294e+01, percent-clipped=2.0 2024-08-17 16:56:43,391 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-17 16:56:44,914 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-17 16:56:55,721 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9400, loss[loss=0.09731, beats_loss=0.01262, ecapa_loss=0.0001545, whisper_loss=0.08315, over 18113.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01066, ecapa_loss=0.0001487, whisper_loss=0.08928, over 3853803.84 frames. ], batch size: 78, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:57:01,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3414000.0, ans=0.0 2024-08-17 16:57:11,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3414100.0, ans=0.125 2024-08-17 16:57:33,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3414300.0, ans=0.04949747468305833 2024-08-17 16:57:36,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3414300.0, ans=0.2 2024-08-17 16:57:38,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3414300.0, ans=0.125 2024-08-17 16:57:47,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3414400.0, ans=0.2 2024-08-17 16:57:56,837 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
22 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-17 16:58:00,228 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9450, loss[loss=0.09871, beats_loss=0.01234, ecapa_loss=0.0001442, whisper_loss=0.08492, over 20903.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01062, ecapa_loss=0.0001506, whisper_loss=0.08987, over 3872935.84 frames. ], batch size: 87, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:58:16,546 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.66 vs. limit=15.0 2024-08-17 16:58:28,474 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-17 16:58:31,201 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-17 16:58:45,361 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-17 16:58:49,402 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 16:58:51,545 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.271e+01 2.511e+01 2.774e+01 4.657e+01, threshold=5.021e+01, percent-clipped=0.0 2024-08-17 16:58:54,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3414900.0, ans=0.125 2024-08-17 16:59:04,745 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9500, loss[loss=0.1022, beats_loss=0.01085, ecapa_loss=0.0001552, whisper_loss=0.08978, over 17734.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01069, ecapa_loss=0.0001504, whisper_loss=0.08949, over 3856787.84 frames. 
], batch size: 73, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:59:11,847 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.576e-02 2024-08-17 16:59:17,730 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-17 16:59:27,624 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-17 16:59:29,413 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.21 vs. limit=22.5 2024-08-17 16:59:30,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3415200.0, ans=10.0 2024-08-17 16:59:34,457 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.19 vs. limit=15.0 2024-08-17 16:59:36,791 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-17 16:59:38,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3415200.0, ans=0.04949747468305833 2024-08-17 16:59:47,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3415300.0, ans=0.5 2024-08-17 16:59:49,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3415300.0, ans=0.0 2024-08-17 16:59:54,654 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=20.74 vs. limit=15.0 2024-08-17 17:00:08,345 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9550, loss[loss=0.1072, beats_loss=0.0118, ecapa_loss=0.000136, whisper_loss=0.094, over 23302.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01062, ecapa_loss=0.0001499, whisper_loss=0.08934, over 3852442.75 frames. ], batch size: 95, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:00:08,450 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-17 17:00:09,510 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 20 from LS+wenet, 34 from Vox, 37 fro AS 2024-08-17 17:00:48,989 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.17 vs. limit=15.0 2024-08-17 17:00:57,911 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.226e+01 2.523e+01 2.913e+01 4.217e+01, threshold=5.047e+01, percent-clipped=0.0 2024-08-17 17:01:00,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3415900.0, ans=0.125 2024-08-17 17:01:09,131 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.26 vs. limit=12.0 2024-08-17 17:01:10,919 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9600, loss[loss=0.09888, beats_loss=0.009692, ecapa_loss=0.0001545, whisper_loss=0.08764, over 15555.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01057, ecapa_loss=0.0001492, whisper_loss=0.08903, over 3869393.03 frames. 
], batch size: 61, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:01:19,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3416000.0, ans=0.125 2024-08-17 17:01:22,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3416100.0, ans=0.04949747468305833 2024-08-17 17:01:32,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3416100.0, ans=0.125 2024-08-17 17:01:33,568 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-17 17:01:34,846 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-17 17:01:39,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3416200.0, ans=0.1 2024-08-17 17:01:53,343 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-08-17 17:01:59,998 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 38 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-17 17:02:03,550 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 23 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-17 17:02:13,489 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9650, loss[loss=0.1029, beats_loss=0.008994, ecapa_loss=0.0001255, whisper_loss=0.09262, over 14661.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01051, ecapa_loss=0.0001484, whisper_loss=0.08949, over 3876390.16 frames. 
], batch size: 56, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:02:15,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3416500.0, ans=0.025 2024-08-17 17:02:40,203 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 20 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-17 17:02:40,752 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.94 vs. limit=15.0 2024-08-17 17:02:42,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3416700.0, ans=0.0 2024-08-17 17:02:49,840 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.19 vs. limit=15.0 2024-08-17 17:02:55,508 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 28 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-17 17:03:02,212 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.62 vs. limit=12.0 2024-08-17 17:03:03,964 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.357e+01 2.621e+01 2.968e+01 4.527e+01, threshold=5.241e+01, percent-clipped=0.0 2024-08-17 17:03:16,847 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9700, loss[loss=0.1121, beats_loss=0.009043, ecapa_loss=0.0001601, whisper_loss=0.1015, over 23066.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01049, ecapa_loss=0.0001483, whisper_loss=0.08992, over 3876408.33 frames. ], batch size: 92, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:03:17,573 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.74 vs. 
limit=12.0 2024-08-17 17:03:18,136 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 22 from LS+wenet, 9 from Vox, 25 fro AS 2024-08-17 17:03:19,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3417000.0, ans=0.0 2024-08-17 17:03:36,463 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-17 17:03:44,428 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-17 17:03:47,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3417200.0, ans=0.0 2024-08-17 17:03:57,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3417300.0, ans=0.0 2024-08-17 17:03:58,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3417300.0, ans=0.2 2024-08-17 17:03:58,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3417300.0, ans=0.04949747468305833 2024-08-17 17:04:13,356 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-17 17:04:13,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3417400.0, ans=0.125 2024-08-17 17:04:19,309 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9750, loss[loss=0.1016, beats_loss=0.01026, ecapa_loss=0.0001375, whisper_loss=0.08997, over 19523.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.0001478, whisper_loss=0.08956, over 3838506.02 frames. ], batch size: 75, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:04:21,731 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
29 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-17 17:04:28,325 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 23 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-17 17:04:40,755 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-17 17:04:43,545 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-17 17:04:46,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3417700.0, ans=0.07 2024-08-17 17:04:55,142 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-17 17:04:55,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3417700.0, ans=0.125 2024-08-17 17:04:59,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3417800.0, ans=0.2 2024-08-17 17:05:11,018 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.328e+01 2.610e+01 2.968e+01 3.624e+01, threshold=5.221e+01, percent-clipped=0.0 2024-08-17 17:05:19,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3417900.0, ans=0.0 2024-08-17 17:05:20,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3417900.0, ans=0.125 2024-08-17 17:05:23,977 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9800, loss[loss=0.1075, beats_loss=0.01087, ecapa_loss=0.000118, whisper_loss=0.09544, over 16470.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01053, ecapa_loss=0.0001465, whisper_loss=0.08959, over 3832239.76 frames. 
], batch size: 64, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:05:25,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3418000.0, ans=0.0 2024-08-17 17:05:28,126 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-17 17:05:33,130 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-17 17:05:49,581 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 18 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-17 17:05:51,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3418200.0, ans=0.125 2024-08-17 17:06:04,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3418300.0, ans=0.2 2024-08-17 17:06:06,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3418300.0, ans=0.0 2024-08-17 17:06:21,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3418400.0, ans=0.1 2024-08-17 17:06:27,860 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9850, loss[loss=0.0674, beats_loss=0.01073, ecapa_loss=0.0001522, whisper_loss=0.05515, over 18773.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01052, ecapa_loss=0.0001473, whisper_loss=0.08936, over 3808522.67 frames. ], batch size: 79, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:06:28,006 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 30 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-17 17:06:35,668 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
31 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-17 17:07:11,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3418800.0, ans=0.1 2024-08-17 17:07:17,680 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.311e+01 2.569e+01 2.955e+01 4.968e+01, threshold=5.138e+01, percent-clipped=0.0 2024-08-17 17:07:21,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3418900.0, ans=0.0 2024-08-17 17:07:27,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3418900.0, ans=0.125 2024-08-17 17:07:30,184 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9900, loss[loss=0.09311, beats_loss=0.01129, ecapa_loss=0.0001372, whisper_loss=0.08045, over 19514.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01054, ecapa_loss=0.000148, whisper_loss=0.08964, over 3807693.59 frames. ], batch size: 77, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:07:35,452 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-17 17:07:40,065 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-17 17:07:51,519 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-17 17:08:08,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3419300.0, ans=0.0 2024-08-17 17:08:18,227 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.75 vs. limit=15.0 2024-08-17 17:08:32,723 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9950, loss[loss=0.1039, beats_loss=0.01045, ecapa_loss=0.0001701, whisper_loss=0.09171, over 20550.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01063, ecapa_loss=0.0001481, whisper_loss=0.08999, over 3845463.69 frames. ], batch size: 83, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:08:35,559 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-17 17:08:41,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3419500.0, ans=0.035 2024-08-17 17:08:49,055 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-17 17:08:50,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3419600.0, ans=0.1 2024-08-17 17:09:03,307 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-17 17:09:04,541 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 19 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-17 17:09:20,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3419800.0, ans=0.0 2024-08-17 17:09:22,871 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.277e+01 2.547e+01 2.876e+01 4.081e+01, threshold=5.094e+01, percent-clipped=0.0 2024-08-17 17:09:23,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3419900.0, ans=0.125 2024-08-17 17:09:35,742 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10000, loss[loss=0.1081, beats_loss=0.009751, ecapa_loss=0.0001539, whisper_loss=0.0968, over 21926.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01069, ecapa_loss=0.0001473, whisper_loss=0.08967, over 3863440.59 frames. 
], batch size: 87, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:09:54,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3420100.0, ans=0.0 2024-08-17 17:10:05,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3420200.0, ans=0.2 2024-08-17 17:10:19,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3420300.0, ans=0.125 2024-08-17 17:10:21,909 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-17 17:10:25,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3420400.0, ans=0.2 2024-08-17 17:10:38,226 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10050, loss[loss=0.09643, beats_loss=0.01287, ecapa_loss=0.0001445, whisper_loss=0.08211, over 21929.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01063, ecapa_loss=0.0001486, whisper_loss=0.09038, over 3873873.49 frames. 
], batch size: 91, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:10:46,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3420500.0, ans=0.125 2024-08-17 17:10:53,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3420600.0, ans=0.125 2024-08-17 17:11:03,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3420700.0, ans=0.125 2024-08-17 17:11:22,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3420800.0, ans=0.2 2024-08-17 17:11:28,098 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.359e+01 2.579e+01 2.991e+01 2.335e+02, threshold=5.159e+01, percent-clipped=2.0 2024-08-17 17:11:35,708 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 36 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-17 17:11:40,713 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10100, loss[loss=0.09253, beats_loss=0.01145, ecapa_loss=0.0001407, whisper_loss=0.07967, over 22698.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01063, ecapa_loss=0.0001491, whisper_loss=0.09063, over 3905737.82 frames. ], batch size: 91, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:11:44,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3421000.0, ans=0.125 2024-08-17 17:11:46,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3421000.0, ans=0.0 2024-08-17 17:11:55,635 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-17 17:12:18,521 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
24 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-17 17:12:27,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3421300.0, ans=0.125 2024-08-17 17:12:43,573 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10150, loss[loss=0.09258, beats_loss=0.01248, ecapa_loss=0.0001354, whisper_loss=0.07874, over 20907.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01069, ecapa_loss=0.0001485, whisper_loss=0.09007, over 3912485.63 frames. ], batch size: 84, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:12:50,242 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 17:13:00,104 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 26 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-17 17:13:20,318 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-17 17:13:33,381 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.337e+01 2.592e+01 2.912e+01 4.503e+01, threshold=5.183e+01, percent-clipped=0.0 2024-08-17 17:13:40,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3421900.0, ans=0.125 2024-08-17 17:13:45,729 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10200, loss[loss=0.1317, beats_loss=0.007477, ecapa_loss=0.0001702, whisper_loss=0.1225, over 21920.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01069, ecapa_loss=0.0001483, whisper_loss=0.09024, over 3933316.34 frames. 
], batch size: 87, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:13:47,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3422000.0, ans=0.125 2024-08-17 17:14:02,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3422100.0, ans=0.0 2024-08-17 17:14:05,657 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 11 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-17 17:14:15,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3422200.0, ans=0.125 2024-08-17 17:14:20,218 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.44 vs. limit=22.5 2024-08-17 17:14:25,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3422300.0, ans=0.125 2024-08-17 17:14:28,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3422300.0, ans=0.125 2024-08-17 17:14:33,172 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-17 17:14:47,668 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10250, loss[loss=0.09172, beats_loss=0.01237, ecapa_loss=0.0001547, whisper_loss=0.0778, over 14180.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01079, ecapa_loss=0.000147, whisper_loss=0.0897, over 3940939.78 frames. ], batch size: 57, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:14:53,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3422500.0, ans=0.1 2024-08-17 17:15:01,591 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
33 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-17 17:15:07,292 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.13 vs. limit=15.0 2024-08-17 17:15:10,255 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 16 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-17 17:15:22,981 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-17 17:15:27,214 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.23 vs. limit=22.5 2024-08-17 17:15:29,806 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=8.764e+00 2024-08-17 17:15:32,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3422800.0, ans=0.2 2024-08-17 17:15:34,433 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-17 17:15:37,970 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.362e+01 2.622e+01 2.945e+01 4.411e+01, threshold=5.245e+01, percent-clipped=0.0 2024-08-17 17:15:43,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3422900.0, ans=0.125 2024-08-17 17:15:50,578 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10300, loss[loss=0.07821, beats_loss=0.01186, ecapa_loss=0.0001555, whisper_loss=0.06479, over 13628.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01076, ecapa_loss=0.000148, whisper_loss=0.09007, over 3903086.72 frames. ], batch size: 58, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:15:50,719 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
36 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-17 17:16:26,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3423300.0, ans=0.0 2024-08-17 17:16:27,252 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0 2024-08-17 17:16:37,154 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.24 vs. limit=15.0 2024-08-17 17:16:44,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3423400.0, ans=0.125 2024-08-17 17:16:52,954 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10350, loss[loss=0.1093, beats_loss=0.009609, ecapa_loss=0.0001809, whisper_loss=0.09785, over 21299.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01076, ecapa_loss=0.0001478, whisper_loss=0.09084, over 3905142.81 frames. ], batch size: 90, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:16:57,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3423500.0, ans=0.1 2024-08-17 17:17:03,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3423500.0, ans=0.125 2024-08-17 17:17:30,745 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.03 vs. 
limit=15.0 2024-08-17 17:17:32,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3423800.0, ans=15.0 2024-08-17 17:17:35,078 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.022e-02 2024-08-17 17:17:36,103 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-17 17:17:43,587 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.300e+01 2.585e+01 2.974e+01 4.112e+01, threshold=5.170e+01, percent-clipped=0.0 2024-08-17 17:17:50,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3423900.0, ans=0.125 2024-08-17 17:17:56,143 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10400, loss[loss=0.07797, beats_loss=0.01123, ecapa_loss=0.000164, whisper_loss=0.0651, over 18058.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01078, ecapa_loss=0.0001487, whisper_loss=0.09046, over 3908186.63 frames. ], batch size: 74, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:18:00,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3424000.0, ans=0.2 2024-08-17 17:18:13,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3424100.0, ans=0.0 2024-08-17 17:18:17,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3424100.0, ans=0.0 2024-08-17 17:18:28,426 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
29 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-17 17:18:31,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3424200.0, ans=0.09899494936611666 2024-08-17 17:18:37,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3424300.0, ans=10.0 2024-08-17 17:18:45,274 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.42 vs. limit=6.0 2024-08-17 17:18:56,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3424400.0, ans=0.125 2024-08-17 17:18:58,104 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10450, loss[loss=0.1071, beats_loss=0.01039, ecapa_loss=0.0001419, whisper_loss=0.09533, over 23434.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01068, ecapa_loss=0.000148, whisper_loss=0.09115, over 3875331.68 frames. ], batch size: 94, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:19:03,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3424500.0, ans=0.1 2024-08-17 17:19:15,077 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.61 vs. limit=15.0 2024-08-17 17:19:24,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3424700.0, ans=0.95 2024-08-17 17:19:25,517 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
28 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-17 17:19:25,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3424700.0, ans=0.125 2024-08-17 17:19:38,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3424800.0, ans=0.125 2024-08-17 17:19:47,899 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.411e+01 2.718e+01 3.284e+01 2.620e+02, threshold=5.436e+01, percent-clipped=5.0 2024-08-17 17:19:56,280 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.70 vs. limit=15.0 2024-08-17 17:20:00,397 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10500, loss[loss=0.1234, beats_loss=0.007712, ecapa_loss=0.0001937, whisper_loss=0.1138, over 17620.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001495, whisper_loss=0.09091, over 3886008.36 frames. ], batch size: 69, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:20:03,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3425000.0, ans=0.0 2024-08-17 17:20:12,433 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
24 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-17 17:20:37,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3425300.0, ans=0.125 2024-08-17 17:20:40,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3425300.0, ans=0.2 2024-08-17 17:20:55,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3425400.0, ans=0.0 2024-08-17 17:20:59,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3425400.0, ans=0.035 2024-08-17 17:21:02,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=3425500.0, ans=6.0 2024-08-17 17:21:02,660 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10550, loss[loss=0.07322, beats_loss=0.01346, ecapa_loss=0.0001351, whisper_loss=0.0584, over 19932.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01057, ecapa_loss=0.0001487, whisper_loss=0.09191, over 3902391.38 frames. ], batch size: 83, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:21:09,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3425500.0, ans=0.125 2024-08-17 17:21:22,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3425600.0, ans=0.0 2024-08-17 17:21:22,918 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=15.0 2024-08-17 17:21:30,941 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-17 17:21:39,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3425800.0, ans=0.0 2024-08-17 17:21:51,592 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.414e+01 2.704e+01 3.069e+01 2.193e+02, threshold=5.408e+01, percent-clipped=2.0 2024-08-17 17:21:58,981 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-17 17:22:03,967 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10600, loss[loss=0.1069, beats_loss=0.009029, ecapa_loss=0.0001792, whisper_loss=0.0961, over 15521.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01057, ecapa_loss=0.0001485, whisper_loss=0.09113, over 3846926.92 frames. ], batch size: 62, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:22:18,543 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.21 vs. limit=15.0 2024-08-17 17:22:21,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3426100.0, ans=0.07 2024-08-17 17:22:40,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3426300.0, ans=0.04949747468305833 2024-08-17 17:22:45,576 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 27 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-17 17:22:48,039 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-17 17:22:52,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3426300.0, ans=0.0 2024-08-17 17:22:57,129 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.14 vs. 
limit=12.0 2024-08-17 17:23:01,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3426400.0, ans=0.0 2024-08-17 17:23:06,524 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10650, loss[loss=0.1008, beats_loss=0.01192, ecapa_loss=0.0001303, whisper_loss=0.08756, over 16235.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01057, ecapa_loss=0.0001481, whisper_loss=0.09135, over 3877635.28 frames. ], batch size: 66, lr: 2.64e-03, grad_scale: 1.152921504606847e+18 2024-08-17 17:23:07,893 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-17 17:23:11,654 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-17 17:23:27,733 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-17 17:23:33,258 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.19 vs. limit=15.0 2024-08-17 17:23:47,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3426800.0, ans=0.1 2024-08-17 17:23:48,279 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.72 vs. 
limit=15.0 2024-08-17 17:23:51,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3426800.0, ans=0.035 2024-08-17 17:23:56,150 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.375e+01 2.587e+01 2.940e+01 1.166e+02, threshold=5.174e+01, percent-clipped=1.0 2024-08-17 17:23:59,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3426900.0, ans=0.1 2024-08-17 17:24:08,955 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10700, loss[loss=0.09448, beats_loss=0.00933, ecapa_loss=0.0001426, whisper_loss=0.08373, over 14711.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01062, ecapa_loss=0.0001471, whisper_loss=0.09131, over 3884814.78 frames. ], batch size: 57, lr: 2.64e-03, grad_scale: 1.152921504606847e+18 2024-08-17 17:24:26,745 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.704e+05 2024-08-17 17:24:33,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3427200.0, ans=0.0 2024-08-17 17:24:34,780 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-17 17:24:42,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3427200.0, ans=0.125 2024-08-17 17:24:44,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3427200.0, ans=0.0 2024-08-17 17:24:46,263 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-17 17:24:50,366 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.793e+01 2024-08-17 17:24:53,825 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
26 from LS+wenet, 14 from Vox, 26 from AS
2024-08-17 17:25:07,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3427400.0, ans=0.0
2024-08-17 17:25:10,934 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10750, loss[loss=0.1093, beats_loss=0.0119, ecapa_loss=0.0001412, whisper_loss=0.096, over 21848.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01069, ecapa_loss=0.000146, whisper_loss=0.09118, over 3907492.27 frames. ], batch size: 90, lr: 2.64e-03, grad_scale: 1.152921504606847e+18
2024-08-17 17:25:25,846 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 13 from Vox, 23 from AS
2024-08-17 17:25:30,044 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.01 vs. limit=22.5
2024-08-17 17:25:38,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3427700.0, ans=0.1
2024-08-17 17:25:51,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3427800.0, ans=0.125
2024-08-17 17:26:00,268 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.471e+01 2.708e+01 3.054e+01 4.365e+01, threshold=5.417e+01, percent-clipped=0.0
2024-08-17 17:26:00,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3427900.0, ans=0.1
2024-08-17 17:26:12,894 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10800, loss[loss=0.094, beats_loss=0.01157, ecapa_loss=0.0001571, whisper_loss=0.08087, over 18919.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01072, ecapa_loss=0.0001459, whisper_loss=0.0908, over 3911998.82 frames. ], batch size: 76, lr: 2.64e-03, grad_scale: 1.152921504606847e+18
2024-08-17 17:26:14,357 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 17 from Vox, 39 from AS
2024-08-17 17:26:15,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3428000.0, ans=0.125
2024-08-17 17:26:26,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3428100.0, ans=0.125
2024-08-17 17:26:39,471 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 12 from LS+wenet, 14 from Vox, 29 from AS
2024-08-17 17:26:52,650 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.06 vs. limit=22.5
2024-08-17 17:26:58,579 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-17 17:27:01,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3428300.0, ans=0.125
2024-08-17 17:27:03,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3428400.0, ans=0.0
2024-08-17 17:27:15,567 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10850, loss[loss=0.1074, beats_loss=0.01039, ecapa_loss=0.0001333, whisper_loss=0.09571, over 21410.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0107, ecapa_loss=0.0001474, whisper_loss=0.09054, over 3917822.54 frames. ], batch size: 83, lr: 2.64e-03, grad_scale: 1.152921504606847e+18
2024-08-17 17:27:31,116 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 31 from LS+wenet, 26 from Vox, 38 from AS
2024-08-17 17:27:45,557 WARNING [optim.py:496] (2/4) Scaling gradients by 0.07737734913825989, model_norm_threshold=54.16817855834961
2024-08-17 17:27:45,721 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.25, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.205e+05, grad_sumsq=1.205e+05, orig_rms_sq=1.000e+00
2024-08-17 17:27:53,279 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 25 from Vox, 27 from AS
2024-08-17 17:28:05,444 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.66 vs. limit=12.0
2024-08-17 17:28:05,720 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.361e+01 2.674e+01 3.069e+01 7.001e+02, threshold=5.348e+01, percent-clipped=1.0
2024-08-17 17:28:07,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3428900.0, ans=0.0
2024-08-17 17:28:16,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3428900.0, ans=0.2
2024-08-17 17:28:18,420 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10900, loss[loss=0.07065, beats_loss=0.01405, ecapa_loss=8.19e-05, whisper_loss=0.05579, over 14990.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01069, ecapa_loss=0.0001468, whisper_loss=0.09125, over 3920815.63 frames. ], batch size: 54, lr: 2.64e-03, grad_scale: 1.152921504606847e+18
2024-08-17 17:28:32,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3429100.0, ans=0.0
2024-08-17 17:28:33,133 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 20 from Vox, 29 from AS
2024-08-17 17:28:39,328 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 27 from Vox, 41 from AS
2024-08-17 17:28:44,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3429200.0, ans=0.2
2024-08-17 17:28:49,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3429200.0, ans=0.0
2024-08-17 17:28:52,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3429200.0, ans=0.125
2024-08-17 17:28:54,981 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. limit=6.0
2024-08-17 17:28:59,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3429300.0, ans=0.125
2024-08-17 17:29:00,818 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=8.500e-01
2024-08-17 17:29:18,955 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 from AS
2024-08-17 17:29:20,181 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10950, loss[loss=0.1111, beats_loss=0.01075, ecapa_loss=0.000148, whisper_loss=0.0989, over 22313.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01073, ecapa_loss=0.0001465, whisper_loss=0.0907, over 3928935.32 frames. ], batch size: 89, lr: 2.64e-03, grad_scale: 1.152921504606847e+18
2024-08-17 17:29:29,922 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.77 vs. limit=22.5
2024-08-17 17:29:36,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3429600.0, ans=0.1
2024-08-17 17:29:38,978 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 26 from LS+wenet, 18 from Vox, 22 from AS
2024-08-17 17:29:49,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3429700.0, ans=0.125
2024-08-17 17:29:51,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3429700.0, ans=0.0
2024-08-17 17:29:55,717 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.54 vs. limit=15.0
2024-08-17 17:30:03,364 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.76 vs. limit=15.0
2024-08-17 17:30:10,091 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.341e+01 2.537e+01 2.893e+01 3.514e+01, threshold=5.074e+01, percent-clipped=0.0
2024-08-17 17:30:14,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3429900.0, ans=0.07
2024-08-17 17:30:21,960 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.55 vs. limit=12.0
2024-08-17 17:30:22,506 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 11000, loss[loss=0.08464, beats_loss=0.0123, ecapa_loss=0.0001275, whisper_loss=0.07107, over 21900.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0107, ecapa_loss=0.0001459, whisper_loss=0.09089, over 3925849.16 frames. ], batch size: 88, lr: 2.64e-03, grad_scale: 1.152921504606847e+18
2024-08-17 17:30:23,995 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 17 from Vox, 35 from AS
2024-08-17 17:30:25,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3430000.0, ans=0.125
2024-08-17 17:30:35,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3430100.0, ans=0.0
2024-08-17 17:30:41,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3430100.0, ans=0.125
2024-08-17 17:30:41,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3430100.0, ans=0.125
2024-08-17 17:30:43,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3430100.0, ans=0.0
2024-08-17 17:30:45,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3430100.0, ans=0.1
2024-08-17 17:30:57,116 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 29 from LS+wenet, 21 from Vox, 30 from AS
2024-08-17 17:31:02,869 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.54 vs. limit=15.0
2024-08-17 17:31:22,849 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.03 vs. limit=12.0
2024-08-17 17:31:24,579 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 11050, loss[loss=0.1054, beats_loss=0.01041, ecapa_loss=0.0001589, whisper_loss=0.09337, over 21926.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01059, ecapa_loss=0.0001468, whisper_loss=0.09205, over 3947975.32 frames. ], batch size: 90, lr: 2.64e-03, grad_scale: 1.152921504606847e+18
2024-08-17 17:31:24,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3430500.0, ans=0.0
2024-08-17 17:31:28,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3430500.0, ans=0.125
2024-08-17 17:31:38,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3430600.0, ans=0.125
2024-08-17 17:31:59,045 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 17 from Vox, 38 from AS
2024-08-17 17:32:01,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3430800.0, ans=0.125
2024-08-17 17:32:07,926 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 31 from LS+wenet, 19 from Vox, 35 from AS
2024-08-17 17:32:15,237 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.316e+01 2.591e+01 2.930e+01 4.532e+01, threshold=5.182e+01, percent-clipped=0.0
2024-08-17 17:32:20,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3430900.0, ans=0.025
2024-08-17 17:32:22,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3430900.0, ans=0.125
2024-08-17 17:32:25,925 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 11100, loss[loss=0.0884, beats_loss=0.01091, ecapa_loss=0.0001916, whisper_loss=0.07557, over 19933.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01058, ecapa_loss=0.0001479, whisper_loss=0.09151, over 3924649.26 frames. ], batch size: 88, lr: 2.64e-03, grad_scale: 5.764607523034235e+17
2024-08-17 17:32:31,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3431000.0, ans=0.0
2024-08-17 17:32:39,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3431100.0, ans=0.125
2024-08-17 17:32:40,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3431100.0, ans=0.125
2024-08-17 17:32:42,514 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 15 from Vox, 43 from AS
2024-08-17 17:32:43,581 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 21 from Vox, 33 from AS
2024-08-17 17:32:59,848 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-17 17:33:01,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3431300.0, ans=0.5
2024-08-17 17:33:03,332 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 13 from Vox, 43 from AS
2024-08-17 17:33:09,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3431300.0, ans=0.125
2024-08-17 17:33:46,616 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 0, loss[loss=0.1029, beats_loss=0.01132, ecapa_loss=0.0001342, whisper_loss=0.09024, over 16130.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01132, ecapa_loss=0.0001342, whisper_loss=0.09024, over 16130.00 frames. ], batch size: 63, lr: 2.58e-03, grad_scale: 5.764607523034235e+17
2024-08-17 17:33:46,617 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss
2024-08-17 17:34:22,013 INFO [train_multi_KD3.py:1149] (2/4) Epoch 24, validation on ASR_libri: loss=0.2501, beats_loss=0, ecapa_loss=0.0005267, whisper_loss=0.2449, over 922467.00 frames.
2024-08-17 17:34:36,628 INFO [train_multi_KD3.py:1149] (2/4) Epoch 24, validation on SV_voxceleb1: loss=0.004161, beats_loss=0, ecapa_loss=0.0004161, whisper_loss=0, over 939242.00 frames.
2024-08-17 17:36:22,864 INFO [train_multi_KD3.py:1149] (2/4) Epoch 24, validation on AT_audioset: loss=0.02331, beats_loss=0.02331, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-17 17:36:22,867 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB
2024-08-17 17:36:24,443 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 13 from Vox, 26 from AS
2024-08-17 17:36:24,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3431420.0, ans=0.2
2024-08-17 17:36:51,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3431520.0, ans=0.0
2024-08-17 17:36:52,564 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 18 from Vox, 27 from AS
2024-08-17 17:37:10,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3431620.0, ans=0.0
2024-08-17 17:37:14,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3431720.0, ans=0.1
2024-08-17 17:37:33,180 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 from AS
2024-08-17 17:37:49,137 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.526e+01 2.875e+01 3.255e+01 4.776e+01, threshold=5.751e+01, percent-clipped=0.0
2024-08-17 17:37:51,228 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 50, loss[loss=0.08574, beats_loss=0.008115, ecapa_loss=0.0001596, whisper_loss=0.07603, over 16936.00 frames. ], tot_loss[loss=0.09824, beats_loss=0.009941, ecapa_loss=0.0001575, whisper_loss=0.08673, over 847887.54 frames. ], batch size: 69, lr: 2.58e-03, grad_scale: 5.764607523034235e+17
2024-08-17 17:38:50,831 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 30 from Vox, 27 from AS
2024-08-17 17:38:57,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3432320.0, ans=0.0
2024-08-17 17:38:59,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3432320.0, ans=0.125
2024-08-17 17:39:18,446 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 100, loss[loss=0.09341, beats_loss=0.009742, ecapa_loss=0.000136, whisper_loss=0.08231, over 18133.00 frames. ], tot_loss[loss=0.09939, beats_loss=0.009611, ecapa_loss=0.0001543, whisper_loss=0.08824, over 1498933.82 frames. ], batch size: 70, lr: 2.58e-03, grad_scale: 5.764607523034235e+17
2024-08-17 17:39:34,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3432520.0, ans=0.0
2024-08-17 17:39:41,366 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 13 from LS+wenet, 12 from Vox, 33 from AS
2024-08-17 17:39:54,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3432620.0, ans=0.125
2024-08-17 17:39:56,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3432620.0, ans=0.0
2024-08-17 17:40:24,826 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.24 vs. limit=22.5
2024-08-17 17:40:28,624 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 from AS
2024-08-17 17:40:37,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3432820.0, ans=0.2
2024-08-17 17:40:41,427 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.203e+01 2.631e+01 2.865e+01 3.182e+01 4.534e+01, threshold=5.730e+01, percent-clipped=0.0
2024-08-17 17:40:42,702 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 150, loss[loss=0.1076, beats_loss=0.008599, ecapa_loss=0.0001307, whisper_loss=0.09774, over 17780.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.009479, ecapa_loss=0.0001531, whisper_loss=0.09045, over 2013206.60 frames. ], batch size: 66, lr: 2.58e-03, grad_scale: 5.764607523034235e+17
2024-08-17 17:40:45,999 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 17 from Vox, 40 from AS
2024-08-17 17:40:54,339 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 24 from Vox, 35 from AS
2024-08-17 17:41:09,381 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 12 from LS+wenet, 15 from Vox, 33 from AS
2024-08-17 17:41:15,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3433120.0, ans=0.1
2024-08-17 17:41:21,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3433120.0, ans=0.0
2024-08-17 17:41:24,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3433220.0, ans=0.125
2024-08-17 17:41:24,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3433220.0, ans=0.0
2024-08-17 17:41:45,308 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 16 from Vox, 34 from AS
2024-08-17 17:41:45,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3433320.0, ans=0.125
2024-08-17 17:41:50,203 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 200, loss[loss=0.1213, beats_loss=0.00767, ecapa_loss=0.0001457, whisper_loss=0.1121, over 16270.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.009562, ecapa_loss=0.0001546, whisper_loss=0.09283, over 2404700.39 frames. ], batch size: 60, lr: 2.58e-03, grad_scale: 5.764607523034235e+17
2024-08-17 17:41:50,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3433420.0, ans=0.125
2024-08-17 17:42:20,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3433620.0, ans=0.0
2024-08-17 17:42:53,761 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.383e+01 2.582e+01 2.960e+01 4.276e+01, threshold=5.164e+01, percent-clipped=0.0
2024-08-17 17:42:54,977 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 250, loss[loss=0.1168, beats_loss=0.01025, ecapa_loss=0.0001483, whisper_loss=0.1051, over 20256.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.009731, ecapa_loss=0.0001524, whisper_loss=0.09218, over 2689188.89 frames. ], batch size: 80, lr: 2.58e-03, grad_scale: 5.764607523034235e+17
2024-08-17 17:42:55,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3433920.0, ans=0.125
2024-08-17 17:42:59,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3433920.0, ans=0.0
2024-08-17 17:43:44,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3434220.0, ans=0.07
2024-08-17 17:43:49,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3434320.0, ans=0.2
2024-08-17 17:43:52,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3434320.0, ans=0.0
2024-08-17 17:43:57,265 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 18 from Vox, 42 from AS
2024-08-17 17:43:58,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3434420.0, ans=0.125
2024-08-17 17:43:59,624 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 300, loss[loss=0.1091, beats_loss=0.0118, ecapa_loss=0.0001457, whisper_loss=0.09587, over 19349.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.009881, ecapa_loss=0.0001508, whisper_loss=0.09269, over 2951849.50 frames. ], batch size: 76, lr: 2.58e-03, grad_scale: 5.764607523034235e+17
2024-08-17 17:44:31,421 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.66 vs. limit=15.0
2024-08-17 17:45:05,623 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.312e+01 2.476e+01 2.764e+01 4.106e+02, threshold=4.951e+01, percent-clipped=1.0
2024-08-17 17:45:06,991 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 350, loss[loss=0.1102, beats_loss=0.01105, ecapa_loss=0.0001588, whisper_loss=0.09755, over 22185.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01006, ecapa_loss=0.0001513, whisper_loss=0.09123, over 3134750.24 frames. ], batch size: 92, lr: 2.58e-03, grad_scale: 5.764607523034235e+17
2024-08-17 17:45:08,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3434920.0, ans=0.0
2024-08-17 17:45:13,269 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0
2024-08-17 17:45:18,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3434920.0, ans=0.0
2024-08-17 17:45:19,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3435020.0, ans=0.1
2024-08-17 17:45:32,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3435020.0, ans=10.0
2024-08-17 17:45:32,223 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=15.0
2024-08-17 17:45:33,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3435120.0, ans=0.0
2024-08-17 17:45:41,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3435120.0, ans=0.07
2024-08-17 17:45:43,174 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.57 vs. limit=15.0
2024-08-17 17:45:47,828 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 26 from Vox, 40 from AS
2024-08-17 17:46:00,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3435320.0, ans=0.125
2024-08-17 17:46:01,514 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 17 from Vox, 28 from AS
2024-08-17 17:46:10,526 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.94 vs. limit=15.0
2024-08-17 17:46:11,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3435320.0, ans=0.125
2024-08-17 17:46:15,331 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 400, loss[loss=0.09184, beats_loss=0.01092, ecapa_loss=0.0001486, whisper_loss=0.07944, over 22181.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01019, ecapa_loss=0.0001506, whisper_loss=0.09027, over 3309273.90 frames. ], batch size: 89, lr: 2.58e-03, grad_scale: 5.764607523034235e+17
2024-08-17 17:46:15,420 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 12 from Vox, 25 from AS
2024-08-17 17:46:15,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3435420.0, ans=0.1
2024-08-17 17:46:24,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3435420.0, ans=0.0
2024-08-17 17:46:39,006 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.923e+01
2024-08-17 17:46:40,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3435520.0, ans=0.2
2024-08-17 17:46:40,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3435520.0, ans=0.125
2024-08-17 17:47:22,636 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.287e+01 2.548e+01 2.895e+01 1.655e+02, threshold=5.097e+01, percent-clipped=3.0
2024-08-17 17:47:23,949 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 450, loss[loss=0.09886, beats_loss=0.01309, ecapa_loss=0.0001455, whisper_loss=0.08432, over 22956.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01025, ecapa_loss=0.0001507, whisper_loss=0.09028, over 3453131.72 frames. ], batch size: 96, lr: 2.58e-03, grad_scale: 5.764607523034235e+17
2024-08-17 17:47:24,595 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.96 vs. limit=12.0
2024-08-17 17:47:27,135 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0
2024-08-17 17:47:31,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3435920.0, ans=0.0
2024-08-17 17:47:50,455 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.184e+01
2024-08-17 17:47:55,405 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 12 from LS+wenet, 17 from Vox, 32 from AS
2024-08-17 17:47:55,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3436120.0, ans=0.0
2024-08-17 17:47:59,385 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 36 from LS+wenet, 24 from Vox, 28 from AS
2024-08-17 17:48:20,272 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.15 vs. limit=15.0
2024-08-17 17:48:27,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3436320.0, ans=0.1
2024-08-17 17:48:31,559 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 500, loss[loss=0.07696, beats_loss=0.01243, ecapa_loss=0.000141, whisper_loss=0.06313, over 17036.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01024, ecapa_loss=0.0001497, whisper_loss=0.08973, over 3515786.83 frames. ], batch size: 67, lr: 2.58e-03, grad_scale: 5.764607523034235e+17
2024-08-17 17:48:36,160 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.46 vs. limit=15.0
2024-08-17 17:48:37,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3436420.0, ans=0.1
2024-08-17 17:48:43,071 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 32 from LS+wenet, 25 from Vox, 27 from AS
2024-08-17 17:48:57,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3436620.0, ans=0.0
2024-08-17 17:48:57,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3436620.0, ans=0.125
2024-08-17 17:49:04,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3436620.0, ans=0.1
2024-08-17 17:49:28,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3436820.0, ans=0.125
2024-08-17 17:49:29,629 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 19 from LS+wenet, 27 from Vox, 41 from AS
2024-08-17 17:49:30,215 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0
2024-08-17 17:49:35,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3436820.0, ans=0.125
2024-08-17 17:49:39,166 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.385e+01 2.606e+01 2.957e+01 2.283e+02, threshold=5.212e+01, percent-clipped=2.0
2024-08-17 17:49:40,485 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 550, loss[loss=0.07951, beats_loss=0.01258, ecapa_loss=0.000134, whisper_loss=0.06558, over 17386.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01029, ecapa_loss=0.0001487, whisper_loss=0.08997, over 3610502.51 frames. ], batch size: 69, lr: 2.58e-03, grad_scale: 5.764607523034235e+17
2024-08-17 17:49:43,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3436920.0, ans=0.0
2024-08-17 17:49:55,088 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 from AS
2024-08-17 17:50:02,168 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 12 from Vox, 35 from AS
2024-08-17 17:50:22,599 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 30 from LS+wenet, 22 from Vox, 33 from AS
2024-08-17 17:50:49,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3437420.0, ans=0.2
2024-08-17 17:50:50,288 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 600, loss[loss=0.1088, beats_loss=0.01101, ecapa_loss=0.0001276, whisper_loss=0.09648, over 22665.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.0001467, whisper_loss=0.09027, over 3654241.88 frames. ], batch size: 89, lr: 2.58e-03, grad_scale: 5.764607523034235e+17
2024-08-17 17:51:07,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3437520.0, ans=0.125
2024-08-17 17:51:18,371 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 20 from Vox, 26 from AS
2024-08-17 17:51:19,675 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 21 from Vox, 47 from AS
2024-08-17 17:51:35,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3437720.0, ans=0.0
2024-08-17 17:51:47,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3437820.0, ans=0.05
2024-08-17 17:51:57,451 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.305e+01 2.570e+01 2.922e+01 6.139e+01, threshold=5.141e+01, percent-clipped=1.0
2024-08-17 17:51:58,861 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 650, loss[loss=0.1002, beats_loss=0.012, ecapa_loss=0.0001376, whisper_loss=0.08687, over 23136.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01038, ecapa_loss=0.0001476, whisper_loss=0.0901, over 3703162.94 frames. ], batch size: 91, lr: 2.58e-03, grad_scale: 5.764607523034235e+17
2024-08-17 17:52:01,963 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 18 from Vox, 40 from AS
2024-08-17 17:52:18,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3438020.0, ans=0.0
2024-08-17 17:52:32,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3438120.0, ans=0.0
2024-08-17 17:52:41,825 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.85 vs. limit=15.0
2024-08-17 17:52:43,988 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 20 from Vox, 25 from AS
2024-08-17 17:52:57,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3438320.0, ans=0.125
2024-08-17 17:53:08,239 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.12 vs. limit=15.0
2024-08-17 17:53:09,967 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 700, loss[loss=0.07665, beats_loss=0.01101, ecapa_loss=0.0001215, whisper_loss=0.06443, over 18341.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01039, ecapa_loss=0.000148, whisper_loss=0.09014, over 3742850.27 frames. ], batch size: 71, lr: 2.58e-03, grad_scale: 5.764607523034235e+17
2024-08-17 17:53:21,187 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 28 from Vox, 35 from AS
2024-08-17 17:53:26,232 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0
2024-08-17 17:53:29,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3438520.0, ans=0.125
2024-08-17 17:53:33,287 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 15 from LS+wenet, 12 from Vox, 32 from AS
2024-08-17 17:53:44,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=3438620.0, ans=0.1
2024-08-17 17:53:59,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3438720.0, ans=0.125
2024-08-17 17:54:06,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3438720.0, ans=0.2
2024-08-17 17:54:26,145 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.293e+01 2.490e+01 2.735e+01 3.624e+01, threshold=4.979e+01, percent-clipped=0.0
2024-08-17 17:54:27,684 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 750, loss[loss=0.09407, beats_loss=0.01228, ecapa_loss=0.0001345, whisper_loss=0.08044, over 21506.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01036, ecapa_loss=0.0001482, whisper_loss=0.09012, over 3777027.44 frames. ], batch size: 88, lr: 2.58e-03, grad_scale: 5.764607523034235e+17
2024-08-17 17:54:28,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3438920.0, ans=0.0
2024-08-17 17:54:41,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3439020.0, ans=0.2
2024-08-17 17:54:47,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3439020.0, ans=0.125
2024-08-17 17:54:54,505 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 25 from LS+wenet, 14 from Vox, 20 from AS
2024-08-17 17:55:00,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3439120.0, ans=0.125
2024-08-17 17:55:02,171 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 19 from Vox, 37 from AS
2024-08-17 17:55:03,486 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 from AS
2024-08-17 17:55:12,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3439220.0, ans=0.125
2024-08-17 17:55:26,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3439220.0, ans=0.1
2024-08-17 17:55:47,415 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 18 from Vox, 30 from AS
2024-08-17 17:55:48,488 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 800, loss[loss=0.1091, beats_loss=0.009163, ecapa_loss=0.0001574, whisper_loss=0.09834, over 18826.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01038, ecapa_loss=0.0001482, whisper_loss=0.08909, over 3763711.45 frames. ], batch size: 75, lr: 2.58e-03, grad_scale: 5.764607523034235e+17
2024-08-17 17:56:00,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3439420.0, ans=0.2
2024-08-17 17:56:07,142 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts.
29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-17 17:56:07,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3439520.0, ans=0.2 2024-08-17 17:56:09,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3439520.0, ans=0.125 2024-08-17 17:56:10,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3439520.0, ans=0.125 2024-08-17 17:56:14,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3439520.0, ans=0.05 2024-08-17 17:56:36,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3439720.0, ans=0.0 2024-08-17 17:56:37,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3439720.0, ans=0.0 2024-08-17 17:56:43,209 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=15.0 2024-08-17 17:56:45,487 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
17 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-17 17:56:50,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3439820.0, ans=0.0 2024-08-17 17:56:57,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3439820.0, ans=0.1 2024-08-17 17:57:03,547 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.287e+01 2.519e+01 2.785e+01 3.931e+01, threshold=5.037e+01, percent-clipped=0.0 2024-08-17 17:57:05,175 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 850, loss[loss=0.08182, beats_loss=0.008844, ecapa_loss=0.0001589, whisper_loss=0.07139, over 21542.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01036, ecapa_loss=0.0001481, whisper_loss=0.08891, over 3774008.55 frames. ], batch size: 90, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:57:22,851 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 33 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-17 17:58:28,331 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-17 17:58:30,583 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-17 17:58:51,125 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 900, loss[loss=0.1225, beats_loss=0.008877, ecapa_loss=0.0001462, whisper_loss=0.1122, over 22991.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01032, ecapa_loss=0.0001472, whisper_loss=0.08902, over 3768243.72 frames. 
], batch size: 88, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 17:59:13,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3440520.0, ans=0.125 2024-08-17 17:59:26,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3440620.0, ans=0.1 2024-08-17 17:59:29,206 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 15 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-17 17:59:44,219 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-17 17:59:54,299 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.80 vs. limit=15.0 2024-08-17 18:00:13,398 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 29 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-17 18:00:29,629 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 32 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-17 18:00:35,873 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.343e+01 2.504e+01 2.806e+01 4.204e+01, threshold=5.007e+01, percent-clipped=0.0 2024-08-17 18:00:35,892 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 950, loss[loss=0.07971, beats_loss=0.01326, ecapa_loss=0.0001254, whisper_loss=0.0652, over 13850.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01035, ecapa_loss=0.0001478, whisper_loss=0.08885, over 3788531.36 frames. ], batch size: 57, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:00:40,400 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
26 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-17 18:00:45,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3440920.0, ans=0.125 2024-08-17 18:01:50,373 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-17 18:01:54,500 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.70 vs. limit=15.0 2024-08-17 18:02:00,779 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 15 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-17 18:02:05,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3441220.0, ans=0.07 2024-08-17 18:02:06,223 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-17 18:02:09,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3441320.0, ans=0.125 2024-08-17 18:02:30,949 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 1000, loss[loss=0.109, beats_loss=0.008057, ecapa_loss=0.0001948, whisper_loss=0.09897, over 15004.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01042, ecapa_loss=0.0001465, whisper_loss=0.08891, over 3820228.36 frames. ], batch size: 63, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:02:36,608 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-17 18:02:44,720 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-17 18:03:45,642 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-17 18:04:11,902 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
24 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-17 18:04:26,148 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.287e+01 2.499e+01 2.719e+01 4.151e+01, threshold=4.997e+01, percent-clipped=0.0 2024-08-17 18:04:26,170 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 1050, loss[loss=0.1093, beats_loss=0.009365, ecapa_loss=0.0001447, whisper_loss=0.09853, over 16800.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01048, ecapa_loss=0.0001458, whisper_loss=0.08829, over 3815187.34 frames. ], batch size: 65, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:04:48,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=3442020.0, ans=0.02 2024-08-17 18:04:58,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3442020.0, ans=0.125 2024-08-17 18:05:01,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3442020.0, ans=0.125 2024-08-17 18:05:04,330 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-17 18:05:29,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3442220.0, ans=0.1 2024-08-17 18:05:32,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3442220.0, ans=0.125 2024-08-17 18:05:36,121 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-17 18:05:36,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3442320.0, ans=0.0 2024-08-17 18:05:37,355 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
36 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-17 18:05:44,455 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.84 vs. limit=22.5 2024-08-17 18:05:53,143 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 1100, loss[loss=0.09311, beats_loss=0.01085, ecapa_loss=0.0001779, whisper_loss=0.08048, over 18451.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01045, ecapa_loss=0.000146, whisper_loss=0.08918, over 3829570.95 frames. ], batch size: 75, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:05:54,828 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-17 18:06:15,867 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0 2024-08-17 18:06:26,846 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-17 18:06:30,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3442620.0, ans=0.09899494936611666 2024-08-17 18:06:35,966 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 18:06:41,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3442720.0, ans=0.1 2024-08-17 18:07:06,069 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.407e+01 2.705e+01 2.966e+01 4.079e+01, threshold=5.411e+01, percent-clipped=0.0 2024-08-17 18:07:06,101 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 1150, loss[loss=0.1065, beats_loss=0.009352, ecapa_loss=0.0001699, whisper_loss=0.09547, over 21787.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01031, ecapa_loss=0.0001471, whisper_loss=0.0901, over 3819723.28 frames. 
], batch size: 88, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:07:11,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3442920.0, ans=0.2 2024-08-17 18:07:16,122 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 34 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-17 18:07:16,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3442920.0, ans=0.1 2024-08-17 18:07:16,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3442920.0, ans=0.09899494936611666 2024-08-17 18:07:31,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3443020.0, ans=0.125 2024-08-17 18:07:36,852 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.86 vs. limit=15.0 2024-08-17 18:07:50,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3443220.0, ans=10.0 2024-08-17 18:07:56,668 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 12 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-17 18:08:08,236 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-17 18:08:14,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3443320.0, ans=0.2 2024-08-17 18:08:19,089 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 1200, loss[loss=0.1187, beats_loss=0.01015, ecapa_loss=0.0001442, whisper_loss=0.1071, over 20889.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01049, ecapa_loss=0.0001467, whisper_loss=0.0889, over 3819695.53 frames. 
], batch size: 84, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:08:20,658 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-17 18:08:42,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3443520.0, ans=0.125 2024-08-17 18:08:44,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3443520.0, ans=0.1 2024-08-17 18:09:06,295 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.29 vs. limit=12.0 2024-08-17 18:09:16,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3443820.0, ans=0.0 2024-08-17 18:09:16,876 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2024-08-17 18:09:22,481 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-17 18:09:33,373 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.226e+01 2.659e+01 3.142e+01 2.875e+02, threshold=5.318e+01, percent-clipped=2.0 2024-08-17 18:09:33,397 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 1250, loss[loss=0.1028, beats_loss=0.01085, ecapa_loss=0.0001221, whisper_loss=0.09072, over 16863.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001457, whisper_loss=0.08986, over 3830086.52 frames. 
], batch size: 64, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:09:33,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3443920.0, ans=0.125 2024-08-17 18:10:01,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3444020.0, ans=0.125 2024-08-17 18:10:38,249 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-17 18:10:48,852 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 1300, loss[loss=0.09708, beats_loss=0.01122, ecapa_loss=0.0001225, whisper_loss=0.08464, over 14455.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01042, ecapa_loss=0.0001449, whisper_loss=0.09076, over 3840012.63 frames. ], batch size: 56, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:11:11,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3444520.0, ans=0.125 2024-08-17 18:11:28,072 WARNING [optim.py:496] (2/4) Scaling gradients by 0.06072128564119339, model_norm_threshold=53.17913055419922 2024-08-17 18:11:28,246 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.out_combiner.bypass_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.036e+05, grad_sumsq=1.803e+05, orig_rms_sq=5.745e-01 2024-08-17 18:11:36,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3444720.0, ans=0.125 2024-08-17 18:11:38,902 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.334e+05 2024-08-17 18:12:02,735 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
27 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-17 18:12:05,244 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.210e+01 2.591e+01 3.011e+01 8.758e+02, threshold=5.182e+01, percent-clipped=3.0 2024-08-17 18:12:05,266 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 1350, loss[loss=0.1124, beats_loss=0.008088, ecapa_loss=0.0001707, whisper_loss=0.1026, over 15420.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01043, ecapa_loss=0.0001456, whisper_loss=0.09058, over 3818366.72 frames. ], batch size: 60, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:12:07,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3444920.0, ans=0.0 2024-08-17 18:12:11,489 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-17 18:12:15,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3444920.0, ans=0.125 2024-08-17 18:12:21,178 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.54 vs. limit=15.0 2024-08-17 18:12:25,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3445020.0, ans=0.125 2024-08-17 18:12:29,741 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.02 vs. limit=22.5 2024-08-17 18:12:45,080 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.32 vs. 
limit=22.5 2024-08-17 18:12:46,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3445120.0, ans=0.0 2024-08-17 18:13:12,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3445320.0, ans=0.1 2024-08-17 18:13:12,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3445320.0, ans=0.0 2024-08-17 18:13:19,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3445420.0, ans=0.125 2024-08-17 18:13:20,691 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 1400, loss[loss=0.07838, beats_loss=0.01356, ecapa_loss=0.0001266, whisper_loss=0.06355, over 19252.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01039, ecapa_loss=0.0001455, whisper_loss=0.09052, over 3812486.77 frames. ], batch size: 78, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:13:22,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3445420.0, ans=0.0 2024-08-17 18:14:22,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3445820.0, ans=0.125 2024-08-17 18:14:24,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3445820.0, ans=0.125 2024-08-17 18:14:28,217 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
25 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-17 18:15:06,380 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.243e+01 2.508e+01 2.795e+01 3.559e+01, threshold=5.016e+01, percent-clipped=0.0 2024-08-17 18:15:06,401 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 1450, loss[loss=0.08981, beats_loss=0.01092, ecapa_loss=0.0001566, whisper_loss=0.07733, over 17687.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01041, ecapa_loss=0.0001454, whisper_loss=0.08979, over 3818670.34 frames. ], batch size: 75, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:15:09,632 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-17 18:15:12,740 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 18:15:12,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3445920.0, ans=0.125 2024-08-17 18:15:17,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3445920.0, ans=0.0 2024-08-17 18:15:24,646 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 27 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-17 18:15:36,703 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-17 18:15:56,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3446220.0, ans=0.125 2024-08-17 18:16:02,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3446220.0, ans=0.125 2024-08-17 18:16:20,931 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 1500, loss[loss=0.09023, beats_loss=0.01017, ecapa_loss=0.0001582, whisper_loss=0.07847, over 17973.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.0001441, whisper_loss=0.08965, over 3830374.97 frames. ], batch size: 70, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:16:37,055 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-17 18:17:33,203 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.89 vs. limit=15.0 2024-08-17 18:17:34,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3446820.0, ans=0.125 2024-08-17 18:17:36,839 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.317e+01 2.500e+01 2.871e+01 1.026e+02, threshold=5.000e+01, percent-clipped=3.0 2024-08-17 18:17:36,871 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 1550, loss[loss=0.1015, beats_loss=0.0108, ecapa_loss=0.00015, whisper_loss=0.08919, over 22037.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01053, ecapa_loss=0.0001442, whisper_loss=0.0893, over 3830119.31 frames. ], batch size: 86, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:18:14,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3447120.0, ans=0.125 2024-08-17 18:18:17,933 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 29 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-17 18:18:39,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3447320.0, ans=0.125 2024-08-17 18:18:51,577 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 1600, loss[loss=0.09903, beats_loss=0.009566, ecapa_loss=0.0001157, whisper_loss=0.08831, over 17669.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01048, ecapa_loss=0.0001442, whisper_loss=0.08979, over 3848940.01 frames. 
], batch size: 63, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:18:59,225 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-17 18:19:22,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3447620.0, ans=0.015 2024-08-17 18:19:22,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3447620.0, ans=0.1 2024-08-17 18:19:26,578 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-17 18:19:40,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3447720.0, ans=0.2 2024-08-17 18:19:45,903 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 30 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-17 18:19:50,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3447820.0, ans=0.0 2024-08-17 18:19:59,421 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-17 18:20:05,386 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.245e+01 2.501e+01 2.930e+01 4.153e+01, threshold=5.003e+01, percent-clipped=0.0 2024-08-17 18:20:05,407 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 1650, loss[loss=0.1111, beats_loss=0.01059, ecapa_loss=0.000148, whisper_loss=0.09906, over 19498.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001447, whisper_loss=0.09001, over 3846655.10 frames. 
], batch size: 75, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:20:05,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3447920.0, ans=0.1 2024-08-17 18:20:28,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3448020.0, ans=0.2 2024-08-17 18:20:36,755 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2024-08-17 18:20:46,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3448120.0, ans=0.2 2024-08-17 18:20:56,113 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 38 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-17 18:21:00,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3448220.0, ans=0.0 2024-08-17 18:21:02,163 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-17 18:21:04,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3448320.0, ans=0.125 2024-08-17 18:21:06,330 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-17 18:21:11,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3448320.0, ans=0.2 2024-08-17 18:21:17,153 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 1700, loss[loss=0.1192, beats_loss=0.006778, ecapa_loss=0.0001694, whisper_loss=0.1108, over 16704.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001449, whisper_loss=0.09041, over 3842636.74 frames. 
], batch size: 64, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:21:22,476 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2024-08-17 18:22:13,002 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2024-08-17 18:22:23,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3448820.0, ans=0.125 2024-08-17 18:22:26,058 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.381e+01 2.629e+01 2.847e+01 4.282e+01, threshold=5.258e+01, percent-clipped=0.0 2024-08-17 18:22:26,079 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 1750, loss[loss=0.1126, beats_loss=0.008935, ecapa_loss=0.0001248, whisper_loss=0.1024, over 19213.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01038, ecapa_loss=0.0001453, whisper_loss=0.09034, over 3826046.97 frames. ], batch size: 72, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:22:26,527 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 29 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-17 18:22:32,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3448920.0, ans=0.035 2024-08-17 18:22:36,986 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-17 18:22:42,267 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
32 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-17 18:22:43,881 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 18:22:43,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3449020.0, ans=0.125 2024-08-17 18:23:12,710 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=23.12 vs. limit=22.5 2024-08-17 18:23:20,059 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-17 18:23:21,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3449320.0, ans=0.0 2024-08-17 18:23:23,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3449320.0, ans=0.125 2024-08-17 18:23:29,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3449320.0, ans=0.125 2024-08-17 18:23:33,293 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 1800, loss[loss=0.09036, beats_loss=0.01124, ecapa_loss=0.0001344, whisper_loss=0.07777, over 16793.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0104, ecapa_loss=0.0001449, whisper_loss=0.09019, over 3845711.16 frames. ], batch size: 66, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:23:41,070 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.25 vs. 
limit=22.5 2024-08-17 18:24:08,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3449620.0, ans=0.125 2024-08-17 18:24:17,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3449720.0, ans=0.025 2024-08-17 18:24:20,990 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0 2024-08-17 18:24:28,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3449820.0, ans=0.125 2024-08-17 18:24:39,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3449820.0, ans=0.0 2024-08-17 18:24:42,541 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.197e+01 2.415e+01 2.703e+01 3.683e+01, threshold=4.830e+01, percent-clipped=0.0 2024-08-17 18:24:42,562 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 1850, loss[loss=0.1171, beats_loss=0.01066, ecapa_loss=0.0001474, whisper_loss=0.1049, over 23987.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.0001453, whisper_loss=0.0905, over 3819674.61 frames. ], batch size: 92, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:24:53,646 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.32 vs. limit=15.0 2024-08-17 18:25:08,284 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 27 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-17 18:25:13,202 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-17 18:25:20,491 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.09 vs. limit=15.0 2024-08-17 18:25:25,736 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.78 vs. limit=15.0 2024-08-17 18:25:33,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3450220.0, ans=0.125 2024-08-17 18:25:35,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3450220.0, ans=0.1 2024-08-17 18:25:43,228 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 27 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-17 18:25:50,903 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 1900, loss[loss=0.1098, beats_loss=0.01047, ecapa_loss=0.0001249, whisper_loss=0.09811, over 22008.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01039, ecapa_loss=0.0001446, whisper_loss=0.09072, over 3822485.95 frames. ], batch size: 86, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:25:51,253 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-17 18:25:54,924 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 17 from Vox, 50 fro AS 2024-08-17 18:26:02,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3450420.0, ans=0.125 2024-08-17 18:26:08,376 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.27 vs. 
limit=10.0 2024-08-17 18:26:09,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3450520.0, ans=0.07 2024-08-17 18:26:16,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3450520.0, ans=0.2 2024-08-17 18:26:19,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3450620.0, ans=0.125 2024-08-17 18:26:34,875 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.47 vs. limit=22.5 2024-08-17 18:26:51,968 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.47 vs. limit=15.0 2024-08-17 18:26:59,091 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.266e+01 2.492e+01 2.719e+01 3.794e+02, threshold=4.984e+01, percent-clipped=0.0 2024-08-17 18:26:59,114 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 1950, loss[loss=0.1086, beats_loss=0.01066, ecapa_loss=0.0001527, whisper_loss=0.09646, over 22848.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01033, ecapa_loss=0.0001457, whisper_loss=0.09104, over 3806699.05 frames. 
], batch size: 90, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:27:08,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3450920.0, ans=0.125 2024-08-17 18:27:21,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3451020.0, ans=0.125 2024-08-17 18:27:21,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3451020.0, ans=0.2 2024-08-17 18:27:34,194 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.14 vs. limit=5.0 2024-08-17 18:27:45,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3451220.0, ans=0.1 2024-08-17 18:27:46,252 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 12 from LS+wenet, 9 from Vox, 35 fro AS 2024-08-17 18:27:59,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3451320.0, ans=0.1 2024-08-17 18:28:04,495 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 2000, loss[loss=0.1111, beats_loss=0.009693, ecapa_loss=0.0001252, whisper_loss=0.1001, over 14684.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01038, ecapa_loss=0.0001452, whisper_loss=0.09012, over 3790467.84 frames. 
], batch size: 56, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:28:14,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3451420.0, ans=0.125 2024-08-17 18:28:18,347 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.742e+01 2024-08-17 18:28:23,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3451520.0, ans=0.0 2024-08-17 18:28:26,155 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 22 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-17 18:28:29,755 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.44 vs. limit=15.0 2024-08-17 18:28:32,868 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-17 18:28:58,313 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.86 vs. limit=22.5 2024-08-17 18:29:00,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3451820.0, ans=0.125 2024-08-17 18:29:09,466 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.46 vs. limit=15.0 2024-08-17 18:29:11,718 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
13 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-17 18:29:12,767 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.396e+01 2.689e+01 3.009e+01 4.514e+01, threshold=5.377e+01, percent-clipped=1.0 2024-08-17 18:29:12,787 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 2050, loss[loss=0.07984, beats_loss=0.008389, ecapa_loss=0.0001769, whisper_loss=0.06968, over 13407.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01045, ecapa_loss=0.0001454, whisper_loss=0.08955, over 3813931.23 frames. ], batch size: 54, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:29:26,311 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.32 vs. limit=22.5 2024-08-17 18:29:33,612 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 25 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-17 18:29:36,089 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08474668860435486, model_norm_threshold=53.77110290527344 2024-08-17 18:29:36,252 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.0.norm.log_scale with proportion 0.19, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.640e+04, grad_sumsq=7.640e+04, orig_rms_sq=1.000e+00 2024-08-17 18:29:38,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3452120.0, ans=0.0 2024-08-17 18:29:40,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3452120.0, ans=0.125 2024-08-17 18:29:56,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3452220.0, ans=0.0 2024-08-17 18:29:59,731 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 21 from Vox, 50 fro AS 2024-08-17 18:30:06,185 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
21 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-17 18:30:16,096 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-17 18:30:18,597 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 2100, loss[loss=0.1112, beats_loss=0.01038, ecapa_loss=0.0001296, whisper_loss=0.09956, over 22233.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.0001436, whisper_loss=0.09005, over 3833396.08 frames. ], batch size: 87, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:30:46,098 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.51 vs. limit=15.0 2024-08-17 18:30:49,633 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 19 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-17 18:31:08,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3452720.0, ans=0.2 2024-08-17 18:31:22,494 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-17 18:31:23,643 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.347e+01 2.592e+01 2.946e+01 6.345e+02, threshold=5.183e+01, percent-clipped=4.0 2024-08-17 18:31:23,663 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 2150, loss[loss=0.1082, beats_loss=0.009493, ecapa_loss=0.0001368, whisper_loss=0.09734, over 15191.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0106, ecapa_loss=0.0001425, whisper_loss=0.09002, over 3866707.71 frames. ], batch size: 58, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:31:34,479 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 26 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-17 18:31:37,507 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.23 vs. 
limit=15.0 2024-08-17 18:31:51,096 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 23 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-17 18:31:55,752 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.96 vs. limit=22.5 2024-08-17 18:31:57,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3453120.0, ans=0.0 2024-08-17 18:32:04,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3453220.0, ans=0.125 2024-08-17 18:32:13,736 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-17 18:32:29,493 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 2200, loss[loss=0.1124, beats_loss=0.009011, ecapa_loss=0.0002039, whisper_loss=0.1013, over 18820.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0106, ecapa_loss=0.0001423, whisper_loss=0.09067, over 3844266.15 frames. ], batch size: 82, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:32:31,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3453420.0, ans=0.0 2024-08-17 18:32:33,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3453420.0, ans=0.125 2024-08-17 18:32:35,433 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.99 vs. 
limit=10.0 2024-08-17 18:32:50,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3453520.0, ans=0.0 2024-08-17 18:32:52,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3453520.0, ans=0.0 2024-08-17 18:32:56,516 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-17 18:33:03,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3453620.0, ans=0.0 2024-08-17 18:33:10,561 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-17 18:33:15,218 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.68 vs. limit=12.0 2024-08-17 18:33:23,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3453820.0, ans=0.2 2024-08-17 18:33:31,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3453820.0, ans=0.125 2024-08-17 18:33:34,005 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.333e+01 2.532e+01 2.820e+01 1.498e+02, threshold=5.063e+01, percent-clipped=1.0 2024-08-17 18:33:34,026 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 2250, loss[loss=0.09766, beats_loss=0.01155, ecapa_loss=0.0001504, whisper_loss=0.08461, over 19689.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01069, ecapa_loss=0.0001428, whisper_loss=0.09092, over 3862260.64 frames. ], batch size: 82, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:33:42,329 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-17 18:33:42,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3453920.0, ans=0.125 2024-08-17 18:33:46,248 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-17 18:33:55,465 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0 2024-08-17 18:33:59,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3454120.0, ans=0.125 2024-08-17 18:33:59,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3454120.0, ans=0.2 2024-08-17 18:34:01,694 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 18:34:05,534 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-17 18:34:10,525 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 18 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-17 18:34:14,543 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-17 18:34:40,477 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 2300, loss[loss=0.1005, beats_loss=0.01008, ecapa_loss=0.0001379, whisper_loss=0.08901, over 19311.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01075, ecapa_loss=0.0001433, whisper_loss=0.09104, over 3894300.71 frames. ], batch size: 75, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:34:42,894 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
24 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-17 18:34:45,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3454420.0, ans=0.125 2024-08-17 18:35:00,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3454520.0, ans=0.1 2024-08-17 18:35:12,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3454620.0, ans=0.2 2024-08-17 18:35:24,011 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-17 18:35:26,354 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-17 18:35:32,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3454820.0, ans=0.025 2024-08-17 18:35:35,661 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 15 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-17 18:35:39,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3454820.0, ans=0.5 2024-08-17 18:35:44,469 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.369e+01 2.599e+01 2.912e+01 4.410e+01, threshold=5.198e+01, percent-clipped=0.0 2024-08-17 18:35:44,490 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 2350, loss[loss=0.08231, beats_loss=0.0146, ecapa_loss=0.0001172, whisper_loss=0.06654, over 22704.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01067, ecapa_loss=0.0001439, whisper_loss=0.09156, over 3909837.11 frames. ], batch size: 94, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:35:47,199 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
24 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-17 18:35:54,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3454920.0, ans=0.125 2024-08-17 18:35:58,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3455020.0, ans=0.0 2024-08-17 18:36:07,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3455020.0, ans=0.125 2024-08-17 18:36:16,106 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.56 vs. limit=22.5 2024-08-17 18:36:18,565 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2024-08-17 18:36:23,102 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.55 vs. limit=15.0 2024-08-17 18:36:26,238 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-17 18:36:26,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3455220.0, ans=0.125 2024-08-17 18:36:32,446 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.41 vs. limit=15.0 2024-08-17 18:36:51,549 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 2400, loss[loss=0.1134, beats_loss=0.008389, ecapa_loss=0.0001651, whisper_loss=0.1033, over 20709.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01057, ecapa_loss=0.0001443, whisper_loss=0.0919, over 3914353.82 frames. 
], batch size: 80, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:36:58,686 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-17 18:37:01,486 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-17 18:37:03,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3455420.0, ans=0.0 2024-08-17 18:37:15,103 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-17 18:37:19,828 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.60 vs. limit=15.0 2024-08-17 18:37:23,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3455620.0, ans=0.1 2024-08-17 18:37:36,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3455720.0, ans=0.1 2024-08-17 18:37:44,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3455720.0, ans=0.125 2024-08-17 18:37:49,752 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 17 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-17 18:38:05,951 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.211e+01 2.408e+01 2.768e+01 3.443e+01, threshold=4.816e+01, percent-clipped=0.0 2024-08-17 18:38:05,970 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 2450, loss[loss=0.111, beats_loss=0.01034, ecapa_loss=0.0001515, whisper_loss=0.0991, over 22631.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01059, ecapa_loss=0.0001445, whisper_loss=0.09139, over 3909108.61 frames. 
], batch size: 92, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:38:08,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3455920.0, ans=0.0 2024-08-17 18:38:24,716 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.32 vs. limit=22.5 2024-08-17 18:38:26,768 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-17 18:38:30,417 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-17 18:38:35,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3456020.0, ans=0.0 2024-08-17 18:39:03,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3456220.0, ans=0.125 2024-08-17 18:39:26,363 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 2500, loss[loss=0.07531, beats_loss=0.01065, ecapa_loss=0.0001782, whisper_loss=0.06288, over 12599.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01052, ecapa_loss=0.0001456, whisper_loss=0.09149, over 3917336.81 frames. ], batch size: 55, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:39:43,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3456520.0, ans=0.2 2024-08-17 18:39:43,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3456520.0, ans=0.125 2024-08-17 18:39:46,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3456520.0, ans=0.125 2024-08-17 18:40:00,642 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
28 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-17 18:40:01,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3456620.0, ans=0.125 2024-08-17 18:40:03,758 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-17 18:40:10,268 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-17 18:40:13,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3456720.0, ans=0.125 2024-08-17 18:40:21,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3456720.0, ans=0.125 2024-08-17 18:40:22,804 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.07 vs. limit=10.0 2024-08-17 18:40:27,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3456820.0, ans=0.1 2024-08-17 18:40:29,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3456820.0, ans=0.0 2024-08-17 18:40:35,526 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.25 vs. 
limit=15.0 2024-08-17 18:40:39,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3456820.0, ans=0.1 2024-08-17 18:40:39,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3456820.0, ans=0.2 2024-08-17 18:40:41,781 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.335e+01 2.460e+01 2.782e+01 3.981e+01, threshold=4.921e+01, percent-clipped=0.0 2024-08-17 18:40:41,805 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 2550, loss[loss=0.07325, beats_loss=0.01303, ecapa_loss=0.0001206, whisper_loss=0.05901, over 13737.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.0001459, whisper_loss=0.09065, over 3899015.30 frames. ], batch size: 55, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:41:19,289 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-17 18:41:30,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3457220.0, ans=0.125 2024-08-17 18:41:31,724 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 22 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-17 18:41:41,167 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.86 vs. limit=22.5 2024-08-17 18:42:01,966 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 2600, loss[loss=0.0667, beats_loss=0.01382, ecapa_loss=0.0001535, whisper_loss=0.05135, over 14107.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.0001452, whisper_loss=0.09051, over 3861718.93 frames. 
], batch size: 58, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:42:05,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3457420.0, ans=0.1 2024-08-17 18:42:13,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3457420.0, ans=0.0 2024-08-17 18:42:24,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3457520.0, ans=0.1 2024-08-17 18:42:27,573 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08065144717693329, model_norm_threshold=49.205039978027344 2024-08-17 18:42:27,748 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.182e+04, grad_sumsq=8.182e+04, orig_rms_sq=1.000e+00 2024-08-17 18:42:55,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3457720.0, ans=0.125 2024-08-17 18:43:04,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3457820.0, ans=0.0 2024-08-17 18:43:11,570 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-17 18:43:14,413 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.414e+01 2.571e+01 2.892e+01 6.101e+02, threshold=5.143e+01, percent-clipped=1.0 2024-08-17 18:43:14,436 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 2650, loss[loss=0.1024, beats_loss=0.01069, ecapa_loss=0.0001416, whisper_loss=0.09029, over 21796.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001451, whisper_loss=0.09082, over 3860706.38 frames. 
], batch size: 86, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:43:31,465 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-17 18:43:35,910 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-17 18:44:09,331 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-17 18:44:18,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3458320.0, ans=0.0 2024-08-17 18:44:25,233 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 2700, loss[loss=0.1226, beats_loss=0.009159, ecapa_loss=0.0001372, whisper_loss=0.1121, over 18849.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01053, ecapa_loss=0.0001461, whisper_loss=0.0908, over 3869744.58 frames. ], batch size: 71, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:44:34,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3458420.0, ans=0.0 2024-08-17 18:44:34,981 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 31 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-17 18:44:50,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3458520.0, ans=0.07 2024-08-17 18:45:04,033 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
23 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-17 18:45:28,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3458820.0, ans=0.125 2024-08-17 18:45:37,412 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.308e+01 2.578e+01 2.796e+01 3.722e+01, threshold=5.156e+01, percent-clipped=0.0 2024-08-17 18:45:37,434 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 2750, loss[loss=0.103, beats_loss=0.008805, ecapa_loss=0.0001635, whisper_loss=0.09252, over 18071.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01052, ecapa_loss=0.0001461, whisper_loss=0.09079, over 3891486.13 frames. ], batch size: 71, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:45:46,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3458920.0, ans=0.2 2024-08-17 18:45:49,049 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.37 vs. limit=10.0 2024-08-17 18:45:54,825 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-17 18:46:40,522 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 24 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-17 18:46:48,804 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 2800, loss[loss=0.1133, beats_loss=0.009977, ecapa_loss=0.0001638, whisper_loss=0.1016, over 21910.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01049, ecapa_loss=0.0001453, whisper_loss=0.09123, over 3889327.77 frames. ], batch size: 91, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:46:50,324 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
22 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-17 18:46:56,051 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. limit=6.0 2024-08-17 18:47:08,284 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 16 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-17 18:47:20,960 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 16 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-17 18:47:25,217 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 21 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-17 18:47:35,175 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-17 18:47:46,025 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-17 18:47:48,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3459820.0, ans=0.025 2024-08-17 18:48:04,523 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.294e+01 2.641e+01 2.883e+01 4.874e+01, threshold=5.283e+01, percent-clipped=0.0 2024-08-17 18:48:04,546 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 2850, loss[loss=0.09814, beats_loss=0.01292, ecapa_loss=0.0001373, whisper_loss=0.08385, over 21994.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01053, ecapa_loss=0.000145, whisper_loss=0.09152, over 3888846.13 frames. ], batch size: 90, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:48:10,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3459920.0, ans=0.2 2024-08-17 18:48:11,826 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.57 vs. 
limit=12.0 2024-08-17 18:48:18,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3460020.0, ans=0.125 2024-08-17 18:48:22,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3460020.0, ans=0.125 2024-08-17 18:48:38,651 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.87 vs. limit=10.0 2024-08-17 18:48:48,594 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 17 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-17 18:48:49,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3460220.0, ans=0.0 2024-08-17 18:49:00,394 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-17 18:49:05,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3460320.0, ans=0.125 2024-08-17 18:49:21,711 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 2900, loss[loss=0.08616, beats_loss=0.0121, ecapa_loss=0.0001418, whisper_loss=0.07264, over 18521.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01063, ecapa_loss=0.0001451, whisper_loss=0.09095, over 3920974.25 frames. ], batch size: 76, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 18:49:44,908 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.91 vs. limit=15.0 2024-08-17 18:49:54,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3460620.0, ans=0.125 2024-08-17 18:50:05,097 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.50 vs. 
limit=15.0 2024-08-17 18:50:13,641 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2024-08-17 18:50:20,012 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-17 18:50:24,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3460820.0, ans=0.125 2024-08-17 18:50:29,189 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.97 vs. limit=12.0 2024-08-17 18:50:31,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3460820.0, ans=0.125 2024-08-17 18:50:34,548 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2024-08-17 18:50:35,141 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.363e+01 2.529e+01 2.876e+01 1.646e+02, threshold=5.058e+01, percent-clipped=2.0 2024-08-17 18:50:35,163 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 2950, loss[loss=0.1099, beats_loss=0.01067, ecapa_loss=0.0001455, whisper_loss=0.09778, over 19172.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01062, ecapa_loss=0.0001465, whisper_loss=0.09051, over 3895232.19 frames. ], batch size: 77, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 18:50:38,069 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-17 18:51:00,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3461020.0, ans=0.125 2024-08-17 18:51:08,032 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.81 vs. limit=15.0 2024-08-17 18:51:10,067 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-17 18:51:10,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3461120.0, ans=0.125 2024-08-17 18:51:12,615 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-17 18:51:22,948 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.59 vs. limit=22.5 2024-08-17 18:51:44,145 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 3000, loss[loss=0.09634, beats_loss=0.01036, ecapa_loss=0.0001701, whisper_loss=0.08428, over 19327.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.000146, whisper_loss=0.09064, over 3883299.27 frames. ], batch size: 81, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 18:51:44,146 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-17 18:52:21,608 INFO [train_multi_KD3.py:1149] (2/4) Epoch 24, validation on ASR_libri: loss=0.2525, beats_loss=0, ecapa_loss=0.0005269, whisper_loss=0.2472, over 922467.00 frames. 2024-08-17 18:52:37,986 INFO [train_multi_KD3.py:1149] (2/4) Epoch 24, validation on SV_voxceleb1: loss=0.00404, beats_loss=0, ecapa_loss=0.000404, whisper_loss=0, over 939242.00 frames. 
2024-08-17 18:54:27,985 INFO [train_multi_KD3.py:1149] (2/4) Epoch 24, validation on AT_audioset: loss=0.02333, beats_loss=0.02333, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-17 18:54:27,994 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-17 18:54:58,850 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=15.0 2024-08-17 18:55:06,745 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.41 vs. limit=10.0 2024-08-17 18:55:24,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3461820.0, ans=0.0 2024-08-17 18:55:28,443 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-17 18:55:33,538 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.89 vs. limit=15.0 2024-08-17 18:55:38,520 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.375e+01 2.603e+01 2.812e+01 5.666e+01, threshold=5.206e+01, percent-clipped=2.0 2024-08-17 18:55:38,541 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 3050, loss[loss=0.1186, beats_loss=0.009503, ecapa_loss=0.0001546, whisper_loss=0.1075, over 22304.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01053, ecapa_loss=0.0001472, whisper_loss=0.09111, over 3923478.94 frames. ], batch size: 87, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 18:55:45,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3461920.0, ans=0.125 2024-08-17 18:55:52,667 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-17 18:56:00,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3462020.0, ans=0.125 2024-08-17 18:56:14,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3462120.0, ans=0.0 2024-08-17 18:56:16,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3462120.0, ans=0.0 2024-08-17 18:56:26,977 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-17 18:56:27,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3462220.0, ans=0.5 2024-08-17 18:56:45,927 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 3100, loss[loss=0.1234, beats_loss=0.008077, ecapa_loss=0.0001627, whisper_loss=0.1137, over 17756.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01059, ecapa_loss=0.0001478, whisper_loss=0.09124, over 3909160.62 frames. ], batch size: 69, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 18:56:59,591 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-17 18:57:00,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3462520.0, ans=0.0 2024-08-17 18:57:14,740 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-17 18:57:50,855 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.41 vs. 
limit=12.0 2024-08-17 18:57:54,021 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.850e+01 2.306e+01 2.653e+01 2.984e+01 6.724e+01, threshold=5.307e+01, percent-clipped=1.0 2024-08-17 18:57:54,043 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 3150, loss[loss=0.1105, beats_loss=0.007528, ecapa_loss=0.0001819, whisper_loss=0.1012, over 22112.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001476, whisper_loss=0.09091, over 3886826.24 frames. ], batch size: 89, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 18:58:07,944 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.87 vs. limit=15.0 2024-08-17 18:58:17,161 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 11 from Vox, 38 fro AS 2024-08-17 18:58:20,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3463020.0, ans=0.1 2024-08-17 18:58:23,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3463120.0, ans=0.0 2024-08-17 18:58:36,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3463220.0, ans=0.1 2024-08-17 18:58:38,028 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
32 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-17 18:58:39,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3463220.0, ans=0.2 2024-08-17 18:58:43,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3463220.0, ans=0.125 2024-08-17 18:58:45,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3463220.0, ans=0.0 2024-08-17 18:59:03,819 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 3200, loss[loss=0.1039, beats_loss=0.01131, ecapa_loss=0.0001313, whisper_loss=0.09124, over 16087.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01065, ecapa_loss=0.0001473, whisper_loss=0.09042, over 3878649.87 frames. ], batch size: 64, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 18:59:09,587 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-17 18:59:16,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3463520.0, ans=0.0 2024-08-17 18:59:26,287 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.86 vs. 
limit=22.5 2024-08-17 18:59:32,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3463620.0, ans=0.0 2024-08-17 18:59:33,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3463620.0, ans=0.1 2024-08-17 18:59:43,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3463720.0, ans=0.125 2024-08-17 18:59:46,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3463720.0, ans=0.125 2024-08-17 18:59:51,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3463720.0, ans=0.125 2024-08-17 19:00:01,150 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 19 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-17 19:00:04,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3463820.0, ans=0.0 2024-08-17 19:00:04,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3463820.0, ans=0.5 2024-08-17 19:00:10,537 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.338e+01 2.602e+01 2.900e+01 3.800e+01, threshold=5.204e+01, percent-clipped=0.0 2024-08-17 19:00:10,573 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 3250, loss[loss=0.0841, beats_loss=0.01139, ecapa_loss=0.0001429, whisper_loss=0.07128, over 19086.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01058, ecapa_loss=0.000148, whisper_loss=0.09081, over 3847812.93 frames. 
], batch size: 76, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:00:23,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3464020.0, ans=0.125 2024-08-17 19:00:39,290 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 13 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-17 19:00:49,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3464220.0, ans=0.125 2024-08-17 19:00:51,557 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 35 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-17 19:01:16,096 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 3300, loss[loss=0.1113, beats_loss=0.01063, ecapa_loss=0.0001422, whisper_loss=0.09928, over 18403.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01063, ecapa_loss=0.0001482, whisper_loss=0.09044, over 3841584.93 frames. ], batch size: 71, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:01:34,503 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.30 vs. limit=15.0 2024-08-17 19:01:48,380 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 19:01:50,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3464620.0, ans=0.125 2024-08-17 19:01:55,974 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-17 19:02:17,125 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-17 19:02:21,749 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.81 vs. 
limit=10.0 2024-08-17 19:02:22,174 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.671e+01 2.198e+01 2.485e+01 2.907e+01 4.364e+01, threshold=4.970e+01, percent-clipped=0.0 2024-08-17 19:02:22,195 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 3350, loss[loss=0.09624, beats_loss=0.01019, ecapa_loss=0.000156, whisper_loss=0.08449, over 22312.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01054, ecapa_loss=0.0001484, whisper_loss=0.09094, over 3833879.08 frames. ], batch size: 91, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:02:30,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3464920.0, ans=0.125 2024-08-17 19:02:34,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3465020.0, ans=0.125 2024-08-17 19:02:38,607 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 14 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-17 19:02:44,497 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.47 vs. 
limit=15.0 2024-08-17 19:02:56,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=3465120.0, ans=0.2 2024-08-17 19:03:14,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3465320.0, ans=0.2 2024-08-17 19:03:15,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3465320.0, ans=0.1 2024-08-17 19:03:16,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3465320.0, ans=0.125 2024-08-17 19:03:19,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3465320.0, ans=0.1 2024-08-17 19:03:25,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3465320.0, ans=0.125 2024-08-17 19:03:28,369 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 3400, loss[loss=0.09823, beats_loss=0.007705, ecapa_loss=0.0001801, whisper_loss=0.08873, over 17844.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01056, ecapa_loss=0.0001474, whisper_loss=0.091, over 3856990.19 frames. ], batch size: 70, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:03:37,254 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-17 19:04:05,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3465620.0, ans=0.1 2024-08-17 19:04:23,237 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
26 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-17 19:04:33,205 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.314e+01 2.533e+01 2.860e+01 4.018e+01, threshold=5.066e+01, percent-clipped=0.0 2024-08-17 19:04:33,227 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 3450, loss[loss=0.1182, beats_loss=0.009463, ecapa_loss=0.0001338, whisper_loss=0.1074, over 23398.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01051, ecapa_loss=0.000147, whisper_loss=0.09132, over 3857247.66 frames. ], batch size: 90, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:05:20,953 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 25 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-17 19:05:25,062 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 28 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-17 19:05:39,555 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 3500, loss[loss=0.118, beats_loss=0.009956, ecapa_loss=0.0001438, whisper_loss=0.1066, over 22875.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01049, ecapa_loss=0.0001481, whisper_loss=0.09137, over 3844927.53 frames. ], batch size: 89, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:05:45,778 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.23 vs. limit=15.0 2024-08-17 19:05:56,861 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-17 19:06:02,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3466520.0, ans=0.125 2024-08-17 19:06:03,232 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-17 19:06:35,133 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
20 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-17 19:06:36,535 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-17 19:06:37,041 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.12 vs. limit=12.0 2024-08-17 19:06:39,082 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 27 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-17 19:06:45,342 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.305e+01 2.521e+01 2.842e+01 7.889e+01, threshold=5.042e+01, percent-clipped=2.0 2024-08-17 19:06:45,361 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 3550, loss[loss=0.09588, beats_loss=0.0103, ecapa_loss=0.0001318, whisper_loss=0.08426, over 23914.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01056, ecapa_loss=0.0001474, whisper_loss=0.09031, over 3841727.08 frames. ], batch size: 93, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:06:56,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3466920.0, ans=0.0 2024-08-17 19:07:03,214 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.35 vs. limit=12.0 2024-08-17 19:07:17,754 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-17 19:07:50,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3467320.0, ans=0.125 2024-08-17 19:07:52,180 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 3600, loss[loss=0.09203, beats_loss=0.01025, ecapa_loss=0.0001595, whisper_loss=0.08019, over 22507.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01055, ecapa_loss=0.0001468, whisper_loss=0.09055, over 3860200.85 frames. 
], batch size: 92, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:07:58,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3467420.0, ans=0.1 2024-08-17 19:08:03,558 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.26 vs. limit=15.0 2024-08-17 19:08:07,768 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-17 19:08:14,692 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.90 vs. limit=22.5 2024-08-17 19:08:30,121 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.74 vs. limit=10.0 2024-08-17 19:08:30,451 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-17 19:08:31,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3467720.0, ans=0.125 2024-08-17 19:08:36,344 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.68 vs. limit=5.0 2024-08-17 19:08:38,709 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.87 vs. limit=15.0 2024-08-17 19:08:41,779 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 12 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-17 19:08:47,810 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-17 19:08:52,886 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
29 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-17 19:08:53,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3467820.0, ans=0.1 2024-08-17 19:08:55,110 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.286e+01 2.581e+01 2.801e+01 4.273e+01, threshold=5.162e+01, percent-clipped=0.0 2024-08-17 19:08:55,134 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 3650, loss[loss=0.07193, beats_loss=0.01032, ecapa_loss=0.0001484, whisper_loss=0.06013, over 13833.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01059, ecapa_loss=0.0001465, whisper_loss=0.09085, over 3872529.01 frames. ], batch size: 55, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:08:57,057 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.33 vs. limit=22.5 2024-08-17 19:09:01,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3467920.0, ans=0.0 2024-08-17 19:09:02,348 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-17 19:09:13,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3468020.0, ans=0.125 2024-08-17 19:09:17,569 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 29 from Vox, 27 fro AS 2024-08-17 19:09:19,329 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.95 vs. limit=22.5 2024-08-17 19:09:46,657 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
35 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-17 19:09:57,692 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 3700, loss[loss=0.1143, beats_loss=0.008665, ecapa_loss=0.0001646, whisper_loss=0.104, over 14917.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01051, ecapa_loss=0.0001468, whisper_loss=0.09097, over 3855159.38 frames. ], batch size: 59, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:09:59,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3468420.0, ans=0.0 2024-08-17 19:10:11,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3468520.0, ans=0.0 2024-08-17 19:10:17,299 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=22.5 2024-08-17 19:10:26,750 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0 2024-08-17 19:11:02,103 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.85 vs. limit=15.0 2024-08-17 19:11:02,331 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.199e+01 2.477e+01 2.865e+01 5.326e+01, threshold=4.955e+01, percent-clipped=1.0 2024-08-17 19:11:02,353 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 3750, loss[loss=0.08347, beats_loss=0.01423, ecapa_loss=0.0001342, whisper_loss=0.0679, over 21878.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01054, ecapa_loss=0.0001459, whisper_loss=0.09056, over 3816569.85 frames. ], batch size: 91, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:11:14,041 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
27 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-17 19:11:14,758 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=15.0 2024-08-17 19:11:14,900 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.91 vs. limit=22.5 2024-08-17 19:11:15,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3469020.0, ans=0.125 2024-08-17 19:11:16,623 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-17 19:11:17,751 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-17 19:11:27,515 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-17 19:11:29,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3469120.0, ans=0.07 2024-08-17 19:11:31,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3469120.0, ans=0.1 2024-08-17 19:11:37,420 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 26 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-17 19:11:41,397 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
22 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-17 19:11:49,374 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09380995482206345, model_norm_threshold=49.54741287231445 2024-08-17 19:11:49,542 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.469e+04, grad_sumsq=6.357e+06, orig_rms_sq=1.018e-02 2024-08-17 19:11:52,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3469220.0, ans=0.0 2024-08-17 19:11:53,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3469320.0, ans=0.0 2024-08-17 19:11:54,659 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-17 19:12:02,251 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 18 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-17 19:12:07,366 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 3800, loss[loss=0.1178, beats_loss=0.009663, ecapa_loss=0.0001316, whisper_loss=0.1068, over 22518.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.0001469, whisper_loss=0.09018, over 3827985.48 frames. ], batch size: 83, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:12:15,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3469420.0, ans=0.0 2024-08-17 19:12:16,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3469420.0, ans=0.2 2024-08-17 19:12:27,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3469520.0, ans=0.1 2024-08-17 19:12:31,403 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
22 from LS+wenet, 24 from Vox, 30 from AS 2024-08-17 19:12:36,086 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.92 vs. limit=15.0 2024-08-17 19:12:38,386 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 36 from LS+wenet, 18 from Vox, 36 from AS 2024-08-17 19:12:49,103 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 20 from Vox, 31 from AS 2024-08-17 19:12:51,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3469720.0, ans=0.125 2024-08-17 19:12:55,624 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.72 vs. limit=10.0 2024-08-17 19:12:59,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3469720.0, ans=0.125 2024-08-17 19:13:06,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3469820.0, ans=0.125 2024-08-17 19:13:13,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3469820.0, ans=0.2 2024-08-17 19:13:16,153 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.363e+01 2.589e+01 3.155e+01 5.282e+02, threshold=5.179e+01, percent-clipped=3.0 2024-08-17 19:13:16,175 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 3850, loss[loss=0.1016, beats_loss=0.01209, ecapa_loss=0.0001372, whisper_loss=0.08813, over 22884.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.0001477, whisper_loss=0.09036, over 3833365.96 frames.
], batch size: 93, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:13:17,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3469920.0, ans=10.0 2024-08-17 19:13:50,672 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 21 from Vox, 23 from AS 2024-08-17 19:13:52,487 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.76 vs. limit=15.0 2024-08-17 19:14:01,376 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 16 from LS+wenet, 24 from Vox, 42 from AS 2024-08-17 19:14:01,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3470220.0, ans=0.09899494936611666 2024-08-17 19:14:03,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3470220.0, ans=0.0 2024-08-17 19:14:07,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3470220.0, ans=0.125 2024-08-17 19:14:17,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3470320.0, ans=0.1 2024-08-17 19:14:21,439 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.502e+01 2024-08-17 19:14:25,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3470320.0, ans=0.0 2024-08-17 19:14:27,578 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 3900, loss[loss=0.1223, beats_loss=0.009193, ecapa_loss=0.0001789, whisper_loss=0.1114, over 18855.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01058, ecapa_loss=0.0001488, whisper_loss=0.09023, over 3846666.26 frames.
], batch size: 76, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:14:29,390 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2024-08-17 19:14:44,735 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 18 from Vox, 23 from AS 2024-08-17 19:14:46,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3470520.0, ans=0.1 2024-08-17 19:14:49,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3470520.0, ans=0.1 2024-08-17 19:15:02,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3470620.0, ans=0.0 2024-08-17 19:15:05,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3470620.0, ans=0.125 2024-08-17 19:15:10,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3470720.0, ans=0.0 2024-08-17 19:15:21,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3470820.0, ans=0.125 2024-08-17 19:15:28,279 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 23 from Vox, 31 from AS 2024-08-17 19:15:35,749 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.330e+01 2.586e+01 2.887e+01 6.168e+01, threshold=5.173e+01, percent-clipped=1.0 2024-08-17 19:15:35,771 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 3950, loss[loss=0.107, beats_loss=0.0104, ecapa_loss=0.0001477, whisper_loss=0.09511, over 22733.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01054, ecapa_loss=0.0001491, whisper_loss=0.09126, over 3872313.32 frames.
], batch size: 91, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:15:37,844 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.51 vs. limit=15.0 2024-08-17 19:15:58,500 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 18 from Vox, 32 from AS 2024-08-17 19:16:21,356 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.99 vs. limit=12.0 2024-08-17 19:16:22,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3471220.0, ans=0.125 2024-08-17 19:16:22,704 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.95 vs. limit=5.0 2024-08-17 19:16:23,131 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 16 from LS+wenet, 19 from Vox, 35 from AS 2024-08-17 19:16:29,911 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 21 from LS+wenet, 23 from Vox, 36 from AS 2024-08-17 19:16:43,054 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 4000, loss[loss=0.1142, beats_loss=0.009341, ecapa_loss=0.0001619, whisper_loss=0.1032, over 14667.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01051, ecapa_loss=0.0001491, whisper_loss=0.09077, over 3878790.96 frames. ], batch size: 56, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:17:00,665 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts.
24 from LS+wenet, 26 from Vox, 41 from AS 2024-08-17 19:17:03,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3471520.0, ans=0.125 2024-08-17 19:17:19,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3471720.0, ans=0.125 2024-08-17 19:17:25,112 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 26 from LS+wenet, 25 from Vox, 21 from AS 2024-08-17 19:17:25,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3471720.0, ans=0.0 2024-08-17 19:17:26,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3471720.0, ans=0.125 2024-08-17 19:17:32,332 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 34 from LS+wenet, 18 from Vox, 27 from AS 2024-08-17 19:17:42,775 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 19 from LS+wenet, 22 from Vox, 39 from AS 2024-08-17 19:17:44,914 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.303e+01 2.468e+01 2.762e+01 4.613e+01, threshold=4.936e+01, percent-clipped=0.0 2024-08-17 19:17:44,935 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 4050, loss[loss=0.1119, beats_loss=0.008892, ecapa_loss=0.0002096, whisper_loss=0.1009, over 21231.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0105, ecapa_loss=0.0001493, whisper_loss=0.09089, over 3879692.65 frames. ], batch size: 89, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:17:46,974 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.85 vs. limit=22.5 2024-08-17 19:17:57,736 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts.
23 from LS+wenet, 22 from Vox, 32 from AS 2024-08-17 19:18:02,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3472020.0, ans=0.0 2024-08-17 19:18:16,922 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.40 vs. limit=10.0 2024-08-17 19:18:32,144 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0 2024-08-17 19:18:41,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3472320.0, ans=0.125 2024-08-17 19:18:47,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3472420.0, ans=0.0 2024-08-17 19:18:47,907 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 4100, loss[loss=0.0909, beats_loss=0.01091, ecapa_loss=0.000185, whisper_loss=0.07814, over 18424.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01058, ecapa_loss=0.0001493, whisper_loss=0.09043, over 3879865.06 frames. ], batch size: 81, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:18:50,311 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 18 from Vox, 49 from AS 2024-08-17 19:19:00,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3472520.0, ans=0.125 2024-08-17 19:19:04,898 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.87 vs. limit=12.0 2024-08-17 19:19:17,471 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 24 from LS+wenet, 16 from Vox, 23 from AS 2024-08-17 19:19:19,879 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts.
27 from LS+wenet, 29 from Vox, 30 from AS 2024-08-17 19:19:27,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3472720.0, ans=0.0 2024-08-17 19:19:41,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3472820.0, ans=0.0 2024-08-17 19:19:48,542 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 21 from Vox, 32 from AS 2024-08-17 19:19:49,838 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.275e+01 2.466e+01 2.807e+01 1.119e+02, threshold=4.931e+01, percent-clipped=1.0 2024-08-17 19:19:49,860 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 4150, loss[loss=0.1008, beats_loss=0.01017, ecapa_loss=0.0001422, whisper_loss=0.08921, over 19621.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.000148, whisper_loss=0.0908, over 3895067.52 frames. ], batch size: 75, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:19:54,115 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 37 from LS+wenet, 17 from Vox, 38 from AS 2024-08-17 19:20:03,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3473020.0, ans=0.125 2024-08-17 19:20:21,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3473120.0, ans=0.1 2024-08-17 19:20:22,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3473120.0, ans=0.1 2024-08-17 19:20:41,690 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0 2024-08-17 19:20:42,599 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts.
34 from LS+wenet, 24 from Vox, 33 from AS 2024-08-17 19:20:46,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3473320.0, ans=0.04949747468305833 2024-08-17 19:20:52,361 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 4200, loss[loss=0.09169, beats_loss=0.01267, ecapa_loss=0.0001212, whisper_loss=0.07781, over 23045.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01056, ecapa_loss=0.0001476, whisper_loss=0.09152, over 3876600.85 frames. ], batch size: 93, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:21:00,783 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.04 vs. limit=15.0 2024-08-17 19:21:06,237 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 15 from Vox, 29 from AS 2024-08-17 19:21:08,424 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2024-08-17 19:21:20,634 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.11 vs. limit=22.5 2024-08-17 19:21:50,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3473820.0, ans=0.125 2024-08-17 19:21:53,829 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts.
38 from LS+wenet, 24 from Vox, 32 from AS 2024-08-17 19:21:54,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3473920.0, ans=0.125 2024-08-17 19:21:55,003 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.224e+01 2.485e+01 2.719e+01 4.060e+01, threshold=4.970e+01, percent-clipped=0.0 2024-08-17 19:21:55,024 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 4250, loss[loss=0.1239, beats_loss=0.009362, ecapa_loss=0.0001839, whisper_loss=0.1127, over 22521.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01057, ecapa_loss=0.0001478, whisper_loss=0.09158, over 3902723.71 frames. ], batch size: 94, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:22:04,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3473920.0, ans=0.125 2024-08-17 19:22:09,006 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 20 from Vox, 31 from AS 2024-08-17 19:22:18,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3474020.0, ans=0.125 2024-08-17 19:22:21,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3474120.0, ans=0.125 2024-08-17 19:22:24,523 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 13 from Vox, 25 from AS 2024-08-17 19:22:25,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3474120.0, ans=0.125 2024-08-17 19:22:29,407 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts.
27 from LS+wenet, 25 from Vox, 32 from AS 2024-08-17 19:22:37,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3474220.0, ans=0.1 2024-08-17 19:22:42,004 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 from AS 2024-08-17 19:22:48,399 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 from AS 2024-08-17 19:22:58,004 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 4300, loss[loss=0.1227, beats_loss=0.009723, ecapa_loss=0.0001394, whisper_loss=0.1116, over 23462.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0105, ecapa_loss=0.000148, whisper_loss=0.09145, over 3901934.79 frames. ], batch size: 90, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:23:02,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3474420.0, ans=0.05 2024-08-17 19:23:03,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3474420.0, ans=0.09899494936611666 2024-08-17 19:23:06,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3474420.0, ans=0.125 2024-08-17 19:23:08,787 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.06 vs. limit=15.0 2024-08-17 19:23:21,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3474620.0, ans=0.125 2024-08-17 19:23:21,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3474620.0, ans=0.1 2024-08-17 19:23:25,876 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts.
25 from LS+wenet, 18 from Vox, 37 from AS 2024-08-17 19:23:30,756 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 37 from LS+wenet, 20 from Vox, 37 from AS 2024-08-17 19:23:34,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3474720.0, ans=0.125 2024-08-17 19:23:44,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3474720.0, ans=0.125 2024-08-17 19:24:01,024 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.274e+01 2.524e+01 2.891e+01 1.236e+02, threshold=5.048e+01, percent-clipped=2.0 2024-08-17 19:24:01,046 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 4350, loss[loss=0.09041, beats_loss=0.01087, ecapa_loss=0.0001197, whisper_loss=0.07834, over 15724.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.000149, whisper_loss=0.09056, over 3853589.85 frames. ], batch size: 59, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:24:02,658 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 from AS 2024-08-17 19:24:07,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3474920.0, ans=0.1 2024-08-17 19:24:09,748 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 22 from LS+wenet, 19 from Vox, 43 from AS 2024-08-17 19:24:18,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3475020.0, ans=0.125 2024-08-17 19:24:35,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3475120.0, ans=0.125 2024-08-17 19:24:36,031 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 from AS 2024-08-17 19:24:44,039 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts.
21 from LS+wenet, 25 from Vox, 46 from AS 2024-08-17 19:24:56,047 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 29 from LS+wenet, 14 from Vox, 22 from AS 2024-08-17 19:24:56,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3475320.0, ans=0.125 2024-08-17 19:25:01,269 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 from AS 2024-08-17 19:25:03,558 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 4400, loss[loss=0.1045, beats_loss=0.009918, ecapa_loss=0.0001879, whisper_loss=0.09275, over 22425.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001495, whisper_loss=0.09057, over 3873799.30 frames. ], batch size: 90, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:25:06,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3475420.0, ans=0.0 2024-08-17 19:25:11,542 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.58 vs.
limit=15.0 2024-08-17 19:25:23,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3475520.0, ans=0.2 2024-08-17 19:25:23,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3475520.0, ans=0.2 2024-08-17 19:25:24,195 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0026171719655394554, model_norm_threshold=50.480224609375 2024-08-17 19:25:24,363 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.45, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.667e+08, grad_sumsq=1.636e+10, orig_rms_sq=1.019e-02 2024-08-17 19:25:26,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3475520.0, ans=0.125 2024-08-17 19:25:28,009 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 12 from Vox, 29 from AS 2024-08-17 19:25:34,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3475620.0, ans=0.1 2024-08-17 19:25:39,331 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 24 from Vox, 35 from AS 2024-08-17 19:26:05,666 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.732e+01 2.336e+01 2.604e+01 2.872e+01 1.929e+04, threshold=5.209e+01, percent-clipped=1.0 2024-08-17 19:26:05,687 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 4450, loss[loss=0.1017, beats_loss=0.01023, ecapa_loss=0.0001562, whisper_loss=0.08993, over 21887.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001487, whisper_loss=0.09049, over 3853850.40 frames. ], batch size: 90, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:26:06,661 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.50 vs.
limit=15.0 2024-08-17 19:26:08,295 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 16 from LS+wenet, 28 from Vox, 26 from AS 2024-08-17 19:26:27,101 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 27 from Vox, 44 from AS 2024-08-17 19:26:37,176 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 19 from LS+wenet, 23 from Vox, 47 from AS 2024-08-17 19:26:58,492 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 12 from Vox, 31 from AS 2024-08-17 19:26:58,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3476320.0, ans=0.125 2024-08-17 19:26:59,700 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 32 from LS+wenet, 15 from Vox, 35 from AS 2024-08-17 19:27:08,314 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 4500, loss[loss=0.1054, beats_loss=0.008023, ecapa_loss=0.0001573, whisper_loss=0.09584, over 15803.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0106, ecapa_loss=0.0001485, whisper_loss=0.09007, over 3875276.06 frames. ], batch size: 63, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:27:13,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3476420.0, ans=0.035 2024-08-17 19:27:27,694 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 11 from Vox, 36 from AS 2024-08-17 19:27:31,257 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.06 vs.
limit=15.0 2024-08-17 19:27:37,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3476620.0, ans=0.05 2024-08-17 19:27:51,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3476720.0, ans=0.1 2024-08-17 19:27:52,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3476720.0, ans=0.125 2024-08-17 19:27:53,058 WARNING [optim.py:496] (2/4) Scaling gradients by 0.05500377342104912, model_norm_threshold=52.087669372558594 2024-08-17 19:27:53,228 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.out_combiner.bypass_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.494e+04, grad_sumsq=1.471e+05, orig_rms_sq=5.773e-01 2024-08-17 19:27:55,658 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 22 from Vox, 31 from AS 2024-08-17 19:28:04,273 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 20 from Vox, 38 from AS 2024-08-17 19:28:10,283 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.357e+01 2.568e+01 2.877e+01 9.470e+02, threshold=5.136e+01, percent-clipped=3.0 2024-08-17 19:28:10,304 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 4550, loss[loss=0.09108, beats_loss=0.01171, ecapa_loss=0.0001502, whisper_loss=0.07787, over 22274.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01059, ecapa_loss=0.0001494, whisper_loss=0.09015, over 3902340.41 frames.
], batch size: 93, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:28:12,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3476920.0, ans=0.09899494936611666 2024-08-17 19:28:20,620 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.71 vs. limit=10.0 2024-08-17 19:28:24,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3477020.0, ans=0.0 2024-08-17 19:28:39,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3477120.0, ans=0.125 2024-08-17 19:28:46,774 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.37 vs. limit=15.0 2024-08-17 19:28:48,950 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 33 from LS+wenet, 22 from Vox, 20 from AS 2024-08-17 19:28:59,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3477220.0, ans=0.0 2024-08-17 19:29:02,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3477320.0, ans=0.1 2024-08-17 19:29:16,671 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.974e-01 2024-08-17 19:29:17,432 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 4600, loss[loss=0.09855, beats_loss=0.01011, ecapa_loss=0.0001707, whisper_loss=0.08673, over 21918.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01056, ecapa_loss=0.0001487, whisper_loss=0.09034, over 3908125.35 frames.
], batch size: 91, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:29:17,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3477420.0, ans=0.0 2024-08-17 19:29:33,685 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 29 from LS+wenet, 18 from Vox, 23 from AS 2024-08-17 19:29:34,917 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 30 from LS+wenet, 16 from Vox, 26 from AS 2024-08-17 19:29:41,330 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 16 from Vox, 48 from AS 2024-08-17 19:29:49,859 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 22 from Vox, 28 from AS 2024-08-17 19:30:02,584 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 22 from LS+wenet, 26 from Vox, 47 from AS 2024-08-17 19:30:09,717 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 19 from Vox, 47 from AS 2024-08-17 19:30:24,456 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.235e+01 2.492e+01 2.713e+01 4.136e+01, threshold=4.985e+01, percent-clipped=0.0 2024-08-17 19:30:24,477 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 4650, loss[loss=0.1053, beats_loss=0.01141, ecapa_loss=0.0001077, whisper_loss=0.09279, over 21633.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01062, ecapa_loss=0.0001482, whisper_loss=0.09009, over 3913780.01 frames. ], batch size: 82, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:30:31,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3477920.0, ans=0.0 2024-08-17 19:30:39,149 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts.
26 from LS+wenet, 26 from Vox, 40 from AS 2024-08-17 19:30:46,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3478020.0, ans=0.125 2024-08-17 19:31:20,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3478320.0, ans=0.0 2024-08-17 19:31:22,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3478320.0, ans=0.125 2024-08-17 19:31:28,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3478320.0, ans=0.125 2024-08-17 19:31:32,485 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 4700, loss[loss=0.12, beats_loss=0.008175, ecapa_loss=0.0001504, whisper_loss=0.1103, over 23390.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.000148, whisper_loss=0.09053, over 3899877.50 frames. ], batch size: 91, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:31:43,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3478420.0, ans=0.125 2024-08-17 19:31:50,770 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 13 from Vox, 33 from AS 2024-08-17 19:32:15,672 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.10 vs. limit=15.0 2024-08-17 19:32:34,061 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.92 vs.
limit=12.0 2024-08-17 19:32:39,304 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.685e+01 2.375e+01 2.548e+01 2.840e+01 1.213e+02, threshold=5.097e+01, percent-clipped=2.0 2024-08-17 19:32:39,327 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 4750, loss[loss=0.1084, beats_loss=0.009677, ecapa_loss=0.0001124, whisper_loss=0.09761, over 16230.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001478, whisper_loss=0.09078, over 3902447.19 frames. ], batch size: 59, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:32:58,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3479020.0, ans=0.0 2024-08-17 19:32:59,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3479020.0, ans=0.0 2024-08-17 19:33:06,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3479120.0, ans=0.0 2024-08-17 19:33:09,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3479120.0, ans=0.125 2024-08-17 19:33:12,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3479120.0, ans=0.125 2024-08-17 19:33:30,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3479220.0, ans=0.125 2024-08-17 19:33:30,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3479220.0, ans=0.125 2024-08-17 19:33:46,504 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 4800, loss[loss=0.08194, beats_loss=0.01197, ecapa_loss=0.0001514, whisper_loss=0.06846, over 21199.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01062, ecapa_loss=0.0001473, whisper_loss=0.08989, over 3912811.97 frames. ], batch size: 89, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:33:59,885 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 21 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-17 19:34:01,887 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.31 vs. limit=15.0 2024-08-17 19:34:15,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3479620.0, ans=0.125 2024-08-17 19:34:17,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3479620.0, ans=0.2 2024-08-17 19:34:18,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3479620.0, ans=0.125 2024-08-17 19:34:29,498 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.95 vs. limit=15.0 2024-08-17 19:34:30,057 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-17 19:34:40,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3479820.0, ans=0.035 2024-08-17 19:34:41,762 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
21 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-17 19:34:44,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3479820.0, ans=0.125 2024-08-17 19:34:44,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3479820.0, ans=0.0 2024-08-17 19:34:52,002 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.389e+01 2.627e+01 2.841e+01 3.778e+01, threshold=5.254e+01, percent-clipped=0.0 2024-08-17 19:34:52,024 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 4850, loss[loss=0.1077, beats_loss=0.008734, ecapa_loss=0.0001814, whisper_loss=0.09714, over 15819.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01068, ecapa_loss=0.000147, whisper_loss=0.08996, over 3909798.57 frames. ], batch size: 65, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:35:34,562 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=22.5 2024-08-17 19:35:36,959 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.72 vs. limit=15.0 2024-08-17 19:35:42,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3480220.0, ans=0.0 2024-08-17 19:35:53,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3480320.0, ans=0.125 2024-08-17 19:35:55,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3480320.0, ans=0.1 2024-08-17 19:35:59,732 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 4900, loss[loss=0.1081, beats_loss=0.01101, ecapa_loss=0.0001237, whisper_loss=0.09584, over 16391.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01058, ecapa_loss=0.000148, whisper_loss=0.09041, over 3917148.01 frames. ], batch size: 62, lr: 2.56e-03, grad_scale: 1.152921504606847e+18 2024-08-17 19:36:00,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3480420.0, ans=0.2 2024-08-17 19:36:01,219 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-17 19:36:14,142 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0 2024-08-17 19:36:15,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3480520.0, ans=0.0 2024-08-17 19:36:33,550 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-17 19:36:36,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3480620.0, ans=0.2 2024-08-17 19:36:41,837 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-17 19:36:44,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3480720.0, ans=0.125 2024-08-17 19:36:51,305 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
19 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-17 19:36:55,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3480820.0, ans=0.2 2024-08-17 19:36:57,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3480820.0, ans=0.2 2024-08-17 19:37:04,875 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.256e+01 2.549e+01 2.782e+01 4.181e+01, threshold=5.098e+01, percent-clipped=0.0 2024-08-17 19:37:04,897 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 4950, loss[loss=0.101, beats_loss=0.009818, ecapa_loss=0.0001021, whisper_loss=0.09017, over 15454.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01062, ecapa_loss=0.0001475, whisper_loss=0.08974, over 3875658.19 frames. ], batch size: 54, lr: 2.56e-03, grad_scale: 1.152921504606847e+18 2024-08-17 19:37:17,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3481020.0, ans=0.1 2024-08-17 19:37:17,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3481020.0, ans=0.0 2024-08-17 19:37:17,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3481020.0, ans=0.1 2024-08-17 19:37:31,022 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-17 19:37:36,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3481120.0, ans=0.125 2024-08-17 19:37:38,511 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-17 19:37:59,871 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
21 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-17 19:38:08,639 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 5000, loss[loss=0.09947, beats_loss=0.0103, ecapa_loss=0.0001502, whisper_loss=0.08767, over 16577.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01057, ecapa_loss=0.000148, whisper_loss=0.09031, over 3843172.46 frames. ], batch size: 67, lr: 2.56e-03, grad_scale: 1.152921504606847e+18 2024-08-17 19:38:12,773 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-17 19:38:20,352 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-17 19:38:47,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3481720.0, ans=10.0 2024-08-17 19:38:59,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3481820.0, ans=0.0 2024-08-17 19:39:09,779 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.326e+01 2.618e+01 2.940e+01 4.201e+01, threshold=5.237e+01, percent-clipped=0.0 2024-08-17 19:39:09,798 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 5050, loss[loss=0.1008, beats_loss=0.0113, ecapa_loss=0.0001495, whisper_loss=0.08797, over 19432.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0107, ecapa_loss=0.0001475, whisper_loss=0.08961, over 3861597.77 frames. ], batch size: 78, lr: 2.56e-03, grad_scale: 1.152921504606847e+18 2024-08-17 19:39:09,960 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-17 19:39:11,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3481920.0, ans=0.2 2024-08-17 19:39:45,198 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.99 vs. 
limit=15.0 2024-08-17 19:39:45,511 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.65 vs. limit=10.0 2024-08-17 19:39:59,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3482220.0, ans=0.0 2024-08-17 19:39:59,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3482220.0, ans=0.125 2024-08-17 19:40:04,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3482320.0, ans=0.0 2024-08-17 19:40:06,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3482320.0, ans=0.125 2024-08-17 19:40:09,571 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 32 from Vox, 37 fro AS 2024-08-17 19:40:13,951 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2024-08-17 19:40:15,328 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0 2024-08-17 19:40:17,453 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 5100, loss[loss=0.1029, beats_loss=0.01061, ecapa_loss=0.0001431, whisper_loss=0.09086, over 18440.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01074, ecapa_loss=0.0001464, whisper_loss=0.08944, over 3876960.41 frames. ], batch size: 73, lr: 2.56e-03, grad_scale: 1.152921504606847e+18 2024-08-17 19:40:20,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3482420.0, ans=0.0 2024-08-17 19:40:39,559 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
18 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-17 19:40:54,828 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 19 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-17 19:41:04,332 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-17 19:41:06,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3482720.0, ans=0.09899494936611666 2024-08-17 19:41:17,576 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.72 vs. limit=22.5 2024-08-17 19:41:30,561 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.372e+01 2.547e+01 2.927e+01 4.256e+01, threshold=5.094e+01, percent-clipped=0.0 2024-08-17 19:41:30,583 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 5150, loss[loss=0.08748, beats_loss=0.01324, ecapa_loss=0.0001348, whisper_loss=0.0729, over 22552.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01073, ecapa_loss=0.0001463, whisper_loss=0.08993, over 3905023.16 frames. ], batch size: 94, lr: 2.56e-03, grad_scale: 1.152921504606847e+18 2024-08-17 19:41:32,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3482920.0, ans=0.1 2024-08-17 19:41:33,810 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-17 19:41:37,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3482920.0, ans=0.0 2024-08-17 19:41:55,558 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-17 19:42:47,288 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 5200, loss[loss=0.09834, beats_loss=0.008616, ecapa_loss=0.0001296, whisper_loss=0.08843, over 15881.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01066, ecapa_loss=0.0001462, whisper_loss=0.09014, over 3865076.60 frames. ], batch size: 58, lr: 2.56e-03, grad_scale: 1.152921504606847e+18 2024-08-17 19:43:19,544 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.16 vs. limit=12.0 2024-08-17 19:43:28,144 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0 2024-08-17 19:43:47,394 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.68 vs. limit=15.0 2024-08-17 19:43:54,816 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-17 19:43:58,660 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=15.0 2024-08-17 19:44:06,799 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 5250, loss[loss=0.1047, beats_loss=0.007418, ecapa_loss=0.000152, whisper_loss=0.09578, over 19089.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01068, ecapa_loss=0.0001462, whisper_loss=0.08965, over 3865136.90 frames. 
], batch size: 73, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:44:07,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3483920.0, ans=0.125 2024-08-17 19:44:07,892 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.262e+01 2.523e+01 2.892e+01 1.179e+02, threshold=5.045e+01, percent-clipped=2.0 2024-08-17 19:44:15,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3483920.0, ans=0.0 2024-08-17 19:44:23,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3484020.0, ans=0.2 2024-08-17 19:44:24,800 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=22.5 2024-08-17 19:44:28,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3484020.0, ans=0.1 2024-08-17 19:44:40,292 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 28 from Vox, 21 fro AS 2024-08-17 19:44:49,570 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 19:44:56,041 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-17 19:45:00,298 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.46 vs. limit=10.0 2024-08-17 19:45:11,191 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 5300, loss[loss=0.1164, beats_loss=0.00903, ecapa_loss=0.000144, whisper_loss=0.1059, over 23600.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.0001474, whisper_loss=0.09109, over 3880680.87 frames. 
], batch size: 93, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:45:26,464 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-17 19:45:32,774 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-17 19:45:34,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3484520.0, ans=0.1 2024-08-17 19:45:36,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3484620.0, ans=0.125 2024-08-17 19:45:58,381 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.89 vs. limit=15.0 2024-08-17 19:46:09,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3484820.0, ans=0.125 2024-08-17 19:46:14,049 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 5350, loss[loss=0.1061, beats_loss=0.006069, ecapa_loss=0.0001747, whisper_loss=0.09827, over 17658.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01068, ecapa_loss=0.0001469, whisper_loss=0.08973, over 3869743.00 frames. ], batch size: 68, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:46:15,227 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.678e+01 2.274e+01 2.526e+01 2.902e+01 3.170e+02, threshold=5.052e+01, percent-clipped=3.0 2024-08-17 19:46:25,632 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
31 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-17 19:46:44,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3485120.0, ans=0.125 2024-08-17 19:47:06,537 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.49 vs. limit=22.5 2024-08-17 19:47:18,395 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 5400, loss[loss=0.1237, beats_loss=0.007459, ecapa_loss=0.0001547, whisper_loss=0.1147, over 24104.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01058, ecapa_loss=0.0001468, whisper_loss=0.09044, over 3884178.42 frames. ], batch size: 94, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:47:20,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3485420.0, ans=0.1 2024-08-17 19:47:21,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3485420.0, ans=0.025 2024-08-17 19:47:33,290 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-17 19:47:34,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3485520.0, ans=0.0 2024-08-17 19:47:51,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3485620.0, ans=0.125 2024-08-17 19:47:52,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3485620.0, ans=0.035 2024-08-17 19:48:29,263 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 5450, loss[loss=0.09882, beats_loss=0.009549, ecapa_loss=0.0001737, whisper_loss=0.08754, over 21666.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01063, ecapa_loss=0.0001467, whisper_loss=0.09004, over 3904188.48 frames. ], batch size: 89, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:48:30,533 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.407e+01 2.677e+01 2.868e+01 8.391e+01, threshold=5.355e+01, percent-clipped=1.0 2024-08-17 19:48:46,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3486020.0, ans=0.0 2024-08-17 19:48:49,265 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-17 19:48:50,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3486020.0, ans=0.0 2024-08-17 19:48:54,192 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-17 19:48:59,244 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-17 19:49:05,995 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 20 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-17 19:49:42,562 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-17 19:49:46,127 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 5500, loss[loss=0.08274, beats_loss=0.01304, ecapa_loss=0.0001231, whisper_loss=0.06847, over 19637.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.0001478, whisper_loss=0.09051, over 3865749.77 frames. ], batch size: 79, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:49:59,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3486420.0, ans=0.2 2024-08-17 19:50:01,635 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.30 vs. 
limit=22.5 2024-08-17 19:50:04,397 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.05 vs. limit=22.5 2024-08-17 19:50:12,360 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.49 vs. limit=15.0 2024-08-17 19:50:18,534 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 16 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-17 19:50:25,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3486620.0, ans=0.0 2024-08-17 19:50:29,436 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.21 vs. limit=15.0 2024-08-17 19:50:37,318 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 29 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-17 19:50:37,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3486720.0, ans=0.125 2024-08-17 19:50:40,006 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-17 19:50:48,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3486720.0, ans=0.0 2024-08-17 19:50:49,859 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
23 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-17 19:50:59,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3486820.0, ans=0.125 2024-08-17 19:51:08,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3486920.0, ans=0.125 2024-08-17 19:51:08,865 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 5550, loss[loss=0.09895, beats_loss=0.01003, ecapa_loss=0.0001389, whisper_loss=0.08753, over 23217.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01059, ecapa_loss=0.0001484, whisper_loss=0.09098, over 3911528.80 frames. ], batch size: 90, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:51:11,879 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.945e+01 2.339e+01 2.599e+01 2.892e+01 2.617e+02, threshold=5.198e+01, percent-clipped=1.0 2024-08-17 19:51:22,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3486920.0, ans=0.125 2024-08-17 19:51:24,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3487020.0, ans=0.125 2024-08-17 19:51:26,124 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 24 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-17 19:51:28,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3487020.0, ans=0.125 2024-08-17 19:51:30,606 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-17 19:51:46,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3487120.0, ans=0.0 2024-08-17 19:52:03,619 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.29 vs. 
limit=15.0 2024-08-17 19:52:06,092 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.41 vs. limit=22.5 2024-08-17 19:52:12,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3487320.0, ans=0.125 2024-08-17 19:52:15,274 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-17 19:52:16,917 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 19:52:21,595 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 5600, loss[loss=0.09364, beats_loss=0.01005, ecapa_loss=0.0001645, whisper_loss=0.08194, over 13303.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0106, ecapa_loss=0.0001476, whisper_loss=0.09091, over 3903315.60 frames. ], batch size: 53, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:52:37,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3487520.0, ans=0.2 2024-08-17 19:52:43,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3487520.0, ans=0.125 2024-08-17 19:52:43,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3487520.0, ans=0.0 2024-08-17 19:52:48,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3487620.0, ans=0.125 2024-08-17 19:52:49,732 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-17 19:52:51,757 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.93 vs. 
limit=15.0 2024-08-17 19:53:01,260 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-17 19:53:01,975 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.79 vs. limit=12.0 2024-08-17 19:53:08,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3487720.0, ans=0.125 2024-08-17 19:53:15,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3487820.0, ans=0.125 2024-08-17 19:53:26,083 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 5650, loss[loss=0.099, beats_loss=0.0091, ecapa_loss=0.0001803, whisper_loss=0.08809, over 15140.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01058, ecapa_loss=0.000148, whisper_loss=0.09068, over 3910153.80 frames. ], batch size: 60, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:53:28,915 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.385e+01 2.591e+01 2.978e+01 4.325e+01, threshold=5.182e+01, percent-clipped=0.0 2024-08-17 19:53:31,918 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-17 19:53:33,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3487920.0, ans=0.125 2024-08-17 19:53:34,619 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 31 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-17 19:53:36,869 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.53 vs. limit=15.0 2024-08-17 19:53:45,832 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.51 vs. 
limit=22.5 2024-08-17 19:53:46,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3488020.0, ans=0.1 2024-08-17 19:53:59,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3488120.0, ans=0.0 2024-08-17 19:54:18,510 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-17 19:54:24,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3488320.0, ans=0.07 2024-08-17 19:54:31,790 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 5700, loss[loss=0.09996, beats_loss=0.01028, ecapa_loss=0.0001927, whisper_loss=0.08775, over 18094.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01059, ecapa_loss=0.0001487, whisper_loss=0.09045, over 3908721.17 frames. ], batch size: 78, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:54:35,165 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=15.0 2024-08-17 19:54:49,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3488520.0, ans=0.0 2024-08-17 19:54:55,218 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-17 19:54:57,881 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-17 19:55:01,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3488620.0, ans=0.125 2024-08-17 19:55:05,138 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 24 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-17 19:55:10,745 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
20 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-17 19:55:13,351 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-17 19:55:28,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3488820.0, ans=0.2 2024-08-17 19:55:35,805 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 5750, loss[loss=0.09013, beats_loss=0.00889, ecapa_loss=0.0001965, whisper_loss=0.07928, over 13347.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001485, whisper_loss=0.09075, over 3894756.04 frames. ], batch size: 56, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:55:36,615 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=15.0 2024-08-17 19:55:38,278 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.375e+01 2.666e+01 3.139e+01 4.378e+01, threshold=5.332e+01, percent-clipped=0.0 2024-08-17 19:55:54,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3489020.0, ans=0.125 2024-08-17 19:56:05,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3489120.0, ans=10.0 2024-08-17 19:56:11,774 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 31 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-17 19:56:12,962 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-17 19:56:19,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3489220.0, ans=0.0 2024-08-17 19:56:20,471 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
23 from LS+wenet, 18 from Vox, 32 from AS 2024-08-17 19:56:35,731 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.12 vs. limit=22.5 2024-08-17 19:56:36,415 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 21 from Vox, 34 from AS 2024-08-17 19:56:39,888 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 5800, loss[loss=0.1072, beats_loss=0.007692, ecapa_loss=0.0001591, whisper_loss=0.09789, over 15229.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01062, ecapa_loss=0.0001483, whisper_loss=0.09047, over 3895711.95 frames. ], batch size: 59, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:56:42,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3489420.0, ans=0.0 2024-08-17 19:56:52,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3489520.0, ans=0.125 2024-08-17 19:56:55,284 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 15 from Vox, 21 from AS 2024-08-17 19:57:06,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3489620.0, ans=0.2 2024-08-17 19:57:30,920 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 from AS 2024-08-17 19:57:35,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3489820.0, ans=0.5 2024-08-17 19:57:39,849 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 from AS 2024-08-17 19:57:45,923 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 5850, loss[loss=0.09266, beats_loss=0.01335, ecapa_loss=0.0001792, whisper_loss=0.07752, over 20295.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001489, whisper_loss=0.09057, over 3899398.70 frames. ], batch size: 91, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:57:48,653 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.235e+01 2.518e+01 2.723e+01 7.884e+01, threshold=5.036e+01, percent-clipped=1.0 2024-08-17 19:57:50,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3489920.0, ans=0.125 2024-08-17 19:57:59,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3490020.0, ans=0.125 2024-08-17 19:58:07,018 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 16 from Vox, 40 from AS 2024-08-17 19:58:08,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3490020.0, ans=0.125 2024-08-17 19:58:22,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3490120.0, ans=0.07 2024-08-17 19:58:25,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3490220.0, ans=0.0 2024-08-17 19:58:38,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3490320.0, ans=0.1 2024-08-17 19:58:51,003 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 5900, loss[loss=0.1175, beats_loss=0.009742, ecapa_loss=0.0001549, whisper_loss=0.1062, over 22662.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001484, whisper_loss=0.09074, over 3884188.11 frames. 
], batch size: 89, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:59:04,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3490520.0, ans=0.125 2024-08-17 19:59:08,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3490520.0, ans=0.1 2024-08-17 19:59:13,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3490520.0, ans=0.0 2024-08-17 19:59:20,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3490620.0, ans=0.0 2024-08-17 19:59:22,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3490620.0, ans=0.1 2024-08-17 19:59:23,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3490620.0, ans=0.125 2024-08-17 19:59:29,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3490720.0, ans=0.1 2024-08-17 19:59:30,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=3490720.0, ans=0.2 2024-08-17 19:59:44,111 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 35 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-17 19:59:48,419 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-17 19:59:56,887 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 5950, loss[loss=0.08116, beats_loss=0.01021, ecapa_loss=0.0001375, whisper_loss=0.06957, over 14889.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.0001478, whisper_loss=0.09102, over 3909356.02 frames. 
], batch size: 56, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:59:59,489 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.262e+01 2.461e+01 2.864e+01 3.747e+01, threshold=4.922e+01, percent-clipped=0.0 2024-08-17 19:59:59,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3490920.0, ans=0.1 2024-08-17 20:00:17,332 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-17 20:00:21,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3491020.0, ans=0.1 2024-08-17 20:00:28,567 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08441564440727234, model_norm_threshold=49.215904235839844 2024-08-17 20:00:28,735 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.3.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.653e+04, grad_sumsq=3.653e+04, orig_rms_sq=1.000e+00 2024-08-17 20:00:37,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3491220.0, ans=0.2 2024-08-17 20:00:38,447 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-17 20:00:39,159 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.35 vs. limit=15.0 2024-08-17 20:00:52,778 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.25 vs. limit=15.0 2024-08-17 20:01:01,646 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
29 from LS+wenet, 18 from Vox, 32 from AS 2024-08-17 20:01:04,678 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 6000, loss[loss=0.1184, beats_loss=0.008615, ecapa_loss=0.0001983, whisper_loss=0.1078, over 22310.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01058, ecapa_loss=0.0001479, whisper_loss=0.09124, over 3897017.14 frames. ], batch size: 91, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:01:04,678 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-17 20:01:38,345 INFO [train_multi_KD3.py:1149] (2/4) Epoch 24, validation on ASR_libri: loss=0.2507, beats_loss=0, ecapa_loss=0.000535, whisper_loss=0.2453, over 922467.00 frames. 2024-08-17 20:01:55,995 INFO [train_multi_KD3.py:1149] (2/4) Epoch 24, validation on SV_voxceleb1: loss=0.00412, beats_loss=0, ecapa_loss=0.000412, whisper_loss=0, over 939242.00 frames. 2024-08-17 20:03:40,263 INFO [train_multi_KD3.py:1149] (2/4) Epoch 24, validation on AT_audioset: loss=0.02332, beats_loss=0.02332, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-17 20:03:40,267 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-17 20:03:44,373 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 from AS 2024-08-17 20:03:51,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3491520.0, ans=0.0 2024-08-17 20:03:58,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3491520.0, ans=0.125 2024-08-17 20:03:59,729 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
18 from LS+wenet, 16 from Vox, 27 from AS 2024-08-17 20:04:06,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3491620.0, ans=0.125 2024-08-17 20:04:10,261 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.57 vs. limit=10.0 2024-08-17 20:04:10,857 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 from AS 2024-08-17 20:04:18,185 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.65 vs. limit=10.0 2024-08-17 20:04:24,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3491720.0, ans=0.125 2024-08-17 20:04:28,321 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 26 from LS+wenet, 10 from Vox, 24 from AS 2024-08-17 20:04:29,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3491720.0, ans=0.125 2024-08-17 20:04:37,477 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 26 from Vox, 42 from AS 2024-08-17 20:04:39,304 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.50 vs. limit=15.0 2024-08-17 20:04:40,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3491820.0, ans=0.0 2024-08-17 20:04:44,841 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 6050, loss[loss=0.115, beats_loss=0.009019, ecapa_loss=0.0001667, whisper_loss=0.1044, over 22671.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0105, ecapa_loss=0.0001476, whisper_loss=0.09175, over 3898697.95 frames. 
], batch size: 92, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:04:47,352 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.332e+01 2.568e+01 2.986e+01 5.830e+02, threshold=5.136e+01, percent-clipped=1.0 2024-08-17 20:04:52,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3491920.0, ans=0.025 2024-08-17 20:04:56,693 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 15 from Vox, 32 from AS 2024-08-17 20:05:09,481 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 23 from Vox, 31 from AS 2024-08-17 20:05:09,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3492020.0, ans=0.0 2024-08-17 20:05:11,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3492120.0, ans=0.0 2024-08-17 20:05:13,247 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.17 vs. limit=15.0 2024-08-17 20:05:23,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3492120.0, ans=0.1 2024-08-17 20:05:41,438 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 28 from Vox, 36 from AS 2024-08-17 20:05:43,139 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 25 from Vox, 23 from AS 2024-08-17 20:05:55,382 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 6100, loss[loss=0.1153, beats_loss=0.01238, ecapa_loss=0.0001613, whisper_loss=0.1013, over 22382.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01056, ecapa_loss=0.0001475, whisper_loss=0.09148, over 3911415.97 frames. 
], batch size: 92, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:06:00,048 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.78 vs. limit=5.0 2024-08-17 20:06:00,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3492420.0, ans=0.125 2024-08-17 20:06:17,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3492520.0, ans=0.025 2024-08-17 20:06:17,716 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=8.163e-03 2024-08-17 20:06:20,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3492520.0, ans=0.0 2024-08-17 20:06:28,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3492620.0, ans=0.1 2024-08-17 20:06:31,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3492620.0, ans=0.0 2024-08-17 20:07:05,073 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 18 from Vox, 32 from AS 2024-08-17 20:07:06,132 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 6150, loss[loss=0.1169, beats_loss=0.009463, ecapa_loss=0.0001485, whisper_loss=0.1059, over 21471.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01048, ecapa_loss=0.0001478, whisper_loss=0.09164, over 3912176.49 frames. ], batch size: 78, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:07:08,590 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.377e+01 2.638e+01 2.970e+01 6.161e+01, threshold=5.275e+01, percent-clipped=1.0 2024-08-17 20:07:14,210 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
27 from LS+wenet, 29 from Vox, 30 from AS 2024-08-17 20:07:15,943 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=10.41 vs. limit=10.0 2024-08-17 20:07:25,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3493020.0, ans=0.09899494936611666 2024-08-17 20:07:26,670 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 38 from LS+wenet, 27 from Vox, 24 from AS 2024-08-17 20:07:29,132 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 from AS 2024-08-17 20:07:31,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3493120.0, ans=0.125 2024-08-17 20:07:51,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3493220.0, ans=0.5 2024-08-17 20:08:11,884 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 6200, loss[loss=0.1126, beats_loss=0.009854, ecapa_loss=0.0001751, whisper_loss=0.101, over 22211.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01045, ecapa_loss=0.0001478, whisper_loss=0.09167, over 3883261.21 frames. ], batch size: 91, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:08:20,003 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.98 vs. limit=10.0 2024-08-17 20:08:21,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3493420.0, ans=0.2 2024-08-17 20:08:24,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3493520.0, ans=0.1 2024-08-17 20:08:27,930 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
22 from LS+wenet, 15 from Vox, 27 from AS 2024-08-17 20:08:28,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3493520.0, ans=0.1 2024-08-17 20:09:09,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3493820.0, ans=0.125 2024-08-17 20:09:16,766 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 6250, loss[loss=0.09213, beats_loss=0.01224, ecapa_loss=0.0001245, whisper_loss=0.07865, over 22765.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0105, ecapa_loss=0.0001479, whisper_loss=0.09068, over 3858408.40 frames. ], batch size: 90, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:09:19,389 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.612e+01 2.313e+01 2.518e+01 2.765e+01 5.277e+01, threshold=5.035e+01, percent-clipped=1.0 2024-08-17 20:09:25,900 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 9 from LS+wenet, 17 from Vox, 30 from AS 2024-08-17 20:09:37,173 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 18 from Vox, 29 from AS 2024-08-17 20:09:38,886 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.51 vs. limit=10.0 2024-08-17 20:09:54,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3494120.0, ans=0.2 2024-08-17 20:10:03,232 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 35 from LS+wenet, 20 from Vox, 30 from AS 2024-08-17 20:10:07,016 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
19 from LS+wenet, 21 from Vox, 21 from AS 2024-08-17 20:10:22,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3494420.0, ans=22.5 2024-08-17 20:10:23,129 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 6300, loss[loss=0.1131, beats_loss=0.009493, ecapa_loss=0.0001468, whisper_loss=0.1021, over 16407.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.0001476, whisper_loss=0.09092, over 3829366.15 frames. ], batch size: 62, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:10:23,308 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 16 from Vox, 40 from AS 2024-08-17 20:10:27,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3494420.0, ans=0.125 2024-08-17 20:10:33,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3494420.0, ans=0.125 2024-08-17 20:10:36,559 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 20 from Vox, 35 from AS 2024-08-17 20:10:51,135 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 22 from Vox, 33 from AS 2024-08-17 20:11:06,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3494720.0, ans=0.0 2024-08-17 20:11:07,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3494720.0, ans=0.0 2024-08-17 20:11:10,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3494720.0, ans=0.0 2024-08-17 20:11:14,115 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
25 from LS+wenet, 20 from Vox, 26 from AS 2024-08-17 20:11:20,176 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.74 vs. limit=15.0 2024-08-17 20:11:22,066 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 from AS 2024-08-17 20:11:28,207 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 6350, loss[loss=0.1054, beats_loss=0.008883, ecapa_loss=0.0002031, whisper_loss=0.09451, over 18801.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01045, ecapa_loss=0.0001493, whisper_loss=0.09117, over 3826427.57 frames. ], batch size: 82, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:11:30,935 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.269e+01 2.498e+01 3.027e+01 2.064e+02, threshold=4.996e+01, percent-clipped=1.0 2024-08-17 20:11:40,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3495020.0, ans=0.2 2024-08-17 20:11:44,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3495020.0, ans=0.125 2024-08-17 20:11:45,309 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 13 from Vox, 33 from AS 2024-08-17 20:11:52,811 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.73 vs. limit=15.0 2024-08-17 20:12:03,387 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 24 from Vox, 29 from AS 2024-08-17 20:12:03,879 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.20 vs. 
limit=22.5 2024-08-17 20:12:04,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3495120.0, ans=0.0 2024-08-17 20:12:31,792 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 6400, loss[loss=0.1301, beats_loss=0.007071, ecapa_loss=0.0001843, whisper_loss=0.1212, over 18786.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01047, ecapa_loss=0.000149, whisper_loss=0.09127, over 3858961.85 frames. ], batch size: 75, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:12:32,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3495420.0, ans=0.125 2024-08-17 20:12:33,084 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 27 from LS+wenet, 22 from Vox, 21 from AS 2024-08-17 20:12:41,534 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.89 vs. limit=15.0 2024-08-17 20:12:50,343 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.98 vs. limit=15.0 2024-08-17 20:12:57,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3495620.0, ans=0.1 2024-08-17 20:13:00,939 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 from AS 2024-08-17 20:13:02,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3495620.0, ans=0.1 2024-08-17 20:13:08,084 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.07 vs. limit=10.0 2024-08-17 20:13:14,871 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
21 from LS+wenet, 19 from Vox, 54 from AS 2024-08-17 20:13:35,348 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 6450, loss[loss=0.1118, beats_loss=0.009032, ecapa_loss=0.0001694, whisper_loss=0.1011, over 21441.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01057, ecapa_loss=0.000148, whisper_loss=0.09093, over 3886837.63 frames. ], batch size: 86, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:13:38,103 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.330e+01 2.566e+01 2.943e+01 3.819e+01, threshold=5.133e+01, percent-clipped=0.0 2024-08-17 20:13:41,769 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 from AS 2024-08-17 20:14:03,670 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 25 from Vox, 27 from AS 2024-08-17 20:14:10,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3496120.0, ans=0.125 2024-08-17 20:14:13,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3496220.0, ans=0.04949747468305833 2024-08-17 20:14:14,909 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0 2024-08-17 20:14:29,860 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 28 from Vox, 40 from AS 2024-08-17 20:14:38,784 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 6500, loss[loss=0.08403, beats_loss=0.01298, ecapa_loss=0.0001105, whisper_loss=0.06995, over 14151.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01054, ecapa_loss=0.0001483, whisper_loss=0.0915, over 3893046.93 frames. 
], batch size: 55, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:14:40,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3496420.0, ans=0.0 2024-08-17 20:14:45,514 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 21 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-17 20:14:45,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3496420.0, ans=0.125 2024-08-17 20:14:49,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3496420.0, ans=0.1 2024-08-17 20:14:56,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3496520.0, ans=0.1 2024-08-17 20:14:59,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3496520.0, ans=0.0 2024-08-17 20:15:04,726 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.22 vs. limit=15.0 2024-08-17 20:15:08,579 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.50 vs. limit=12.0 2024-08-17 20:15:24,999 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
28 from LS+wenet, 18 from Vox, 44 from AS 2024-08-17 20:15:29,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3496820.0, ans=0.125 2024-08-17 20:15:32,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3496820.0, ans=0.1 2024-08-17 20:15:41,281 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 6550, loss[loss=0.1046, beats_loss=0.0118, ecapa_loss=0.000175, whisper_loss=0.09102, over 22041.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01057, ecapa_loss=0.0001474, whisper_loss=0.09169, over 3914703.88 frames. ], batch size: 92, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:15:43,825 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.275e+01 2.582e+01 2.844e+01 5.439e+01, threshold=5.163e+01, percent-clipped=1.0 2024-08-17 20:15:48,633 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 17 from Vox, 22 from AS 2024-08-17 20:15:50,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3496920.0, ans=0.0 2024-08-17 20:16:02,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3497020.0, ans=0.0 2024-08-17 20:16:32,869 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 22 from LS+wenet, 35 from Vox, 38 from AS 2024-08-17 20:16:42,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3497320.0, ans=0.07 2024-08-17 20:16:43,686 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.17 vs. 
limit=15.0 2024-08-17 20:16:44,144 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 6600, loss[loss=0.1244, beats_loss=0.008004, ecapa_loss=0.0001639, whisper_loss=0.1147, over 22994.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01045, ecapa_loss=0.0001489, whisper_loss=0.09228, over 3930844.99 frames. ], batch size: 91, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:16:44,357 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 from AS 2024-08-17 20:16:54,652 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 38 from LS+wenet, 25 from Vox, 27 from AS 2024-08-17 20:17:01,942 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 23 from LS+wenet, 24 from Vox, 39 from AS 2024-08-17 20:17:11,820 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 29 from Vox, 37 from AS 2024-08-17 20:17:19,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3497620.0, ans=0.0 2024-08-17 20:17:22,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3497720.0, ans=0.125 2024-08-17 20:17:30,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3497720.0, ans=0.125 2024-08-17 20:17:42,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3497820.0, ans=0.0 2024-08-17 20:17:48,867 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 6650, loss[loss=0.1197, beats_loss=0.008972, ecapa_loss=0.0001399, whisper_loss=0.1093, over 18177.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0104, ecapa_loss=0.00015, whisper_loss=0.09204, over 3942377.86 frames. 
], batch size: 70, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:17:52,005 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.725e+01 2.296e+01 2.528e+01 2.803e+01 4.875e+01, threshold=5.056e+01, percent-clipped=0.0 2024-08-17 20:18:00,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3497920.0, ans=0.125 2024-08-17 20:18:55,595 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 6700, loss[loss=0.09681, beats_loss=0.01184, ecapa_loss=0.0001078, whisper_loss=0.08389, over 17320.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01039, ecapa_loss=0.0001503, whisper_loss=0.09153, over 3894847.29 frames. ], batch size: 63, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:19:14,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3498520.0, ans=0.0 2024-08-17 20:19:32,580 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 18 from Vox, 39 from AS 2024-08-17 20:19:36,725 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 from AS 2024-08-17 20:19:43,489 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 14 from Vox, 34 from AS 2024-08-17 20:19:50,727 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 25 from Vox, 27 from AS 2024-08-17 20:19:51,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3498820.0, ans=0.09899494936611666 2024-08-17 20:19:57,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3498820.0, ans=0.1 2024-08-17 20:19:59,390 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.87 vs. 
limit=22.5 2024-08-17 20:20:02,620 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 6750, loss[loss=0.1233, beats_loss=0.007208, ecapa_loss=0.0001777, whisper_loss=0.1143, over 18844.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01044, ecapa_loss=0.0001489, whisper_loss=0.09143, over 3898012.55 frames. ], batch size: 73, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:20:05,080 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.341e+01 2.553e+01 2.881e+01 4.288e+01, threshold=5.105e+01, percent-clipped=0.0 2024-08-17 20:20:08,016 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-17 20:20:12,268 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 10 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-17 20:20:13,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3498920.0, ans=0.0 2024-08-17 20:20:22,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3499020.0, ans=0.125 2024-08-17 20:20:28,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3499120.0, ans=0.07 2024-08-17 20:20:40,498 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-17 20:20:43,665 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.24 vs. 
limit=15.0 2024-08-17 20:20:45,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3499220.0, ans=0.1 2024-08-17 20:20:47,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3499220.0, ans=10.0 2024-08-17 20:20:49,442 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.12 vs. limit=15.0 2024-08-17 20:20:51,207 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-17 20:21:02,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3499320.0, ans=0.125 2024-08-17 20:21:04,174 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.98 vs. limit=10.0 2024-08-17 20:21:10,349 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 6800, loss[loss=0.114, beats_loss=0.00968, ecapa_loss=0.000139, whisper_loss=0.1029, over 22586.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01052, ecapa_loss=0.0001494, whisper_loss=0.09071, over 3896604.42 frames. ], batch size: 89, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:21:14,312 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-17 20:21:50,551 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 31 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-17 20:21:59,644 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.79 vs. 
limit=15.0 2024-08-17 20:22:00,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3499720.0, ans=0.0 2024-08-17 20:22:01,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3499720.0, ans=0.2 2024-08-17 20:22:10,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3499820.0, ans=0.0 2024-08-17 20:22:17,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3499820.0, ans=0.2 2024-08-17 20:22:18,650 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.47 vs. limit=6.0 2024-08-17 20:22:19,111 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 6850, loss[loss=0.1021, beats_loss=0.008885, ecapa_loss=0.000159, whisper_loss=0.09159, over 17742.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01057, ecapa_loss=0.0001487, whisper_loss=0.09014, over 3885571.17 frames. ], batch size: 71, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:22:22,025 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.292e+01 2.494e+01 2.760e+01 3.944e+01, threshold=4.988e+01, percent-clipped=0.0 2024-08-17 20:22:22,863 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.46 vs. limit=15.0 2024-08-17 20:22:29,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3499920.0, ans=0.125 2024-08-17 20:22:30,202 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 20 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-17 20:22:49,703 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-17 20:22:51,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3500120.0, ans=0.125 2024-08-17 20:23:11,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3500220.0, ans=10.0 2024-08-17 20:23:12,363 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 25 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-17 20:23:12,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3500220.0, ans=0.0 2024-08-17 20:23:14,420 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.14 vs. limit=15.0 2024-08-17 20:23:19,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3500320.0, ans=0.09899494936611666 2024-08-17 20:23:27,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3500420.0, ans=0.125 2024-08-17 20:23:28,533 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 6900, loss[loss=0.1147, beats_loss=0.008318, ecapa_loss=0.0001646, whisper_loss=0.1048, over 17081.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01066, ecapa_loss=0.0001477, whisper_loss=0.09002, over 3882272.67 frames. 
], batch size: 69, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:23:34,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3500420.0, ans=0.1 2024-08-17 20:24:07,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3500720.0, ans=0.125 2024-08-17 20:24:21,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3500820.0, ans=0.125 2024-08-17 20:24:31,467 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-17 20:24:31,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3500820.0, ans=0.2 2024-08-17 20:24:33,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3500820.0, ans=0.125 2024-08-17 20:24:34,144 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 32 from Vox, 37 fro AS 2024-08-17 20:24:36,733 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 6950, loss[loss=0.09642, beats_loss=0.009919, ecapa_loss=0.0001688, whisper_loss=0.08481, over 14302.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01066, ecapa_loss=0.0001477, whisper_loss=0.0903, over 3894295.48 frames. 
], batch size: 56, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:24:39,294 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.285e+01 2.467e+01 2.826e+01 3.694e+01, threshold=4.933e+01, percent-clipped=0.0 2024-08-17 20:24:55,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3501020.0, ans=0.125 2024-08-17 20:25:05,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3501120.0, ans=0.0 2024-08-17 20:25:19,928 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.10 vs. limit=10.0 2024-08-17 20:25:23,893 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.69 vs. limit=15.0 2024-08-17 20:25:24,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3501220.0, ans=0.2 2024-08-17 20:25:38,394 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.03 vs. limit=10.0 2024-08-17 20:25:43,692 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 7000, loss[loss=0.1094, beats_loss=0.009114, ecapa_loss=0.0001844, whisper_loss=0.0984, over 17902.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01063, ecapa_loss=0.0001481, whisper_loss=0.0907, over 3899292.14 frames. ], batch size: 75, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:25:56,312 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
23 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-17 20:26:01,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3501520.0, ans=0.0 2024-08-17 20:26:31,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3501720.0, ans=0.2 2024-08-17 20:26:32,327 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-17 20:26:53,303 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 7050, loss[loss=0.1059, beats_loss=0.009632, ecapa_loss=0.0001495, whisper_loss=0.09473, over 23353.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01056, ecapa_loss=0.0001488, whisper_loss=0.09116, over 3887810.85 frames. ], batch size: 94, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:26:56,151 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.324e+01 2.539e+01 2.808e+01 3.658e+01, threshold=5.079e+01, percent-clipped=0.0 2024-08-17 20:27:16,975 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2024-08-17 20:27:55,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3502320.0, ans=0.0 2024-08-17 20:28:02,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3502320.0, ans=0.125 2024-08-17 20:28:03,063 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.18 vs. limit=15.0 2024-08-17 20:28:04,823 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 7100, loss[loss=0.08073, beats_loss=0.01255, ecapa_loss=0.0001498, whisper_loss=0.06668, over 15482.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01051, ecapa_loss=0.0001482, whisper_loss=0.09084, over 3855912.44 frames. ], batch size: 64, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:28:09,844 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.66 vs. limit=15.0 2024-08-17 20:28:12,959 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-17 20:28:19,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3502520.0, ans=0.0 2024-08-17 20:28:21,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3502520.0, ans=0.2 2024-08-17 20:28:26,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3502520.0, ans=0.025 2024-08-17 20:28:32,703 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-17 20:28:33,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3502620.0, ans=0.1 2024-08-17 20:28:38,282 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-17 20:28:40,205 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.87 vs. limit=15.0 2024-08-17 20:29:13,404 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 7150, loss[loss=0.1273, beats_loss=0.006181, ecapa_loss=0.0001789, whisper_loss=0.1194, over 17653.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01053, ecapa_loss=0.0001471, whisper_loss=0.09067, over 3884001.26 frames. 
], batch size: 65, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:29:16,399 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.338e+01 2.584e+01 3.067e+01 1.427e+02, threshold=5.169e+01, percent-clipped=2.0 2024-08-17 20:29:26,614 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.87 vs. limit=15.0 2024-08-17 20:29:38,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3503020.0, ans=0.125 2024-08-17 20:29:40,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3503020.0, ans=0.125 2024-08-17 20:29:54,591 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.25 vs. limit=10.0 2024-08-17 20:29:58,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3503220.0, ans=0.0 2024-08-17 20:30:08,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3503220.0, ans=0.0 2024-08-17 20:30:20,923 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-17 20:30:21,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3503320.0, ans=0.05 2024-08-17 20:30:25,187 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 7200, loss[loss=0.1152, beats_loss=0.00962, ecapa_loss=0.000143, whisper_loss=0.1042, over 20038.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01037, ecapa_loss=0.0001482, whisper_loss=0.09219, over 3916975.01 frames. 
], batch size: 77, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:30:37,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3503420.0, ans=15.0 2024-08-17 20:30:38,077 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0867324024438858, model_norm_threshold=51.6881103515625 2024-08-17 20:30:38,247 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.27, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.576e+04, grad_sumsq=9.576e+04, orig_rms_sq=1.000e+00 2024-08-17 20:30:39,709 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-17 20:30:54,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3503620.0, ans=0.125 2024-08-17 20:31:35,667 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 7250, loss[loss=0.08991, beats_loss=0.01097, ecapa_loss=0.0001177, whisper_loss=0.07777, over 17628.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01043, ecapa_loss=0.0001481, whisper_loss=0.09179, over 3917988.88 frames. ], batch size: 66, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:31:37,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3503920.0, ans=0.1 2024-08-17 20:31:38,266 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.366e+01 2.657e+01 3.070e+01 5.959e+02, threshold=5.314e+01, percent-clipped=2.0 2024-08-17 20:32:03,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3504120.0, ans=0.2 2024-08-17 20:32:22,731 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
16 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-17 20:32:29,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3504220.0, ans=0.0 2024-08-17 20:32:30,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3504220.0, ans=0.125 2024-08-17 20:32:34,931 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-17 20:32:43,901 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 15 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-17 20:32:45,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3504320.0, ans=0.125 2024-08-17 20:32:49,208 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 7300, loss[loss=0.08629, beats_loss=0.01154, ecapa_loss=0.0001775, whisper_loss=0.07297, over 21475.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01048, ecapa_loss=0.0001478, whisper_loss=0.09148, over 3892037.71 frames. ], batch size: 93, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:32:57,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3504420.0, ans=0.5 2024-08-17 20:33:01,649 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.40 vs. limit=10.0 2024-08-17 20:33:03,741 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 10 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-17 20:33:07,048 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
28 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-17 20:33:08,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3504520.0, ans=0.0 2024-08-17 20:33:36,847 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 14 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-17 20:33:37,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3504720.0, ans=0.125 2024-08-17 20:33:39,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3504720.0, ans=0.125 2024-08-17 20:33:50,720 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-17 20:33:52,806 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-17 20:33:53,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3504820.0, ans=0.125 2024-08-17 20:33:53,916 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-17 20:33:59,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3504820.0, ans=0.0 2024-08-17 20:34:07,448 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 27 from LS+wenet, 19 from Vox, 15 fro AS 2024-08-17 20:34:09,051 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 7350, loss[loss=0.1298, beats_loss=0.006347, ecapa_loss=0.0002117, whisper_loss=0.1214, over 14357.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.0001481, whisper_loss=0.09048, over 3892092.89 frames. 
], batch size: 61, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:34:12,373 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.308e+01 2.645e+01 2.863e+01 4.311e+01, threshold=5.290e+01, percent-clipped=0.0 2024-08-17 20:34:23,338 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-17 20:34:39,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3505120.0, ans=0.0 2024-08-17 20:34:41,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3505120.0, ans=0.125 2024-08-17 20:34:43,364 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.57 vs. limit=12.0 2024-08-17 20:34:52,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=3505120.0, ans=0.02 2024-08-17 20:35:13,337 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-17 20:35:17,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3505320.0, ans=0.125 2024-08-17 20:35:25,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3505320.0, ans=0.125 2024-08-17 20:35:27,864 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.43 vs. limit=15.0 2024-08-17 20:35:29,774 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 7400, loss[loss=0.09711, beats_loss=0.009687, ecapa_loss=0.0001356, whisper_loss=0.08607, over 19477.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001479, whisper_loss=0.09073, over 3921304.83 frames. 
], batch size: 73, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:35:33,085 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 31 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-17 20:35:39,237 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.95 vs. limit=22.5 2024-08-17 20:35:41,031 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-17 20:35:41,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3505420.0, ans=0.05 2024-08-17 20:35:44,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3505520.0, ans=0.125 2024-08-17 20:35:46,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3505520.0, ans=0.125 2024-08-17 20:36:12,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3505620.0, ans=0.2 2024-08-17 20:36:16,359 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-17 20:36:17,587 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 19 from LS+wenet, 33 from Vox, 41 fro AS 2024-08-17 20:36:19,198 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-17 20:36:23,509 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.46 vs. limit=10.0 2024-08-17 20:36:47,316 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 7450, loss[loss=0.1066, beats_loss=0.0115, ecapa_loss=0.0001403, whisper_loss=0.09371, over 18911.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001482, whisper_loss=0.09073, over 3929999.14 frames. ], batch size: 74, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:36:51,132 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.583e+01 2.401e+01 2.543e+01 2.763e+01 3.752e+01, threshold=5.086e+01, percent-clipped=0.0 2024-08-17 20:36:58,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3505920.0, ans=0.1 2024-08-17 20:37:08,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3506020.0, ans=0.0 2024-08-17 20:37:20,745 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-17 20:37:24,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3506120.0, ans=0.0 2024-08-17 20:37:25,436 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-17 20:37:26,718 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-17 20:37:29,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3506120.0, ans=0.0 2024-08-17 20:38:05,443 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-17 20:38:06,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3506420.0, ans=0.0 2024-08-17 20:38:06,744 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 7500, loss[loss=0.09775, beats_loss=0.01002, ecapa_loss=0.0001751, whisper_loss=0.08599, over 21142.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01059, ecapa_loss=0.0001482, whisper_loss=0.08985, over 3912955.18 frames. 
], batch size: 91, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:38:09,915 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.71 vs. limit=15.0 2024-08-17 20:38:15,240 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 22 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-17 20:38:20,082 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 20:38:26,912 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 14 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-17 20:38:47,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3506620.0, ans=0.125 2024-08-17 20:39:00,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3506720.0, ans=0.125 2024-08-17 20:39:06,019 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 11 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-17 20:39:10,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3506820.0, ans=0.2 2024-08-17 20:39:19,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3506820.0, ans=0.125 2024-08-17 20:39:22,076 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 7550, loss[loss=0.09104, beats_loss=0.0129, ecapa_loss=0.000143, whisper_loss=0.07671, over 19227.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0106, ecapa_loss=0.0001475, whisper_loss=0.0897, over 3871053.50 frames. 
], batch size: 79, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:39:24,769 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.340e+01 2.512e+01 2.890e+01 6.756e+01, threshold=5.024e+01, percent-clipped=1.0 2024-08-17 20:39:27,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3506920.0, ans=0.1 2024-08-17 20:39:33,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3506920.0, ans=0.0 2024-08-17 20:39:45,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3507020.0, ans=0.125 2024-08-17 20:39:52,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3507120.0, ans=0.0 2024-08-17 20:39:58,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3507120.0, ans=0.125 2024-08-17 20:40:04,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3507120.0, ans=0.125 2024-08-17 20:40:06,359 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-17 20:40:37,092 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 7600, loss[loss=0.1166, beats_loss=0.008543, ecapa_loss=0.000126, whisper_loss=0.1068, over 18120.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01059, ecapa_loss=0.0001476, whisper_loss=0.08928, over 3843070.83 frames. ], batch size: 68, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:40:37,278 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-17 20:40:54,843 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
18 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-17 20:40:58,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3507520.0, ans=0.125 2024-08-17 20:41:02,170 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.09 vs. limit=6.0 2024-08-17 20:41:04,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3507520.0, ans=0.125 2024-08-17 20:41:43,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3507820.0, ans=0.2 2024-08-17 20:41:49,141 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 7650, loss[loss=0.1154, beats_loss=0.01041, ecapa_loss=0.0001093, whisper_loss=0.1039, over 23739.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01059, ecapa_loss=0.0001475, whisper_loss=0.08937, over 3847704.06 frames. ], batch size: 90, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:41:52,187 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.783e+01 2.316e+01 2.490e+01 2.754e+01 3.586e+01, threshold=4.980e+01, percent-clipped=0.0 2024-08-17 20:42:01,356 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-17 20:42:18,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3508120.0, ans=0.0 2024-08-17 20:42:19,064 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 17 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-17 20:42:30,427 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.48 vs. 
limit=15.0 2024-08-17 20:42:35,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3508220.0, ans=0.125 2024-08-17 20:42:56,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3508320.0, ans=0.2 2024-08-17 20:43:02,385 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 7700, loss[loss=0.1141, beats_loss=0.009547, ecapa_loss=0.0001389, whisper_loss=0.1031, over 23684.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01058, ecapa_loss=0.0001468, whisper_loss=0.08965, over 3893499.16 frames. ], batch size: 94, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:43:10,080 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-17 20:43:28,969 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-17 20:43:33,248 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.33 vs. limit=12.0 2024-08-17 20:43:39,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3508620.0, ans=0.125 2024-08-17 20:43:44,832 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-17 20:44:16,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3508820.0, ans=0.125 2024-08-17 20:44:18,607 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.66 vs. limit=15.0 2024-08-17 20:44:20,367 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 7750, loss[loss=0.1133, beats_loss=0.01038, ecapa_loss=0.0001297, whisper_loss=0.1016, over 17939.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001467, whisper_loss=0.09045, over 3896584.66 frames. ], batch size: 67, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:44:23,682 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.352e+01 2.581e+01 3.039e+01 8.036e+01, threshold=5.163e+01, percent-clipped=1.0 2024-08-17 20:44:28,561 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-17 20:44:29,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3508920.0, ans=0.125 2024-08-17 20:44:30,686 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.71 vs. limit=15.0 2024-08-17 20:44:44,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3509020.0, ans=0.125 2024-08-17 20:44:45,883 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2024-08-17 20:44:47,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3509020.0, ans=0.125 2024-08-17 20:44:59,384 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-17 20:45:30,655 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-17 20:45:34,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3509320.0, ans=0.125 2024-08-17 20:45:35,123 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-17 20:45:36,113 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 7800, loss[loss=0.1002, beats_loss=0.01087, ecapa_loss=0.0001777, whisper_loss=0.08752, over 22424.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01055, ecapa_loss=0.0001471, whisper_loss=0.09034, over 3896671.18 frames. ], batch size: 92, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:45:39,471 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-17 20:45:44,116 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-17 20:46:04,462 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-17 20:46:16,284 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=15.0 2024-08-17 20:46:21,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3509720.0, ans=6.0 2024-08-17 20:46:38,804 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-17 20:46:40,693 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 20 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-17 20:46:42,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3509820.0, ans=0.125 2024-08-17 20:46:51,373 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 7850, loss[loss=0.08306, beats_loss=0.01169, ecapa_loss=0.0001581, whisper_loss=0.06979, over 19931.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01052, ecapa_loss=0.0001487, whisper_loss=0.09067, over 3890001.53 frames. 
], batch size: 82, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:46:53,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3509920.0, ans=0.0 2024-08-17 20:46:54,180 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.319e+01 2.575e+01 2.873e+01 4.382e+02, threshold=5.150e+01, percent-clipped=1.0 2024-08-17 20:47:01,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3509920.0, ans=0.2 2024-08-17 20:47:04,127 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 21 from LS+wenet, 19 from Vox, 54 fro AS 2024-08-17 20:47:27,468 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 20:47:33,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3510120.0, ans=0.125 2024-08-17 20:47:37,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3510220.0, ans=0.125 2024-08-17 20:47:47,516 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.26 vs. limit=22.5 2024-08-17 20:47:49,387 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=22.5 2024-08-17 20:47:54,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3510320.0, ans=0.0 2024-08-17 20:48:03,790 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 7900, loss[loss=0.1015, beats_loss=0.01082, ecapa_loss=0.0001144, whisper_loss=0.08949, over 17943.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01065, ecapa_loss=0.0001475, whisper_loss=0.08997, over 3889892.20 frames. 
], batch size: 67, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:48:19,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3510520.0, ans=10.0 2024-08-17 20:48:19,620 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.90 vs. limit=10.0 2024-08-17 20:48:39,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3510620.0, ans=0.2 2024-08-17 20:48:50,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3510720.0, ans=0.0 2024-08-17 20:48:56,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3510720.0, ans=0.0 2024-08-17 20:49:00,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3510820.0, ans=0.125 2024-08-17 20:49:01,371 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 13 from Vox, 51 fro AS 2024-08-17 20:49:09,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3510820.0, ans=0.125 2024-08-17 20:49:14,160 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 7950, loss[loss=0.09462, beats_loss=0.01009, ecapa_loss=0.0001923, whisper_loss=0.08261, over 17080.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01063, ecapa_loss=0.0001467, whisper_loss=0.09018, over 3867097.35 frames. ], batch size: 70, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:49:16,455 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.370e+01 2.553e+01 2.861e+01 6.638e+01, threshold=5.106e+01, percent-clipped=2.0 2024-08-17 20:49:16,672 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
20 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-17 20:49:31,417 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-17 20:49:35,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3511020.0, ans=0.125 2024-08-17 20:49:50,909 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 13 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-17 20:49:56,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3511220.0, ans=0.0 2024-08-17 20:50:07,006 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-17 20:50:21,237 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-17 20:50:26,077 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 8000, loss[loss=0.09901, beats_loss=0.01011, ecapa_loss=0.0001573, whisper_loss=0.08732, over 17952.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01063, ecapa_loss=0.000146, whisper_loss=0.09033, over 3854236.78 frames. ], batch size: 73, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:50:26,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3511420.0, ans=0.0 2024-08-17 20:50:47,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3511520.0, ans=0.125 2024-08-17 20:50:51,115 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
22 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-17 20:50:51,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3511520.0, ans=0.125 2024-08-17 20:51:02,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3511620.0, ans=0.125 2024-08-17 20:51:19,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3511720.0, ans=0.125 2024-08-17 20:51:23,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3511720.0, ans=0.125 2024-08-17 20:51:32,873 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-08-17 20:51:38,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3511820.0, ans=0.125 2024-08-17 20:51:40,627 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 8050, loss[loss=0.06539, beats_loss=0.01369, ecapa_loss=0.0001522, whisper_loss=0.05018, over 13670.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01062, ecapa_loss=0.0001462, whisper_loss=0.08992, over 3846882.72 frames. ], batch size: 58, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:51:44,030 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.251e+01 2.590e+01 2.848e+01 4.049e+01, threshold=5.180e+01, percent-clipped=0.0 2024-08-17 20:52:11,299 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
26 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-17 20:52:26,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3512220.0, ans=0.0 2024-08-17 20:52:41,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3512320.0, ans=0.2 2024-08-17 20:52:43,416 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-17 20:52:45,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3512320.0, ans=0.2 2024-08-17 20:52:49,849 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 8100, loss[loss=0.1096, beats_loss=0.009646, ecapa_loss=0.0001517, whisper_loss=0.09842, over 20473.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.0001461, whisper_loss=0.09084, over 3877392.39 frames. ], batch size: 79, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:52:59,497 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0995599552989006, model_norm_threshold=51.80342102050781 2024-08-17 20:52:59,665 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.42, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.127e+05, grad_sumsq=1.103e+07, orig_rms_sq=1.022e-02 2024-08-17 20:53:12,500 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-17 20:53:14,165 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
32 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-17 20:53:25,010 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09344208240509033, model_norm_threshold=51.80342102050781 2024-08-17 20:53:25,187 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.conv_module1.depthwise_conv.causal_conv.weight with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.259e+04, grad_sumsq=1.018e+05, orig_rms_sq=6.150e-01 2024-08-17 20:53:27,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3512620.0, ans=0.125 2024-08-17 20:53:36,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3512720.0, ans=0.125 2024-08-17 20:53:41,498 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 29 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-17 20:53:54,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3512820.0, ans=0.125 2024-08-17 20:53:59,281 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 8150, loss[loss=0.09129, beats_loss=0.01193, ecapa_loss=0.0001275, whisper_loss=0.07809, over 16688.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01058, ecapa_loss=0.0001458, whisper_loss=0.09033, over 3869842.53 frames. ], batch size: 67, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:53:59,380 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-17 20:54:02,015 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.389e+01 2.629e+01 2.995e+01 5.544e+02, threshold=5.257e+01, percent-clipped=3.0 2024-08-17 20:54:02,324 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-17 20:54:10,872 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
17 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-17 20:54:19,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3513020.0, ans=0.125 2024-08-17 20:54:26,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3513120.0, ans=0.125 2024-08-17 20:54:27,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3513120.0, ans=0.2 2024-08-17 20:54:38,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3513120.0, ans=0.125 2024-08-17 20:54:38,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3513120.0, ans=0.125 2024-08-17 20:55:08,100 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 8200, loss[loss=0.107, beats_loss=0.01236, ecapa_loss=0.0001286, whisper_loss=0.09331, over 23286.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01055, ecapa_loss=0.000146, whisper_loss=0.09106, over 3928916.26 frames. ], batch size: 92, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:55:11,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3513420.0, ans=0.0 2024-08-17 20:55:15,061 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 19 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-17 20:55:16,458 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
19 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-17 20:55:30,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3513520.0, ans=0.125 2024-08-17 20:55:31,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3513520.0, ans=0.1 2024-08-17 20:55:39,528 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 25 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-17 20:55:42,284 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 12 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-17 20:55:49,999 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 32 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-17 20:56:09,745 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-17 20:56:14,609 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 8250, loss[loss=0.07275, beats_loss=0.01238, ecapa_loss=0.0001532, whisper_loss=0.05883, over 15511.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01062, ecapa_loss=0.0001459, whisper_loss=0.09119, over 3946502.39 frames. ], batch size: 66, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:56:17,217 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.363e+01 2.592e+01 2.897e+01 5.680e+01, threshold=5.184e+01, percent-clipped=1.0 2024-08-17 20:56:23,493 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
23 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-17 20:56:26,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3513920.0, ans=0.125 2024-08-17 20:56:39,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3514020.0, ans=0.125 2024-08-17 20:56:42,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3514120.0, ans=0.125 2024-08-17 20:56:44,949 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.40 vs. limit=15.0 2024-08-17 20:56:45,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3514120.0, ans=0.125 2024-08-17 20:56:46,588 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.13 vs. limit=8.0 2024-08-17 20:56:59,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3514220.0, ans=0.0 2024-08-17 20:57:06,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3514320.0, ans=10.0 2024-08-17 20:57:09,765 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 31 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-17 20:57:14,555 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.13 vs. limit=15.0 2024-08-17 20:57:19,750 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 8300, loss[loss=0.1245, beats_loss=0.009986, ecapa_loss=0.0001477, whisper_loss=0.113, over 23220.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01062, ecapa_loss=0.0001461, whisper_loss=0.09101, over 3955777.32 frames. ], batch size: 90, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:57:29,768 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.60 vs. limit=22.5 2024-08-17 20:57:30,664 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.633e-01 2024-08-17 20:57:32,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3514520.0, ans=0.0 2024-08-17 20:57:50,813 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-17 20:57:52,068 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-17 20:57:53,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3514620.0, ans=0.05 2024-08-17 20:58:00,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3514720.0, ans=0.1 2024-08-17 20:58:06,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3514720.0, ans=0.1 2024-08-17 20:58:13,639 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-17 20:58:27,336 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-17 20:58:28,637 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 8350, loss[loss=0.09264, beats_loss=0.01196, ecapa_loss=0.0001881, whisper_loss=0.0788, over 18281.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01066, ecapa_loss=0.0001457, whisper_loss=0.09052, over 3928052.58 frames. 
], batch size: 78, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:58:30,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3514920.0, ans=0.1 2024-08-17 20:58:32,343 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.299e+01 2.560e+01 2.760e+01 1.780e+02, threshold=5.121e+01, percent-clipped=1.0 2024-08-17 20:58:32,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3514920.0, ans=0.125 2024-08-17 20:58:39,513 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 24 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-17 20:59:08,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3515120.0, ans=0.125 2024-08-17 20:59:20,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3515220.0, ans=0.0 2024-08-17 20:59:24,335 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-17 20:59:24,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3515320.0, ans=0.125 2024-08-17 20:59:24,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3515320.0, ans=0.0 2024-08-17 20:59:27,299 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 12 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-17 20:59:30,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3515320.0, ans=0.2 2024-08-17 20:59:40,347 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 8400, loss[loss=0.1109, beats_loss=0.01085, ecapa_loss=0.000149, whisper_loss=0.09858, over 22787.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.0106, ecapa_loss=0.0001465, whisper_loss=0.09074, over 3938162.48 frames. ], batch size: 91, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:59:44,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3515420.0, ans=0.0 2024-08-17 20:59:44,815 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.09 vs. limit=15.0 2024-08-17 20:59:49,715 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.91 vs. limit=15.0 2024-08-17 20:59:52,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3515420.0, ans=0.125 2024-08-17 20:59:53,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3515520.0, ans=0.1 2024-08-17 21:00:26,124 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 29 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-17 21:00:29,982 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-17 21:00:41,154 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-17 21:00:46,388 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 17 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-17 21:00:48,567 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 8450, loss[loss=0.1005, beats_loss=0.009166, ecapa_loss=0.0001924, whisper_loss=0.08941, over 16812.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01057, ecapa_loss=0.0001472, whisper_loss=0.09013, over 3904981.11 frames. 
], batch size: 70, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:00:49,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3515920.0, ans=0.1 2024-08-17 21:00:51,503 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.340e+01 2.576e+01 2.813e+01 3.735e+01, threshold=5.152e+01, percent-clipped=0.0 2024-08-17 21:00:59,022 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. limit=6.0 2024-08-17 21:01:00,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3515920.0, ans=0.2 2024-08-17 21:01:11,897 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.79 vs. limit=6.0 2024-08-17 21:01:15,074 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.65 vs. limit=8.0 2024-08-17 21:01:19,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3516120.0, ans=0.1 2024-08-17 21:01:24,031 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-17 21:01:54,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3516320.0, ans=0.0 2024-08-17 21:01:59,948 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 8500, loss[loss=0.1089, beats_loss=0.0112, ecapa_loss=0.0001133, whisper_loss=0.09655, over 23429.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0106, ecapa_loss=0.0001469, whisper_loss=0.08908, over 3903623.64 frames. 
], batch size: 89, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:02:01,922 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.09 vs. limit=15.0 2024-08-17 21:02:13,753 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-17 21:02:45,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3516720.0, ans=0.1 2024-08-17 21:03:03,084 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-17 21:03:05,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3516820.0, ans=0.1 2024-08-17 21:03:10,216 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 20 from LS+wenet, 21 from Vox, 50 fro AS 2024-08-17 21:03:12,978 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 8550, loss[loss=0.1176, beats_loss=0.009998, ecapa_loss=0.0001526, whisper_loss=0.1061, over 21979.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01059, ecapa_loss=0.0001466, whisper_loss=0.08976, over 3906460.19 frames. ], batch size: 89, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:03:15,206 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-17 21:03:16,621 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.310e+01 2.634e+01 2.977e+01 2.577e+02, threshold=5.269e+01, percent-clipped=4.0 2024-08-17 21:03:20,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3516920.0, ans=0.0 2024-08-17 21:03:39,823 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.44 vs. 
limit=15.0 2024-08-17 21:03:44,151 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.77 vs. limit=15.0 2024-08-17 21:03:51,857 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-17 21:04:09,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3517320.0, ans=0.125 2024-08-17 21:04:13,365 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 28 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-17 21:04:22,129 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.93 vs. limit=15.0 2024-08-17 21:04:24,305 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 8600, loss[loss=0.105, beats_loss=0.01082, ecapa_loss=0.0001452, whisper_loss=0.09269, over 22977.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01052, ecapa_loss=0.0001465, whisper_loss=0.09063, over 3904118.67 frames. ], batch size: 93, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:04:24,425 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-17 21:04:32,578 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.388e-02 2024-08-17 21:05:13,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3517720.0, ans=0.125 2024-08-17 21:05:24,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3517820.0, ans=0.125 2024-08-17 21:05:30,519 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.83 vs. 
limit=22.5 2024-08-17 21:05:33,607 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 from AS 2024-08-17 21:05:36,490 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 8650, loss[loss=0.1046, beats_loss=0.009936, ecapa_loss=0.00016, whisper_loss=0.09309, over 21953.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.0001451, whisper_loss=0.09086, over 3888395.27 frames. ], batch size: 91, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:05:39,275 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.019e+01 2.385e+01 2.642e+01 2.981e+01 2.241e+02, threshold=5.284e+01, percent-clipped=1.0 2024-08-17 21:05:46,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3517920.0, ans=10.0 2024-08-17 21:05:47,835 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 from AS 2024-08-17 21:05:50,645 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.54 vs. limit=15.0 2024-08-17 21:05:54,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3518020.0, ans=0.0 2024-08-17 21:05:58,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3518020.0, ans=0.035 2024-08-17 21:06:00,775 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 15 from Vox, 30 from AS 2024-08-17 21:06:20,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3518220.0, ans=0.05 2024-08-17 21:06:35,228 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.06 vs. 
limit=15.0 2024-08-17 21:06:49,462 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 8700, loss[loss=0.113, beats_loss=0.009432, ecapa_loss=0.0001587, whisper_loss=0.1019, over 21339.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01042, ecapa_loss=0.0001469, whisper_loss=0.09142, over 3856371.87 frames. ], batch size: 84, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:06:52,559 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 from AS 2024-08-17 21:06:53,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3518420.0, ans=0.1 2024-08-17 21:07:33,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3518720.0, ans=0.125 2024-08-17 21:07:34,921 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 15 from Vox, 23 from AS 2024-08-17 21:07:36,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3518720.0, ans=0.125 2024-08-17 21:07:39,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3518720.0, ans=0.0 2024-08-17 21:07:42,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3518720.0, ans=0.125 2024-08-17 21:07:48,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3518820.0, ans=0.0 2024-08-17 21:07:51,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3518820.0, ans=0.125 2024-08-17 21:07:52,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3518820.0, ans=0.1 2024-08-17 21:08:03,625 INFO 
[train_multi_KD3.py:1116] (2/4) Epoch 24, batch 8750, loss[loss=0.111, beats_loss=0.008843, ecapa_loss=0.0001784, whisper_loss=0.1004, over 21985.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01037, ecapa_loss=0.0001481, whisper_loss=0.09164, over 3853556.52 frames. ], batch size: 90, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:08:06,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3518920.0, ans=0.07 2024-08-17 21:08:07,239 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.737e+01 2.308e+01 2.508e+01 2.741e+01 1.105e+02, threshold=5.017e+01, percent-clipped=1.0 2024-08-17 21:08:39,573 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.75 vs. limit=15.0 2024-08-17 21:08:44,525 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 21:08:46,412 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.83 vs. limit=15.0 2024-08-17 21:08:48,102 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 28 from Vox, 43 from AS 2024-08-17 21:08:49,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3519120.0, ans=0.125 2024-08-17 21:09:05,982 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 14 from LS+wenet, 19 from Vox, 23 from AS 2024-08-17 21:09:20,851 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 8800, loss[loss=0.1013, beats_loss=0.01003, ecapa_loss=0.0001302, whisper_loss=0.09002, over 23118.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01046, ecapa_loss=0.0001473, whisper_loss=0.09138, over 3845206.89 frames. 
], batch size: 90, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:09:33,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3519420.0, ans=0.125 2024-08-17 21:09:41,486 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 12 from Vox, 28 from AS 2024-08-17 21:09:44,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3519520.0, ans=0.125 2024-08-17 21:09:46,735 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.23 vs. limit=22.5 2024-08-17 21:09:49,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3519620.0, ans=0.125 2024-08-17 21:09:50,350 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 17 from Vox, 46 from AS 2024-08-17 21:09:53,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3519620.0, ans=0.2 2024-08-17 21:09:55,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3519620.0, ans=0.125 2024-08-17 21:10:07,261 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 21:10:12,776 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
34 from LS+wenet, 20 from Vox, 40 from AS 2024-08-17 21:10:30,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3519820.0, ans=0.125 2024-08-17 21:10:30,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3519820.0, ans=0.0 2024-08-17 21:10:34,489 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 8850, loss[loss=0.09414, beats_loss=0.009049, ecapa_loss=0.0001492, whisper_loss=0.0836, over 19612.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01057, ecapa_loss=0.000146, whisper_loss=0.09093, over 3840312.93 frames. ], batch size: 77, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:10:37,026 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.338e+01 2.557e+01 2.876e+01 3.818e+01, threshold=5.114e+01, percent-clipped=0.0 2024-08-17 21:10:50,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3520020.0, ans=0.125 2024-08-17 21:11:03,698 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.24 vs. limit=15.0 2024-08-17 21:11:04,936 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
18 from LS+wenet, 26 from Vox, 37 from AS 2024-08-17 21:11:18,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3520120.0, ans=0.1 2024-08-17 21:11:27,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3520220.0, ans=0.125 2024-08-17 21:11:41,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3520320.0, ans=0.0 2024-08-17 21:11:44,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3520320.0, ans=0.125 2024-08-17 21:11:50,163 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 8900, loss[loss=0.09677, beats_loss=0.01202, ecapa_loss=0.0001149, whisper_loss=0.0836, over 20508.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01054, ecapa_loss=0.0001455, whisper_loss=0.09081, over 3850807.66 frames. ], batch size: 77, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:12:18,936 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.35 vs. limit=22.5 2024-08-17 21:12:19,601 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 14 from LS+wenet, 23 from Vox, 34 from AS 2024-08-17 21:12:26,366 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=15.0 2024-08-17 21:12:28,415 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
22 from LS+wenet, 18 from Vox, 40 from AS 2024-08-17 21:12:32,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3520620.0, ans=0.025 2024-08-17 21:12:34,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3520620.0, ans=0.125 2024-08-17 21:12:36,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3520720.0, ans=0.125 2024-08-17 21:12:36,198 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.27 vs. limit=15.0 2024-08-17 21:12:54,540 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 24 from Vox, 33 from AS 2024-08-17 21:12:55,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3520820.0, ans=0.125 2024-08-17 21:13:05,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3520820.0, ans=0.125 2024-08-17 21:13:07,609 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 8950, loss[loss=0.1004, beats_loss=0.01114, ecapa_loss=0.0001576, whisper_loss=0.08768, over 21021.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01052, ecapa_loss=0.0001467, whisper_loss=0.09111, over 3870918.75 frames. 
], batch size: 87, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:13:10,742 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.279e+01 2.513e+01 2.850e+01 4.067e+01, threshold=5.027e+01, percent-clipped=0.0 2024-08-17 21:13:11,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3520920.0, ans=0.0 2024-08-17 21:13:16,531 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 18 from Vox, 29 from AS 2024-08-17 21:14:00,114 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.889e-03 2024-08-17 21:14:02,374 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 from AS 2024-08-17 21:14:11,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3521320.0, ans=0.025 2024-08-17 21:14:26,716 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 9000, loss[loss=0.07586, beats_loss=0.01431, ecapa_loss=0.0001277, whisper_loss=0.06027, over 18436.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01065, ecapa_loss=0.0001471, whisper_loss=0.09002, over 3887595.89 frames. ], batch size: 77, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:14:26,716 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-17 21:15:03,681 INFO [train_multi_KD3.py:1149] (2/4) Epoch 24, validation on ASR_libri: loss=0.2507, beats_loss=0, ecapa_loss=0.0005281, whisper_loss=0.2454, over 922467.00 frames. 2024-08-17 21:15:22,182 INFO [train_multi_KD3.py:1149] (2/4) Epoch 24, validation on SV_voxceleb1: loss=0.004114, beats_loss=0, ecapa_loss=0.0004114, whisper_loss=0, over 939242.00 frames. 2024-08-17 21:17:01,744 INFO [train_multi_KD3.py:1149] (2/4) Epoch 24, validation on AT_audioset: loss=0.02322, beats_loss=0.02322, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-17 21:17:01,748 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-17 21:17:13,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3521420.0, ans=0.1 2024-08-17 21:17:15,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3521520.0, ans=0.0 2024-08-17 21:17:47,429 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 22 from Vox, 21 from AS 2024-08-17 21:18:10,789 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.70 vs. limit=15.0 2024-08-17 21:18:17,985 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 9050, loss[loss=0.1028, beats_loss=0.009295, ecapa_loss=0.0001135, whisper_loss=0.09239, over 15945.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01058, ecapa_loss=0.0001467, whisper_loss=0.09102, over 3858138.05 frames. ], batch size: 57, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:18:19,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3521920.0, ans=0.0 2024-08-17 21:18:19,692 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.18 vs. limit=15.0 2024-08-17 21:18:20,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3521920.0, ans=0.125 2024-08-17 21:18:21,881 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.405e+01 2.622e+01 2.954e+01 2.025e+02, threshold=5.245e+01, percent-clipped=2.0 2024-08-17 21:18:22,087 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
26 from LS+wenet, 24 from Vox, 26 from AS 2024-08-17 21:18:59,620 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 19 from Vox, 47 from AS 2024-08-17 21:19:07,688 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 30 from Vox, 39 from AS 2024-08-17 21:19:13,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3522220.0, ans=0.125 2024-08-17 21:19:13,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3522220.0, ans=0.0 2024-08-17 21:19:28,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3522320.0, ans=0.125 2024-08-17 21:19:31,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3522320.0, ans=0.0 2024-08-17 21:19:38,106 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 9100, loss[loss=0.09464, beats_loss=0.01221, ecapa_loss=0.000141, whisper_loss=0.08103, over 21156.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.0001463, whisper_loss=0.09068, over 3872094.63 frames. ], batch size: 88, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:19:38,236 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 31 from LS+wenet, 19 from Vox, 29 from AS 2024-08-17 21:19:52,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3522520.0, ans=0.0 2024-08-17 21:20:06,078 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 16 from LS+wenet, 17 from Vox, 29 from AS 2024-08-17 21:20:13,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3522620.0, ans=0.125 2024-08-17 21:20:15,927 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
22 from LS+wenet, 19 from Vox, 35 from AS 2024-08-17 21:20:26,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3522720.0, ans=0.0 2024-08-17 21:20:36,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3522820.0, ans=0.125 2024-08-17 21:20:36,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3522820.0, ans=0.0 2024-08-17 21:20:47,342 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 18 from LS+wenet, 24 from Vox, 34 from AS 2024-08-17 21:20:51,040 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 9150, loss[loss=0.1032, beats_loss=0.01081, ecapa_loss=0.0001344, whisper_loss=0.09103, over 20086.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01066, ecapa_loss=0.0001458, whisper_loss=0.08994, over 3853934.93 frames. ], batch size: 81, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:20:51,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3522920.0, ans=0.0 2024-08-17 21:20:54,005 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.307e+01 2.548e+01 2.836e+01 3.815e+01, threshold=5.097e+01, percent-clipped=0.0 2024-08-17 21:21:05,055 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 21:21:14,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3523020.0, ans=0.125 2024-08-17 21:21:28,732 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.28 vs. 
limit=15.0 2024-08-17 21:21:48,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3523320.0, ans=0.2 2024-08-17 21:21:51,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3523320.0, ans=0.1 2024-08-17 21:21:53,121 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.29 vs. limit=15.0 2024-08-17 21:21:56,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3523320.0, ans=0.0 2024-08-17 21:22:01,294 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 9200, loss[loss=0.1182, beats_loss=0.006963, ecapa_loss=0.000168, whisper_loss=0.1096, over 15541.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01058, ecapa_loss=0.0001463, whisper_loss=0.09095, over 3868095.79 frames. ], batch size: 64, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:22:01,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3523420.0, ans=0.125 2024-08-17 21:22:10,693 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 15 from LS+wenet, 17 from Vox, 34 from AS 2024-08-17 21:22:11,898 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 15 from LS+wenet, 18 from Vox, 33 from AS 2024-08-17 21:22:15,649 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
29 from LS+wenet, 22 from Vox, 40 from AS 2024-08-17 21:22:22,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3523520.0, ans=0.125 2024-08-17 21:22:42,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3523720.0, ans=0.125 2024-08-17 21:22:46,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3523720.0, ans=0.125 2024-08-17 21:22:51,275 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 30 from LS+wenet, 15 from Vox, 31 from AS 2024-08-17 21:23:05,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3523920.0, ans=0.2 2024-08-17 21:23:05,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3523920.0, ans=0.2 2024-08-17 21:23:06,135 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 9250, loss[loss=0.09702, beats_loss=0.0111, ecapa_loss=0.0001676, whisper_loss=0.08425, over 22900.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.000147, whisper_loss=0.09054, over 3882440.75 frames. ], batch size: 94, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:23:06,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3523920.0, ans=0.125 2024-08-17 21:23:08,954 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.350e+01 2.657e+01 3.037e+01 4.188e+01, threshold=5.314e+01, percent-clipped=0.0 2024-08-17 21:23:14,685 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.22 vs. 
limit=15.0 2024-08-17 21:23:28,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3524020.0, ans=0.125 2024-08-17 21:23:35,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3524120.0, ans=0.0 2024-08-17 21:23:36,142 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.76 vs. limit=15.0 2024-08-17 21:23:37,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3524120.0, ans=0.0 2024-08-17 21:23:38,592 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.51 vs. limit=15.0 2024-08-17 21:23:58,270 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 16 from Vox, 41 from AS 2024-08-17 21:24:04,022 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 30 from LS+wenet, 17 from Vox, 27 from AS 2024-08-17 21:24:06,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3524320.0, ans=0.0 2024-08-17 21:24:13,199 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 9300, loss[loss=0.08922, beats_loss=0.009774, ecapa_loss=0.0001805, whisper_loss=0.07764, over 18238.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01057, ecapa_loss=0.0001473, whisper_loss=0.09067, over 3915863.87 frames. ], batch size: 76, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:24:21,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3524420.0, ans=0.0 2024-08-17 21:24:26,226 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
12 from LS+wenet, 17 from Vox, 29 from AS 2024-08-17 21:24:29,419 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.65 vs. limit=15.0 2024-08-17 21:24:30,451 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.19 vs. limit=15.0 2024-08-17 21:24:35,881 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.33 vs. limit=6.0 2024-08-17 21:24:39,334 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.19 vs. limit=10.0 2024-08-17 21:24:45,094 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 from AS 2024-08-17 21:24:50,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3524720.0, ans=0.125 2024-08-17 21:25:06,018 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 29 from Vox, 28 from AS 2024-08-17 21:25:18,741 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 9350, loss[loss=0.1228, beats_loss=0.01013, ecapa_loss=0.0001278, whisper_loss=0.1114, over 22945.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.0001471, whisper_loss=0.09086, over 3917764.96 frames. 
], batch size: 88, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:25:21,631 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.336e+01 2.549e+01 2.795e+01 4.217e+01, threshold=5.098e+01, percent-clipped=0.0 2024-08-17 21:25:30,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3524920.0, ans=0.125 2024-08-17 21:25:35,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3525020.0, ans=0.125 2024-08-17 21:25:35,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3525020.0, ans=0.125 2024-08-17 21:25:38,106 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 18 from Vox, 33 from AS 2024-08-17 21:25:39,458 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 from AS 2024-08-17 21:25:58,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3525220.0, ans=0.05 2024-08-17 21:26:07,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3525220.0, ans=0.07 2024-08-17 21:26:24,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3525320.0, ans=0.05 2024-08-17 21:26:27,691 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 9400, loss[loss=0.09009, beats_loss=0.009464, ecapa_loss=0.000164, whisper_loss=0.07899, over 14107.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01062, ecapa_loss=0.0001464, whisper_loss=0.09036, over 3932211.67 frames. 
], batch size: 58, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:26:29,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3525420.0, ans=0.125 2024-08-17 21:26:47,021 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 21 from Vox, 31 from AS 2024-08-17 21:27:06,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3525620.0, ans=0.125 2024-08-17 21:27:18,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3525720.0, ans=0.125 2024-08-17 21:27:22,025 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 35 from LS+wenet, 21 from Vox, 32 from AS 2024-08-17 21:27:22,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3525720.0, ans=0.125 2024-08-17 21:27:30,396 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 15 from LS+wenet, 17 from Vox, 27 from AS 2024-08-17 21:27:34,877 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 18 from Vox, 27 from AS 2024-08-17 21:27:37,757 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 9450, loss[loss=0.1152, beats_loss=0.01051, ecapa_loss=0.0001567, whisper_loss=0.1031, over 22494.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01061, ecapa_loss=0.0001455, whisper_loss=0.09001, over 3906865.94 frames. ], batch size: 90, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:27:38,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3525920.0, ans=0.125 2024-08-17 21:27:39,875 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
18 from LS+wenet, 19 from Vox, 20 from AS 2024-08-17 21:27:41,289 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.632e+01 2.404e+01 2.620e+01 3.015e+01 5.071e+01, threshold=5.241e+01, percent-clipped=0.0 2024-08-17 21:28:14,748 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 19 from Vox, 38 from AS 2024-08-17 21:28:21,815 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 24 from Vox, 38 from AS 2024-08-17 21:28:23,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3526220.0, ans=0.125 2024-08-17 21:28:26,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3526220.0, ans=0.2 2024-08-17 21:28:49,405 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 9500, loss[loss=0.09228, beats_loss=0.01088, ecapa_loss=0.0001256, whisper_loss=0.08014, over 16537.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01057, ecapa_loss=0.0001459, whisper_loss=0.09017, over 3909891.41 frames. ], batch size: 67, lr: 2.55e-03, grad_scale: 1.152921504606847e+18 2024-08-17 21:28:50,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3526420.0, ans=0.2 2024-08-17 21:28:53,078 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 24 from Vox, 26 from AS 2024-08-17 21:29:29,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3526620.0, ans=0.125 2024-08-17 21:29:38,087 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 25 from LS+wenet, 27 from Vox, 43 from AS 2024-08-17 21:29:42,256 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 24 from Vox, 29 from AS 2024-08-17 21:29:48,307 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
18 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-17 21:30:10,014 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 9550, loss[loss=0.09025, beats_loss=0.01334, ecapa_loss=0.0001268, whisper_loss=0.07564, over 20341.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01053, ecapa_loss=0.0001468, whisper_loss=0.08987, over 3885476.03 frames. ], batch size: 84, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:30:14,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3526920.0, ans=0.125 2024-08-17 21:30:15,618 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.375e+01 2.611e+01 2.915e+01 4.364e+01, threshold=5.222e+01, percent-clipped=0.0 2024-08-17 21:30:23,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3526920.0, ans=0.5 2024-08-17 21:30:25,425 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-17 21:30:40,889 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.04 vs. limit=15.0 2024-08-17 21:30:42,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3527020.0, ans=0.125 2024-08-17 21:31:00,127 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-17 21:31:10,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3527220.0, ans=0.2 2024-08-17 21:31:25,942 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
21 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-17 21:31:37,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3527320.0, ans=0.0 2024-08-17 21:31:44,510 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 9600, loss[loss=0.09481, beats_loss=0.01021, ecapa_loss=0.0001432, whisper_loss=0.08316, over 17701.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01054, ecapa_loss=0.0001465, whisper_loss=0.08959, over 3834412.01 frames. ], batch size: 71, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:31:44,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3527420.0, ans=0.09899494936611666 2024-08-17 21:31:58,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3527420.0, ans=0.1 2024-08-17 21:31:59,953 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=15.0 2024-08-17 21:32:20,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3527520.0, ans=0.0 2024-08-17 21:32:21,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3527620.0, ans=0.125 2024-08-17 21:32:24,885 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=15.0 2024-08-17 21:32:37,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3527620.0, ans=0.0 2024-08-17 21:33:23,372 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 9650, loss[loss=0.08127, beats_loss=0.009797, ecapa_loss=0.0001131, whisper_loss=0.07035, over 14300.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01054, ecapa_loss=0.0001453, whisper_loss=0.0903, over 3832083.05 frames. ], batch size: 55, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:33:25,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3527920.0, ans=0.1 2024-08-17 21:33:26,449 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.55 vs. limit=5.0 2024-08-17 21:33:29,200 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.363e+01 2.592e+01 2.956e+01 8.123e+01, threshold=5.183e+01, percent-clipped=2.0 2024-08-17 21:33:54,082 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 42 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-17 21:34:02,890 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-17 21:34:52,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3528320.0, ans=0.125 2024-08-17 21:35:04,729 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 9700, loss[loss=0.1316, beats_loss=0.01028, ecapa_loss=0.0001171, whisper_loss=0.1201, over 24801.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01047, ecapa_loss=0.0001472, whisper_loss=0.09063, over 3853901.21 frames. ], batch size: 91, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:35:11,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3528420.0, ans=0.2 2024-08-17 21:35:16,343 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.33 vs. limit=15.0 2024-08-17 21:35:25,274 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.22 vs. 
limit=15.0 2024-08-17 21:35:31,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3528620.0, ans=0.125 2024-08-17 21:36:12,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3528820.0, ans=0.1 2024-08-17 21:36:14,708 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 38 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-17 21:36:16,940 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 9750, loss[loss=0.1145, beats_loss=0.009391, ecapa_loss=0.0001513, whisper_loss=0.1036, over 16767.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01046, ecapa_loss=0.0001471, whisper_loss=0.09102, over 3849152.33 frames. ], batch size: 66, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:36:17,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3528920.0, ans=0.125 2024-08-17 21:36:20,772 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.350e+01 2.623e+01 3.001e+01 4.380e+01, threshold=5.247e+01, percent-clipped=0.0 2024-08-17 21:36:28,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3528920.0, ans=0.1 2024-08-17 21:36:32,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3529020.0, ans=0.125 2024-08-17 21:36:33,547 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-17 21:36:39,391 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.01 vs. 
limit=22.5 2024-08-17 21:36:44,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3529120.0, ans=0.125 2024-08-17 21:36:49,748 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 15 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-17 21:36:50,897 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-17 21:36:54,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3529120.0, ans=0.125 2024-08-17 21:37:09,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3529220.0, ans=0.125 2024-08-17 21:37:29,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3529420.0, ans=0.0 2024-08-17 21:37:30,200 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 9800, loss[loss=0.07334, beats_loss=0.01116, ecapa_loss=0.0001617, whisper_loss=0.06056, over 16062.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01056, ecapa_loss=0.0001478, whisper_loss=0.09031, over 3849217.07 frames. 
], batch size: 64, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:37:30,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3529420.0, ans=0.5 2024-08-17 21:37:37,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3529420.0, ans=0.0 2024-08-17 21:38:00,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3529620.0, ans=0.125 2024-08-17 21:38:13,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3529620.0, ans=0.1 2024-08-17 21:38:23,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3529720.0, ans=0.125 2024-08-17 21:38:23,850 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-17 21:38:28,669 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-17 21:38:34,427 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0 2024-08-17 21:38:36,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3529820.0, ans=0.1 2024-08-17 21:38:39,545 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-17 21:38:48,035 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 9850, loss[loss=0.08991, beats_loss=0.009758, ecapa_loss=0.000123, whisper_loss=0.07893, over 16432.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01046, ecapa_loss=0.0001495, whisper_loss=0.09137, over 3837588.97 frames. 
], batch size: 61, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:38:52,470 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.238e+01 2.528e+01 2.791e+01 4.527e+01, threshold=5.056e+01, percent-clipped=0.0 2024-08-17 21:38:58,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3529920.0, ans=0.0 2024-08-17 21:39:03,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=3530020.0, ans=0.2 2024-08-17 21:39:30,044 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-17 21:39:47,045 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.74 vs. limit=22.5 2024-08-17 21:39:51,114 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 17 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-17 21:40:04,752 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 9900, loss[loss=0.09624, beats_loss=0.01114, ecapa_loss=0.0001131, whisper_loss=0.08397, over 17601.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01056, ecapa_loss=0.0001493, whisper_loss=0.09091, over 3833751.84 frames. ], batch size: 68, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:40:18,042 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-17 21:40:22,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=3530520.0, ans=15.0 2024-08-17 21:40:24,083 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.09 vs. 
limit=15.0 2024-08-17 21:40:38,673 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.64 vs. limit=15.0 2024-08-17 21:40:44,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3530620.0, ans=0.125 2024-08-17 21:40:46,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3530620.0, ans=0.125 2024-08-17 21:40:55,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3530720.0, ans=0.2 2024-08-17 21:41:04,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3530820.0, ans=0.2 2024-08-17 21:41:08,463 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-17 21:41:19,387 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.93 vs. limit=12.0 2024-08-17 21:41:19,759 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 9950, loss[loss=0.07743, beats_loss=0.01102, ecapa_loss=0.000142, whisper_loss=0.06498, over 17998.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01058, ecapa_loss=0.0001482, whisper_loss=0.09085, over 3823452.71 frames. ], batch size: 71, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:41:21,171 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
23 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-17 21:41:22,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3530920.0, ans=0.2 2024-08-17 21:41:23,683 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.318e+01 2.487e+01 2.825e+01 4.305e+01, threshold=4.974e+01, percent-clipped=0.0 2024-08-17 21:41:27,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3530920.0, ans=0.05 2024-08-17 21:41:30,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3530920.0, ans=0.125 2024-08-17 21:41:32,844 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-17 21:41:36,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3531020.0, ans=0.1 2024-08-17 21:41:41,811 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 26 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-17 21:41:55,486 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 24 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-17 21:41:58,310 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-17 21:42:38,366 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 10000, loss[loss=0.1049, beats_loss=0.01006, ecapa_loss=0.0001449, whisper_loss=0.09338, over 14875.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01052, ecapa_loss=0.0001484, whisper_loss=0.09121, over 3841319.54 frames. 
], batch size: 59, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:42:39,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3531420.0, ans=0.125 2024-08-17 21:42:42,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3531420.0, ans=0.125 2024-08-17 21:43:05,162 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-17 21:43:11,365 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 13 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-17 21:43:14,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3531620.0, ans=0.125 2024-08-17 21:43:18,570 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-17 21:43:28,450 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-17 21:43:33,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3531720.0, ans=10.0 2024-08-17 21:43:34,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3531720.0, ans=0.125 2024-08-17 21:43:54,959 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 10050, loss[loss=0.1055, beats_loss=0.01339, ecapa_loss=0.0001054, whisper_loss=0.09103, over 23283.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01064, ecapa_loss=0.0001466, whisper_loss=0.09011, over 3873781.50 frames. ], batch size: 92, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:43:55,076 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-17 21:43:59,734 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.404e+01 2.606e+01 2.788e+01 4.457e+01, threshold=5.212e+01, percent-clipped=0.0 2024-08-17 21:44:06,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3531920.0, ans=0.04949747468305833 2024-08-17 21:44:23,585 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.39 vs. limit=22.5 2024-08-17 21:44:35,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3532120.0, ans=0.0 2024-08-17 21:44:39,852 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09485381096601486, model_norm_threshold=52.118003845214844 2024-08-17 21:44:40,022 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.28, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.481e+04, grad_sumsq=8.481e+04, orig_rms_sq=1.000e+00 2024-08-17 21:44:49,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3532220.0, ans=0.0 2024-08-17 21:44:50,302 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.73 vs. limit=15.0 2024-08-17 21:44:56,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3532320.0, ans=0.025 2024-08-17 21:45:01,212 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.81 vs. 
limit=15.0 2024-08-17 21:45:08,957 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.67 vs. limit=15.0 2024-08-17 21:45:12,851 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 10100, loss[loss=0.09504, beats_loss=0.01176, ecapa_loss=0.0001455, whisper_loss=0.08183, over 19757.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01063, ecapa_loss=0.0001472, whisper_loss=0.0904, over 3905824.61 frames. ], batch size: 79, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:45:26,626 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.22 vs. limit=6.0 2024-08-17 21:45:31,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3532520.0, ans=10.0 2024-08-17 21:46:24,592 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-17 21:46:26,189 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 10150, loss[loss=0.0883, beats_loss=0.01084, ecapa_loss=0.0001222, whisper_loss=0.07624, over 22375.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01054, ecapa_loss=0.000148, whisper_loss=0.09077, over 3924337.27 frames. ], batch size: 89, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:46:30,189 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.357e+01 2.542e+01 2.923e+01 5.495e+02, threshold=5.084e+01, percent-clipped=3.0 2024-08-17 21:46:30,382 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-17 21:46:38,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3533020.0, ans=0.2 2024-08-17 21:47:04,191 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
25 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-17 21:47:15,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3533220.0, ans=0.2 2024-08-17 21:47:18,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3533220.0, ans=0.125 2024-08-17 21:47:36,456 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-17 21:47:37,753 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 10200, loss[loss=0.1033, beats_loss=0.01042, ecapa_loss=0.0001846, whisper_loss=0.09104, over 21912.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.0001492, whisper_loss=0.09034, over 3909015.84 frames. ], batch size: 92, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:47:48,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3533420.0, ans=0.1 2024-08-17 21:47:48,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3533420.0, ans=0.0 2024-08-17 21:47:48,967 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 12 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-17 21:48:02,519 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.67 vs. limit=15.0 2024-08-17 21:48:08,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3533620.0, ans=0.125 2024-08-17 21:48:28,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3533720.0, ans=0.0 2024-08-17 21:48:37,708 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
28 from LS+wenet, 25 from Vox, 20 fro AS 2024-08-17 21:48:47,933 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-17 21:48:49,021 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 10250, loss[loss=0.1078, beats_loss=0.01069, ecapa_loss=0.0001501, whisper_loss=0.09563, over 22001.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01041, ecapa_loss=0.0001498, whisper_loss=0.09146, over 3935806.66 frames. ], batch size: 91, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:48:52,937 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.355e+01 2.589e+01 2.999e+01 4.439e+02, threshold=5.177e+01, percent-clipped=1.0 2024-08-17 21:48:54,713 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 21:49:10,828 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.26 vs. limit=12.0 2024-08-17 21:49:12,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3534020.0, ans=0.125 2024-08-17 21:49:20,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3534120.0, ans=0.125 2024-08-17 21:49:22,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3534120.0, ans=0.0 2024-08-17 21:49:24,951 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
29 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-17 21:49:31,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3534220.0, ans=0.0 2024-08-17 21:49:36,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3534220.0, ans=0.2 2024-08-17 21:49:44,554 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-17 21:49:47,277 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-17 21:49:52,102 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2024-08-17 21:49:54,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3534320.0, ans=0.015 2024-08-17 21:49:57,529 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 10300, loss[loss=0.1111, beats_loss=0.009752, ecapa_loss=0.0001868, whisper_loss=0.09946, over 21763.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0104, ecapa_loss=0.0001499, whisper_loss=0.09138, over 3918126.12 frames. ], batch size: 90, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:50:00,187 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-17 21:50:02,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3534420.0, ans=0.1 2024-08-17 21:50:09,082 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.65 vs. limit=22.5 2024-08-17 21:50:27,597 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
22 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-17 21:50:34,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3534620.0, ans=0.125 2024-08-17 21:50:42,114 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-17 21:50:42,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3534720.0, ans=0.125 2024-08-17 21:50:56,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3534820.0, ans=0.2 2024-08-17 21:51:01,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3534820.0, ans=0.1 2024-08-17 21:51:05,880 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 10350, loss[loss=0.1187, beats_loss=0.01091, ecapa_loss=0.000146, whisper_loss=0.1063, over 23033.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01047, ecapa_loss=0.000149, whisper_loss=0.09172, over 3940120.69 frames. ], batch size: 93, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:51:09,940 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.272e+01 2.511e+01 2.861e+01 6.288e+01, threshold=5.023e+01, percent-clipped=1.0 2024-08-17 21:51:10,126 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-17 21:51:30,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3535020.0, ans=0.0 2024-08-17 21:51:36,916 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-17 21:51:53,440 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
27 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-17 21:52:00,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3535320.0, ans=0.0 2024-08-17 21:52:11,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3535320.0, ans=0.05 2024-08-17 21:52:12,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3535420.0, ans=0.0 2024-08-17 21:52:13,448 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 10400, loss[loss=0.1183, beats_loss=0.01084, ecapa_loss=0.0001701, whisper_loss=0.1058, over 22415.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01055, ecapa_loss=0.0001474, whisper_loss=0.09148, over 3942581.34 frames. ], batch size: 91, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:52:13,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3535420.0, ans=0.0 2024-08-17 21:52:20,212 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 21:52:22,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3535420.0, ans=0.1 2024-08-17 21:52:25,066 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
25 from LS+wenet, 16 from Vox, 15 fro AS 2024-08-17 21:52:43,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3535620.0, ans=0.125 2024-08-17 21:52:51,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3535620.0, ans=0.035 2024-08-17 21:53:06,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3535820.0, ans=0.125 2024-08-17 21:53:06,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3535820.0, ans=0.0 2024-08-17 21:53:10,141 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-17 21:53:15,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3535820.0, ans=0.09899494936611666 2024-08-17 21:53:19,110 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 10450, loss[loss=0.09943, beats_loss=0.01202, ecapa_loss=0.0001513, whisper_loss=0.0859, over 15029.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01049, ecapa_loss=0.0001467, whisper_loss=0.09093, over 3914400.53 frames. ], batch size: 61, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:53:20,601 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-17 21:53:22,784 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.737e+01 2.244e+01 2.463e+01 2.760e+01 5.655e+01, threshold=4.925e+01, percent-clipped=1.0 2024-08-17 21:53:43,755 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.24 vs. limit=10.0 2024-08-17 21:53:47,048 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
20 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-17 21:53:51,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3536120.0, ans=0.09899494936611666 2024-08-17 21:53:58,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3536220.0, ans=0.0 2024-08-17 21:54:09,667 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-17 21:54:11,698 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0 2024-08-17 21:54:24,184 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 10500, loss[loss=0.07753, beats_loss=0.01233, ecapa_loss=0.0001269, whisper_loss=0.06393, over 13318.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01052, ecapa_loss=0.0001472, whisper_loss=0.09007, over 3902180.37 frames. ], batch size: 54, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:54:27,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3536420.0, ans=0.0 2024-08-17 21:54:49,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3536620.0, ans=0.2 2024-08-17 21:54:51,328 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.19 vs. limit=12.0 2024-08-17 21:54:54,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3536620.0, ans=0.0 2024-08-17 21:55:01,297 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-17 21:55:08,984 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
20 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-17 21:55:29,218 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 10550, loss[loss=0.102, beats_loss=0.01106, ecapa_loss=0.0001471, whisper_loss=0.08945, over 21557.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01055, ecapa_loss=0.0001476, whisper_loss=0.0896, over 3881416.55 frames. ], batch size: 89, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:55:33,486 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.376e+01 2.745e+01 3.046e+01 5.243e+01, threshold=5.490e+01, percent-clipped=1.0 2024-08-17 21:55:44,365 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.09 vs. limit=10.0 2024-08-17 21:55:57,927 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-17 21:56:02,706 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 29 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-17 21:56:14,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3537220.0, ans=0.0 2024-08-17 21:56:33,345 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 10600, loss[loss=0.09577, beats_loss=0.01011, ecapa_loss=0.0001292, whisper_loss=0.08437, over 23463.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01059, ecapa_loss=0.0001478, whisper_loss=0.08883, over 3878636.75 frames. ], batch size: 89, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:56:40,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3537420.0, ans=0.1 2024-08-17 21:56:42,777 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
27 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-17 21:56:44,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3537420.0, ans=0.125 2024-08-17 21:56:44,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3537420.0, ans=0.1 2024-08-17 21:57:05,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3537620.0, ans=0.1 2024-08-17 21:57:06,477 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 13 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-17 21:57:18,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3537720.0, ans=0.0 2024-08-17 21:57:25,178 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.84 vs. limit=15.0 2024-08-17 21:57:28,892 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.64 vs. limit=15.0 2024-08-17 21:57:37,114 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 10650, loss[loss=0.1233, beats_loss=0.00638, ecapa_loss=0.0001667, whisper_loss=0.1152, over 23196.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01053, ecapa_loss=0.0001468, whisper_loss=0.08956, over 3881068.00 frames. ], batch size: 91, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:57:40,717 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.432e+01 2.682e+01 2.988e+01 4.178e+01, threshold=5.364e+01, percent-clipped=0.0 2024-08-17 21:57:47,594 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-17 21:58:10,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3538120.0, ans=0.125 2024-08-17 21:58:11,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3538120.0, ans=0.0 2024-08-17 21:58:17,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3538220.0, ans=0.0 2024-08-17 21:58:18,541 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-17 21:58:18,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3538220.0, ans=10.0 2024-08-17 21:58:20,886 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 36 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-17 21:58:21,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3538220.0, ans=0.1 2024-08-17 21:58:37,168 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0 2024-08-17 21:58:41,404 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 10700, loss[loss=0.1279, beats_loss=0.008929, ecapa_loss=0.0001241, whisper_loss=0.1177, over 19355.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01058, ecapa_loss=0.0001449, whisper_loss=0.08978, over 3879783.82 frames. ], batch size: 70, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:58:54,548 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
22 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-17 21:59:01,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3538520.0, ans=0.125 2024-08-17 21:59:12,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3538620.0, ans=0.1 2024-08-17 21:59:19,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3538720.0, ans=0.125 2024-08-17 21:59:29,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3538720.0, ans=0.0 2024-08-17 21:59:30,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3538820.0, ans=0.2 2024-08-17 21:59:35,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3538820.0, ans=0.035 2024-08-17 21:59:44,051 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 10750, loss[loss=0.09907, beats_loss=0.009926, ecapa_loss=0.0001622, whisper_loss=0.08752, over 18702.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.0001465, whisper_loss=0.09014, over 3897020.96 frames. 
], batch size: 76, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:59:46,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3538920.0, ans=0.125 2024-08-17 21:59:48,272 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.354e+01 2.532e+01 2.828e+01 4.238e+01, threshold=5.063e+01, percent-clipped=0.0 2024-08-17 21:59:51,193 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.140e+01 2024-08-17 22:00:08,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3539120.0, ans=0.125 2024-08-17 22:00:23,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3539220.0, ans=0.07 2024-08-17 22:00:47,868 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 10800, loss[loss=0.09917, beats_loss=0.01013, ecapa_loss=0.0001459, whisper_loss=0.08759, over 23204.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01054, ecapa_loss=0.000147, whisper_loss=0.0903, over 3870206.72 frames. ], batch size: 92, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:00:49,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3539420.0, ans=0.1 2024-08-17 22:00:51,685 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-17 22:00:55,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3539420.0, ans=0.125 2024-08-17 22:01:05,472 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 16 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-17 22:01:36,483 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
23 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-17 22:01:47,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3539820.0, ans=0.025 2024-08-17 22:01:49,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3539920.0, ans=0.1 2024-08-17 22:01:50,487 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 10850, loss[loss=0.112, beats_loss=0.009899, ecapa_loss=0.0001711, whisper_loss=0.1004, over 18133.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001485, whisper_loss=0.09003, over 3831301.86 frames. ], batch size: 75, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:01:53,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3539920.0, ans=0.125 2024-08-17 22:01:54,184 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.336e+01 2.508e+01 2.767e+01 4.451e+01, threshold=5.015e+01, percent-clipped=0.0 2024-08-17 22:01:58,086 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
25 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-17 22:01:58,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3539920.0, ans=0.0 2024-08-17 22:02:07,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3540020.0, ans=0.125 2024-08-17 22:02:07,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3540020.0, ans=10.0 2024-08-17 22:02:24,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3540120.0, ans=0.0 2024-08-17 22:02:29,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3540220.0, ans=0.0 2024-08-17 22:02:54,003 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 10900, loss[loss=0.1146, beats_loss=0.01185, ecapa_loss=0.000133, whisper_loss=0.1014, over 16959.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01047, ecapa_loss=0.0001491, whisper_loss=0.09117, over 3860204.90 frames. ], batch size: 67, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:03:06,092 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-17 22:03:15,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3540520.0, ans=0.125 2024-08-17 22:03:24,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3540620.0, ans=10.0 2024-08-17 22:03:34,079 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.14 vs. limit=15.0 2024-08-17 22:03:37,386 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-17 22:03:56,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3540920.0, ans=0.125 2024-08-17 22:03:57,305 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 10950, loss[loss=0.1297, beats_loss=0.006402, ecapa_loss=0.0001618, whisper_loss=0.1217, over 20656.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01045, ecapa_loss=0.0001484, whisper_loss=0.09155, over 3879801.31 frames. ], batch size: 78, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:04:01,043 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.429e+01 2.667e+01 3.020e+01 4.482e+01, threshold=5.334e+01, percent-clipped=0.0 2024-08-17 22:04:01,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3540920.0, ans=0.1 2024-08-17 22:04:16,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3541020.0, ans=0.125 2024-08-17 22:04:25,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3541120.0, ans=0.0 2024-08-17 22:04:29,803 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-17 22:04:37,748 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-17 22:04:41,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3541220.0, ans=0.125 2024-08-17 22:05:00,125 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 11000, loss[loss=0.1104, beats_loss=0.007884, ecapa_loss=0.0001601, whisper_loss=0.1009, over 16080.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01047, ecapa_loss=0.0001487, whisper_loss=0.09106, over 3888209.57 frames. 
], batch size: 60, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:05:05,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3541420.0, ans=0.125 2024-08-17 22:05:09,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3541420.0, ans=0.2 2024-08-17 22:05:12,592 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.92 vs. limit=15.0 2024-08-17 22:05:34,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3541620.0, ans=0.125 2024-08-17 22:05:35,662 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-17 22:05:47,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3541720.0, ans=0.125 2024-08-17 22:05:54,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=3541820.0, ans=0.1 2024-08-17 22:06:02,583 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 11050, loss[loss=0.0972, beats_loss=0.009684, ecapa_loss=0.0001508, whisper_loss=0.08601, over 16984.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0105, ecapa_loss=0.0001476, whisper_loss=0.09078, over 3909835.28 frames. 
], batch size: 67, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:06:06,469 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.361e+01 2.552e+01 2.844e+01 4.106e+02, threshold=5.103e+01, percent-clipped=1.0 2024-08-17 22:06:42,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3542220.0, ans=0.125 2024-08-17 22:06:51,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3542220.0, ans=0.125 2024-08-17 22:06:52,403 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 22:07:05,404 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 11100, loss[loss=0.09556, beats_loss=0.01065, ecapa_loss=0.0001778, whisper_loss=0.08313, over 20196.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01047, ecapa_loss=0.0001477, whisper_loss=0.09067, over 3890478.67 frames. ], batch size: 87, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:07:15,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3542420.0, ans=0.125 2024-08-17 22:07:19,548 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.721e-03 2024-08-17 22:07:28,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3542520.0, ans=0.0 2024-08-17 22:07:30,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3542620.0, ans=0.0 2024-08-17 22:07:55,992 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
22 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-17 22:08:08,411 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 11150, loss[loss=0.1125, beats_loss=0.01026, ecapa_loss=0.0001287, whisper_loss=0.101, over 22439.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01037, ecapa_loss=0.0001474, whisper_loss=0.09148, over 3907361.25 frames. ], batch size: 90, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:08:12,141 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.728e+01 2.281e+01 2.502e+01 2.861e+01 4.409e+01, threshold=5.005e+01, percent-clipped=0.0 2024-08-17 22:08:16,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3542920.0, ans=0.2 2024-08-17 22:08:27,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3543020.0, ans=0.125 2024-08-17 22:08:27,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3543020.0, ans=0.2 2024-08-17 22:08:30,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3543020.0, ans=0.05 2024-08-17 22:08:42,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3543120.0, ans=0.0 2024-08-17 22:08:57,567 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0 2024-08-17 22:09:10,944 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 11200, loss[loss=0.1038, beats_loss=0.01106, ecapa_loss=0.0001407, whisper_loss=0.09135, over 20510.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0104, ecapa_loss=0.0001468, whisper_loss=0.09097, over 3905516.12 frames. 
], batch size: 81, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:09:11,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3543420.0, ans=0.0 2024-08-17 22:09:40,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3543620.0, ans=0.125 2024-08-17 22:09:42,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3543620.0, ans=0.125 2024-08-17 22:09:49,853 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-17 22:09:51,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3543720.0, ans=0.1 2024-08-17 22:09:54,716 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-17 22:09:55,943 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-17 22:09:58,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3543720.0, ans=0.0 2024-08-17 22:10:04,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3543820.0, ans=0.125 2024-08-17 22:10:10,069 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-17 22:10:13,538 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 11250, loss[loss=0.09847, beats_loss=0.01065, ecapa_loss=0.0001579, whisper_loss=0.08624, over 17156.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01043, ecapa_loss=0.0001472, whisper_loss=0.09038, over 3883731.60 frames. 
], batch size: 69, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:10:17,706 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.330e+01 2.572e+01 2.919e+01 3.914e+02, threshold=5.145e+01, percent-clipped=2.0 2024-08-17 22:10:18,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3543920.0, ans=0.125 2024-08-17 22:10:34,649 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-17 22:10:38,524 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-17 22:10:45,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3544120.0, ans=0.125 2024-08-17 22:10:54,125 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-17 22:10:59,041 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-17 22:11:01,908 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-17 22:11:02,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3544220.0, ans=0.125 2024-08-17 22:11:13,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3544320.0, ans=0.2 2024-08-17 22:11:16,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3544320.0, ans=0.0 2024-08-17 22:11:18,408 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 11300, loss[loss=0.1187, beats_loss=0.01018, ecapa_loss=0.0001607, whisper_loss=0.1069, over 21516.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01038, ecapa_loss=0.0001466, whisper_loss=0.09032, over 3855873.20 frames. 
], batch size: 90, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:11:24,195 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.10 vs. limit=22.5 2024-08-17 22:11:30,751 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.911e-01 2024-08-17 22:11:37,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3544520.0, ans=0.2 2024-08-17 22:11:42,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3544520.0, ans=0.125 2024-08-17 22:11:59,413 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-17 22:12:01,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3544720.0, ans=0.125 2024-08-17 22:12:02,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3544720.0, ans=0.125 2024-08-17 22:12:03,772 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-17 22:12:06,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3544720.0, ans=0.125 2024-08-17 22:12:08,257 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0 2024-08-17 22:12:17,067 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-17 22:12:26,050 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 11350, loss[loss=0.08905, beats_loss=0.01305, ecapa_loss=0.0001427, whisper_loss=0.07458, over 19414.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.0104, ecapa_loss=0.0001462, whisper_loss=0.09003, over 3817045.31 frames. ], batch size: 82, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:12:29,977 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.322e+01 2.583e+01 3.031e+01 6.064e+01, threshold=5.166e+01, percent-clipped=1.0 2024-08-17 22:12:35,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3544920.0, ans=0.125 2024-08-17 22:13:01,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3545120.0, ans=0.5 2024-08-17 22:13:04,062 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-17 22:13:33,276 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 11400, loss[loss=0.1099, beats_loss=0.00979, ecapa_loss=0.0001884, whisper_loss=0.09824, over 21665.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01036, ecapa_loss=0.0001481, whisper_loss=0.09061, over 3836069.82 frames. ], batch size: 93, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:13:59,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3545520.0, ans=0.125 2024-08-17 22:14:06,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3545620.0, ans=0.125 2024-08-17 22:14:15,170 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-17 22:14:25,147 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-17 22:14:34,809 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.95 vs. 
limit=15.0 2024-08-17 22:14:36,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3545820.0, ans=0.0 2024-08-17 22:14:43,796 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 11450, loss[loss=0.105, beats_loss=0.0103, ecapa_loss=0.0001301, whisper_loss=0.09336, over 18266.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01045, ecapa_loss=0.0001465, whisper_loss=0.09095, over 3853070.61 frames. ], batch size: 71, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:14:47,657 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.375e+01 2.632e+01 2.898e+01 5.397e+01, threshold=5.264e+01, percent-clipped=1.0 2024-08-17 22:14:51,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3545920.0, ans=0.05 2024-08-17 22:15:33,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3546220.0, ans=0.125 2024-08-17 22:15:48,041 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 19 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-17 22:15:56,053 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 11500, loss[loss=0.1011, beats_loss=0.008954, ecapa_loss=0.0001681, whisper_loss=0.0905, over 18736.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01038, ecapa_loss=0.0001478, whisper_loss=0.09083, over 3832269.55 frames. ], batch size: 75, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:16:02,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3546420.0, ans=0.07 2024-08-17 22:16:35,842 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.17 vs. limit=15.0 2024-08-17 22:16:45,236 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
18 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-17 22:16:49,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3546720.0, ans=0.125 2024-08-17 22:16:56,548 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.77 vs. limit=22.5 2024-08-17 22:16:59,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3546820.0, ans=0.025 2024-08-17 22:17:07,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3546820.0, ans=0.125 2024-08-17 22:17:13,450 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 11550, loss[loss=0.1028, beats_loss=0.01158, ecapa_loss=0.0001399, whisper_loss=0.08987, over 22494.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01042, ecapa_loss=0.0001465, whisper_loss=0.091, over 3833476.97 frames. ], batch size: 88, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:17:17,595 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.251e+01 2.574e+01 2.799e+01 8.248e+01, threshold=5.147e+01, percent-clipped=1.0 2024-08-17 22:17:26,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3547020.0, ans=0.0 2024-08-17 22:17:31,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3547020.0, ans=0.0 2024-08-17 22:17:35,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3547020.0, ans=0.1 2024-08-17 22:18:39,551 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 11600, loss[loss=0.0822, beats_loss=0.01304, ecapa_loss=0.0001655, whisper_loss=0.06751, over 17121.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01045, ecapa_loss=0.0001466, whisper_loss=0.0904, over 3871074.70 frames. ], batch size: 73, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:18:58,202 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 10 from Vox, 29 from AS 2024-08-17 22:19:19,682 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 20 from Vox, 41 from AS 2024-08-17 22:19:34,247 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 27 from LS+wenet, 14 from Vox, 37 from AS 2024-08-17 22:19:55,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3547720.0, ans=0.125 2024-08-17 22:19:57,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3547820.0, ans=0.1 2024-08-17 22:19:59,046 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 19 from LS+wenet, 24 from Vox, 35 from AS 2024-08-17 22:20:18,506 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 20 from LS+wenet, 25 from Vox, 33 from AS 2024-08-17 22:20:20,396 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 11650, loss[loss=0.08593, beats_loss=0.01165, ecapa_loss=0.0001698, whisper_loss=0.07258, over 18330.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01051, ecapa_loss=0.0001462, whisper_loss=0.09019, over 3884926.41 frames. ], batch size: 78, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:20:23,252 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 25 from LS+wenet, 23 from Vox, 46 from AS 2024-08-17 22:20:27,303 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.333e+01 2.551e+01 2.882e+01 3.740e+01, threshold=5.102e+01, percent-clipped=0.0 2024-08-17 22:20:36,085 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts.
32 from LS+wenet, 30 from Vox, 29 from AS 2024-08-17 22:20:45,495 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 35 from LS+wenet, 17 from Vox, 29 from AS 2024-08-17 22:20:56,973 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 21 from LS+wenet, 21 from Vox, 49 from AS 2024-08-17 22:21:07,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3548120.0, ans=0.1 2024-08-17 22:21:19,878 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 15 from Vox, 35 from AS 2024-08-17 22:21:24,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3548320.0, ans=0.125 2024-08-17 22:21:37,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3548320.0, ans=0.0 2024-08-17 22:21:38,715 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 16 from Vox, 32 from AS 2024-08-17 22:21:40,171 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 11700, loss[loss=0.09541, beats_loss=0.01226, ecapa_loss=0.0001197, whisper_loss=0.08196, over 16614.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01051, ecapa_loss=0.0001478, whisper_loss=0.09061, over 3904984.17 frames.
], batch size: 66, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:21:42,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3548420.0, ans=0.2 2024-08-17 22:21:44,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3548420.0, ans=0.125 2024-08-17 22:21:44,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3548420.0, ans=0.07 2024-08-17 22:21:56,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3548520.0, ans=0.125 2024-08-17 22:22:00,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3548520.0, ans=0.0 2024-08-17 22:22:44,231 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 21 from LS+wenet, 23 from Vox, 49 from AS 2024-08-17 22:23:18,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3548720.0, ans=0.125 2024-08-17 22:23:32,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3548820.0, ans=0.2 2024-08-17 22:23:35,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3548820.0, ans=0.0 2024-08-17 22:23:37,563 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 11750, loss[loss=0.08988, beats_loss=0.009427, ecapa_loss=0.0001718, whisper_loss=0.07873, over 14299.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01069, ecapa_loss=0.0001459, whisper_loss=0.08999, over 3923434.49 frames.
], batch size: 58, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:23:42,252 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.390e+01 2.565e+01 2.987e+01 4.892e+01, threshold=5.130e+01, percent-clipped=0.0 2024-08-17 22:23:49,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3548920.0, ans=0.125 2024-08-17 22:23:54,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3549020.0, ans=0.125 2024-08-17 22:23:56,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3549020.0, ans=0.2 2024-08-17 22:24:04,944 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.52 vs. limit=10.0 2024-08-17 22:24:08,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3549120.0, ans=0.0 2024-08-17 22:24:12,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3549120.0, ans=0.025 2024-08-17 22:24:14,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3549120.0, ans=0.125 2024-08-17 22:24:22,851 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.93 vs. 
limit=15.0 2024-08-17 22:24:26,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3549120.0, ans=0.0 2024-08-17 22:24:28,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3549220.0, ans=0.125 2024-08-17 22:24:45,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3549220.0, ans=0.125 2024-08-17 22:24:46,886 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 from AS 2024-08-17 22:24:47,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3549220.0, ans=0.1 2024-08-17 22:25:05,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3549320.0, ans=0.1 2024-08-17 22:25:11,865 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 11800, loss[loss=0.1224, beats_loss=0.007263, ecapa_loss=0.0001421, whisper_loss=0.1138, over 19884.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01062, ecapa_loss=0.0001468, whisper_loss=0.09015, over 3925393.06 frames. ], batch size: 73, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:25:17,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3549420.0, ans=0.125 2024-08-17 22:25:23,090 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 15 from Vox, 22 from AS 2024-08-17 22:25:35,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3549520.0, ans=0.125 2024-08-17 22:25:37,186 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts.
21 from LS+wenet, 18 from Vox, 25 from AS 2024-08-17 22:25:54,614 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 from AS 2024-08-17 22:25:57,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3549620.0, ans=0.125 2024-08-17 22:25:57,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3549620.0, ans=0.125 2024-08-17 22:25:59,350 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.422e-01 2024-08-17 22:26:19,179 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 13 from Vox, 31 from AS 2024-08-17 22:26:24,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3549720.0, ans=0.0 2024-08-17 22:26:55,215 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 11850, loss[loss=0.08115, beats_loss=0.01154, ecapa_loss=0.0001438, whisper_loss=0.06817, over 17581.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01061, ecapa_loss=0.0001457, whisper_loss=0.08999, over 3910914.35 frames. ], batch size: 70, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:27:02,145 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.725e+01 2.313e+01 2.497e+01 2.701e+01 4.196e+01, threshold=4.994e+01, percent-clipped=0.0 2024-08-17 22:27:04,328 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 14 from LS+wenet, 23 from Vox, 27 from AS 2024-08-17 22:27:37,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3550020.0, ans=0.125 2024-08-17 22:27:43,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3550120.0, ans=0.1 2024-08-17 22:27:47,366 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts.
30 from LS+wenet, 23 from Vox, 30 from AS 2024-08-17 22:27:57,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3550120.0, ans=0.125 2024-08-17 22:27:57,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3550120.0, ans=0.0 2024-08-17 22:28:21,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3550220.0, ans=0.05 2024-08-17 22:28:28,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3550320.0, ans=0.0 2024-08-17 22:28:33,879 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-08-17 22:28:52,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3550420.0, ans=0.2 2024-08-17 22:28:53,861 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 11900, loss[loss=0.121, beats_loss=0.007372, ecapa_loss=0.0002351, whisper_loss=0.1113, over 17053.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01063, ecapa_loss=0.0001463, whisper_loss=0.09019, over 3936789.65 frames. ], batch size: 73, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:29:07,930 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 from AS 2024-08-17 22:29:10,969 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.47 vs.
limit=15.0 2024-08-17 22:29:15,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3550520.0, ans=0.125 2024-08-17 22:29:20,431 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.54 vs. limit=15.0 2024-08-17 22:29:56,138 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 21 from Vox, 23 from AS 2024-08-17 22:30:03,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3550720.0, ans=0.125 2024-08-17 22:30:16,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3550720.0, ans=0.015 2024-08-17 22:30:46,154 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 11950, loss[loss=0.1169, beats_loss=0.008977, ecapa_loss=0.0001825, whisper_loss=0.1061, over 18810.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01064, ecapa_loss=0.0001462, whisper_loss=0.08978, over 3937072.00 frames. ], batch size: 78, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:30:53,070 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.173e+01 2.418e+01 2.712e+01 4.261e+01, threshold=4.835e+01, percent-clipped=0.0 2024-08-17 22:31:04,992 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 33 from LS+wenet, 16 from Vox, 35 from AS 2024-08-17 22:31:09,531 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts.
24 from LS+wenet, 26 from Vox, 33 from AS 2024-08-17 22:31:21,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3551020.0, ans=0.0 2024-08-17 22:31:21,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3551020.0, ans=0.125 2024-08-17 22:31:32,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3551120.0, ans=0.2 2024-08-17 22:31:56,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3551220.0, ans=0.07 2024-08-17 22:32:07,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3551320.0, ans=0.0 2024-08-17 22:32:16,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3551320.0, ans=0.125 2024-08-17 22:32:16,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3551320.0, ans=0.1 2024-08-17 22:32:19,374 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 12000, loss[loss=0.09597, beats_loss=0.01193, ecapa_loss=0.0001529, whisper_loss=0.08251, over 22196.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01069, ecapa_loss=0.0001464, whisper_loss=0.08943, over 3909745.15 frames. ], batch size: 93, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:32:19,374 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-17 22:33:02,548 INFO [train_multi_KD3.py:1149] (2/4) Epoch 24, validation on ASR_libri: loss=0.251, beats_loss=0, ecapa_loss=0.0005236, whisper_loss=0.2457, over 922467.00 frames.
2024-08-17 22:33:16,252 INFO [train_multi_KD3.py:1149] (2/4) Epoch 24, validation on SV_voxceleb1: loss=0.004219, beats_loss=0, ecapa_loss=0.0004219, whisper_loss=0, over 939242.00 frames. 2024-08-17 22:34:52,754 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.6227, 2.0987, 2.0062, 1.9415], device='cuda:2') 2024-08-17 22:35:20,215 INFO [train_multi_KD3.py:1149] (2/4) Epoch 24, validation on AT_audioset: loss=0.02322, beats_loss=0.02322, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-17 22:35:20,219 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-17 22:35:23,928 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 23 from Vox, 27 from AS 2024-08-17 22:35:28,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3551420.0, ans=0.0 2024-08-17 22:35:53,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3551620.0, ans=0.125 2024-08-17 22:35:59,923 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 29 from LS+wenet, 19 from Vox, 34 from AS 2024-08-17 22:36:06,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3551720.0, ans=0.125 2024-08-17 22:36:15,702 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts.
22 from LS+wenet, 25 from Vox, 32 from AS 2024-08-17 22:36:33,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3551820.0, ans=0.09899494936611666 2024-08-17 22:36:33,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3551820.0, ans=0.025 2024-08-17 22:36:37,128 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 12050, loss[loss=0.1254, beats_loss=0.008137, ecapa_loss=0.0001385, whisper_loss=0.1158, over 15579.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01062, ecapa_loss=0.000146, whisper_loss=0.08998, over 3912357.74 frames. ], batch size: 56, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:36:40,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3551920.0, ans=0.07 2024-08-17 22:36:41,542 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.293e+01 2.540e+01 2.892e+01 1.917e+02, threshold=5.080e+01, percent-clipped=1.0 2024-08-17 22:36:59,165 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 28 from Vox, 34 from AS 2024-08-17 22:37:00,400 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 14 from Vox, 28 from AS 2024-08-17 22:37:17,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3552120.0, ans=0.09899494936611666 2024-08-17 22:37:24,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3552220.0, ans=0.125 2024-08-17 22:37:40,998 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 from AS 2024-08-17 22:37:51,251 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts.
32 from LS+wenet, 29 from Vox, 34 from AS 2024-08-17 22:37:54,086 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 12100, loss[loss=0.128, beats_loss=0.007608, ecapa_loss=0.0001697, whisper_loss=0.1187, over 19798.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01053, ecapa_loss=0.0001471, whisper_loss=0.09028, over 3899392.09 frames. ], batch size: 82, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:37:58,143 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 20 from Vox, 30 from AS 2024-08-17 22:38:09,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3552520.0, ans=0.1 2024-08-17 22:38:11,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3552520.0, ans=0.125 2024-08-17 22:38:16,928 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 30 from Vox, 31 from AS 2024-08-17 22:38:38,983 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 17 from Vox, 42 from AS 2024-08-17 22:38:53,704 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 15 from Vox, 34 from AS 2024-08-17 22:39:10,451 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 12150, loss[loss=0.1038, beats_loss=0.01079, ecapa_loss=0.0001574, whisper_loss=0.0914, over 21509.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01055, ecapa_loss=0.0001479, whisper_loss=0.0904, over 3878834.05 frames. ], batch size: 87, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:39:14,795 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.279e+01 2.475e+01 2.710e+01 6.792e+01, threshold=4.950e+01, percent-clipped=1.0 2024-08-17 22:39:16,233 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts.
35 from LS+wenet, 26 from Vox, 34 from AS 2024-08-17 22:39:29,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3553020.0, ans=0.0 2024-08-17 22:39:34,216 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 18 from Vox, 52 from AS 2024-08-17 22:39:34,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3553020.0, ans=0.125 2024-08-17 22:39:45,733 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 13 from LS+wenet, 14 from Vox, 35 from AS 2024-08-17 22:39:50,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3553120.0, ans=0.125 2024-08-17 22:40:12,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3553320.0, ans=0.125 2024-08-17 22:40:22,884 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 12200, loss[loss=0.1051, beats_loss=0.0117, ecapa_loss=0.0001547, whisper_loss=0.09184, over 22712.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01065, ecapa_loss=0.0001483, whisper_loss=0.08976, over 3911343.01 frames. ], batch size: 91, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:40:30,537 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 22 from Vox, 33 from AS 2024-08-17 22:40:33,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3553420.0, ans=0.2 2024-08-17 22:40:40,775 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts.
22 from LS+wenet, 24 from Vox, 43 from AS 2024-08-17 22:40:48,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3553520.0, ans=0.0 2024-08-17 22:40:55,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3553620.0, ans=0.1 2024-08-17 22:40:58,585 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.62 vs. limit=15.0 2024-08-17 22:41:22,868 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.98 vs. limit=22.5 2024-08-17 22:41:25,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3553820.0, ans=0.125 2024-08-17 22:41:35,245 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 12250, loss[loss=0.1192, beats_loss=0.009346, ecapa_loss=0.000132, whisper_loss=0.1085, over 21172.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01064, ecapa_loss=0.0001474, whisper_loss=0.0902, over 3907217.88 frames. ], batch size: 82, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:41:39,673 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.433e+01 2.663e+01 3.002e+01 4.108e+01, threshold=5.326e+01, percent-clipped=0.0 2024-08-17 22:41:40,010 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts.
24 from LS+wenet, 17 from Vox, 31 from AS 2024-08-17 22:41:51,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3554020.0, ans=0.125 2024-08-17 22:41:54,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3554020.0, ans=0.125 2024-08-17 22:42:02,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3554120.0, ans=0.125 2024-08-17 22:42:19,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3554220.0, ans=0.0 2024-08-17 22:42:22,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3554220.0, ans=0.0 2024-08-17 22:42:33,946 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 21 from Vox, 29 from AS 2024-08-17 22:42:41,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3554320.0, ans=0.0 2024-08-17 22:42:43,185 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 from AS 2024-08-17 22:42:47,860 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 12300, loss[loss=0.08099, beats_loss=0.01179, ecapa_loss=0.0001518, whisper_loss=0.06768, over 21607.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01061, ecapa_loss=0.0001488, whisper_loss=0.08993, over 3926576.33 frames. ], batch size: 89, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:42:51,279 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 20 from Vox, 26 from AS 2024-08-17 22:42:52,689 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts.
21 from LS+wenet, 19 from Vox, 29 from AS 2024-08-17 22:43:00,571 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.62 vs. limit=12.0 2024-08-17 22:43:15,262 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.10 vs. limit=22.5 2024-08-17 22:43:16,247 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 28 from LS+wenet, 16 from Vox, 29 from AS 2024-08-17 22:43:22,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3554620.0, ans=0.125 2024-08-17 22:43:34,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3554720.0, ans=10.0 2024-08-17 22:43:37,493 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 25 from LS+wenet, 16 from Vox, 22 from AS 2024-08-17 22:43:42,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3554720.0, ans=0.2 2024-08-17 22:43:50,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3554820.0, ans=0.1 2024-08-17 22:44:00,842 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 12350, loss[loss=0.1235, beats_loss=0.01023, ecapa_loss=0.0001624, whisper_loss=0.1117, over 22854.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01062, ecapa_loss=0.0001484, whisper_loss=0.09032, over 3936132.58 frames.
], batch size: 91, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:44:05,311 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.318e+01 2.520e+01 2.807e+01 5.445e+01, threshold=5.040e+01, percent-clipped=1.0 2024-08-17 22:44:27,308 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.83 vs. limit=10.0 2024-08-17 22:44:33,258 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.779e-02 2024-08-17 22:44:35,115 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.86 vs. limit=15.0 2024-08-17 22:44:52,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3555220.0, ans=0.2 2024-08-17 22:45:13,540 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 12400, loss[loss=0.106, beats_loss=0.01089, ecapa_loss=0.0001167, whisper_loss=0.09396, over 20188.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0106, ecapa_loss=0.0001469, whisper_loss=0.09002, over 3927483.17 frames. ], batch size: 77, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:45:13,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3555420.0, ans=0.1 2024-08-17 22:45:20,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3555420.0, ans=0.025 2024-08-17 22:45:25,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3555420.0, ans=0.0 2024-08-17 22:45:34,494 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
27 from LS+wenet, 17 from Vox, 29 from AS 2024-08-17 22:45:46,753 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 23 from Vox, 32 from AS 2024-08-17 22:46:06,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3555720.0, ans=0.0 2024-08-17 22:46:22,403 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 22 from Vox, 28 from AS 2024-08-17 22:46:23,492 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 12450, loss[loss=0.1039, beats_loss=0.01023, ecapa_loss=0.0001546, whisper_loss=0.09213, over 18192.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01056, ecapa_loss=0.0001467, whisper_loss=0.08955, over 3927223.07 frames. ], batch size: 73, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:46:27,518 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.268e+01 2.505e+01 2.895e+01 6.006e+01, threshold=5.010e+01, percent-clipped=2.0 2024-08-17 22:46:31,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3555920.0, ans=0.2 2024-08-17 22:46:36,575 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 27 from LS+wenet, 20 from Vox, 31 from AS 2024-08-17 22:46:52,130 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 15 from Vox, 31 from AS 2024-08-17 22:47:02,732 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.29 vs. limit=22.5 2024-08-17 22:47:03,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3556220.0, ans=0.2 2024-08-17 22:47:10,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3556220.0, ans=0.035 2024-08-17 22:47:13,053 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts.
35 from LS+wenet, 16 from Vox, 40 from AS 2024-08-17 22:47:17,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3556320.0, ans=0.125 2024-08-17 22:47:23,203 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 18 from LS+wenet, 26 from Vox, 40 from AS 2024-08-17 22:47:33,005 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 12500, loss[loss=0.1096, beats_loss=0.01271, ecapa_loss=0.000107, whisper_loss=0.09579, over 24211.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01056, ecapa_loss=0.0001468, whisper_loss=0.08966, over 3909094.00 frames. ], batch size: 92, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:47:33,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3556420.0, ans=0.0 2024-08-17 22:47:41,920 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.23 vs. limit=15.0 2024-08-17 22:47:43,562 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.651e+01 2024-08-17 22:47:55,099 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.44 vs. limit=5.0 2024-08-17 22:48:02,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3556620.0, ans=0.125 2024-08-17 22:48:09,339 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 14 from LS+wenet, 13 from Vox, 35 from AS 2024-08-17 22:48:09,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3556620.0, ans=0.0 2024-08-17 22:48:17,423 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 9 from Vox, 40 from AS 2024-08-17 22:48:26,925 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts.
30 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-17 22:48:41,845 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 12550, loss[loss=0.1078, beats_loss=0.01215, ecapa_loss=0.0001071, whisper_loss=0.09454, over 21094.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01062, ecapa_loss=0.000146, whisper_loss=0.08987, over 3897841.41 frames. ], batch size: 83, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:48:46,294 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+01 2.357e+01 2.585e+01 2.988e+01 4.779e+01, threshold=5.170e+01, percent-clipped=0.0 2024-08-17 22:48:52,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3556920.0, ans=0.0 2024-08-17 22:49:00,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3557020.0, ans=0.1 2024-08-17 22:49:10,038 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 19 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-17 22:49:13,431 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.179e-01 2024-08-17 22:49:20,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3557120.0, ans=0.125 2024-08-17 22:49:37,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3557320.0, ans=0.125 2024-08-17 22:49:46,079 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-17 22:49:48,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3557320.0, ans=0.125 2024-08-17 22:49:51,221 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 12600, loss[loss=0.1024, beats_loss=0.009732, ecapa_loss=0.0001342, whisper_loss=0.0913, over 23509.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.01072, ecapa_loss=0.0001459, whisper_loss=0.08924, over 3882488.72 frames. ], batch size: 92, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:49:57,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3557420.0, ans=0.125 2024-08-17 22:50:00,974 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-17 22:50:08,247 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-17 22:50:16,291 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-17 22:50:20,244 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-17 22:50:25,264 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.05 vs. limit=22.5 2024-08-17 22:50:41,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3557720.0, ans=0.125 2024-08-17 22:50:50,928 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-17 22:51:00,100 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 12650, loss[loss=0.1225, beats_loss=0.007553, ecapa_loss=0.0001747, whisper_loss=0.1132, over 19410.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01082, ecapa_loss=0.0001466, whisper_loss=0.08862, over 3892140.24 frames. ], batch size: 75, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:51:04,577 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.293e+01 2.533e+01 2.794e+01 5.900e+01, threshold=5.066e+01, percent-clipped=1.0 2024-08-17 22:51:07,402 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-17 22:51:12,345 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 30 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-17 22:51:16,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3558020.0, ans=0.04949747468305833 2024-08-17 22:51:38,629 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 22:51:43,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3558220.0, ans=0.125 2024-08-17 22:51:45,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3558220.0, ans=0.125 2024-08-17 22:51:46,951 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.53 vs. limit=15.0 2024-08-17 22:51:51,631 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-17 22:52:03,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3558320.0, ans=0.2 2024-08-17 22:52:07,769 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-17 22:52:08,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3558420.0, ans=0.0 2024-08-17 22:52:08,968 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 12700, loss[loss=0.1182, beats_loss=0.00939, ecapa_loss=0.0001516, whisper_loss=0.1073, over 22853.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01075, ecapa_loss=0.0001465, whisper_loss=0.08924, over 3859161.53 frames. 
], batch size: 91, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:52:16,050 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-17 22:52:22,966 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-17 22:52:23,661 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.30 vs. limit=15.0 2024-08-17 22:52:24,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3558520.0, ans=0.1 2024-08-17 22:52:39,445 INFO [train_multi_KD3.py:844] (2/4) A total of 97 cuts. 36 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-17 22:52:43,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3558620.0, ans=0.125 2024-08-17 22:52:55,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3558720.0, ans=0.0 2024-08-17 22:52:55,424 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 22:52:56,345 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-17 22:53:00,692 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-17 22:53:05,647 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.02 vs. limit=15.0 2024-08-17 22:53:08,943 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-17 22:53:18,183 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 12750, loss[loss=0.09085, beats_loss=0.01172, ecapa_loss=0.0001598, whisper_loss=0.07754, over 22526.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01068, ecapa_loss=0.0001471, whisper_loss=0.09, over 3887025.41 frames. ], batch size: 93, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:53:18,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3558920.0, ans=0.0 2024-08-17 22:53:22,186 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.283e+01 2.578e+01 2.885e+01 4.284e+01, threshold=5.156e+01, percent-clipped=0.0 2024-08-17 22:53:32,901 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-17 22:53:37,206 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-17 22:53:43,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3559120.0, ans=0.125 2024-08-17 22:53:46,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3559120.0, ans=0.95 2024-08-17 22:53:49,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3559120.0, ans=0.125 2024-08-17 22:53:56,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3559120.0, ans=0.0 2024-08-17 22:53:56,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3559120.0, ans=0.125 2024-08-17 22:53:56,533 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.21 vs. 
limit=22.5 2024-08-17 22:54:17,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=3559320.0, ans=0.2 2024-08-17 22:54:26,614 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 12800, loss[loss=0.09871, beats_loss=0.01189, ecapa_loss=0.0001333, whisper_loss=0.08549, over 23488.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01068, ecapa_loss=0.0001468, whisper_loss=0.09039, over 3933154.00 frames. ], batch size: 93, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:54:29,599 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 32 from Vox, 26 fro AS 2024-08-17 22:54:32,431 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 16 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-17 22:54:48,573 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.79 vs. limit=15.0 2024-08-17 22:54:53,422 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-17 22:54:59,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3559620.0, ans=0.1 2024-08-17 22:55:09,200 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-17 22:55:10,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3559720.0, ans=0.2 2024-08-17 22:55:13,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3559720.0, ans=0.0 2024-08-17 22:55:19,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3559720.0, ans=0.0 2024-08-17 22:55:21,983 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
27 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-17 22:55:24,741 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-17 22:55:26,040 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-17 22:55:28,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3559820.0, ans=0.125 2024-08-17 22:55:32,345 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.90 vs. limit=22.5 2024-08-17 22:55:37,023 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 12850, loss[loss=0.09594, beats_loss=0.0099, ecapa_loss=0.0002014, whisper_loss=0.08402, over 16545.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01066, ecapa_loss=0.0001474, whisper_loss=0.08993, over 3878450.98 frames. ], batch size: 69, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:55:40,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3559920.0, ans=0.0 2024-08-17 22:55:41,267 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.263e+01 2.521e+01 2.838e+01 3.742e+01, threshold=5.041e+01, percent-clipped=0.0 2024-08-17 22:55:41,438 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
28 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-17 22:55:58,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3560020.0, ans=0.125 2024-08-17 22:55:59,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3560020.0, ans=0.0 2024-08-17 22:56:01,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3560020.0, ans=0.0 2024-08-17 22:56:07,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3560120.0, ans=0.0 2024-08-17 22:56:14,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3560120.0, ans=0.125 2024-08-17 22:56:26,972 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-17 22:56:45,739 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 13 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-17 22:56:46,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3560320.0, ans=0.1 2024-08-17 22:56:48,864 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.95 vs. limit=22.5 2024-08-17 22:56:49,647 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 12900, loss[loss=0.0871, beats_loss=0.0138, ecapa_loss=0.0001331, whisper_loss=0.07197, over 14859.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01071, ecapa_loss=0.0001473, whisper_loss=0.08951, over 3864350.48 frames. ], batch size: 63, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:57:01,056 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
17 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-17 22:57:08,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3560520.0, ans=0.125 2024-08-17 22:57:11,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3560520.0, ans=0.125 2024-08-17 22:57:12,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3560520.0, ans=0.0 2024-08-17 22:57:30,650 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0 2024-08-17 22:57:32,434 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.68 vs. limit=22.5 2024-08-17 22:57:40,989 WARNING [optim.py:496] (2/4) Scaling gradients by 0.045217473059892654, model_norm_threshold=50.41379928588867 2024-08-17 22:57:41,159 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.812e+05, grad_sumsq=1.812e+05, orig_rms_sq=1.000e+00 2024-08-17 22:57:55,816 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-17 22:57:57,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3560820.0, ans=0.125 2024-08-17 22:58:02,145 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 12950, loss[loss=0.09611, beats_loss=0.01072, ecapa_loss=0.0001728, whisper_loss=0.08366, over 18335.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01072, ecapa_loss=0.0001461, whisper_loss=0.08892, over 3864299.15 frames. 
], batch size: 77, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:58:05,673 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.04 vs. limit=6.0 2024-08-17 22:58:07,898 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.175e+01 2.399e+01 2.908e+01 1.115e+03, threshold=4.798e+01, percent-clipped=1.0 2024-08-17 22:58:08,097 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-17 22:58:19,439 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.410e+05 2024-08-17 22:58:26,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3561020.0, ans=0.1 2024-08-17 22:58:40,731 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-17 22:59:01,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3561320.0, ans=0.07 2024-08-17 22:59:13,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3561420.0, ans=0.125 2024-08-17 22:59:15,115 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 13000, loss[loss=0.09904, beats_loss=0.01093, ecapa_loss=0.000115, whisper_loss=0.08695, over 14006.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01069, ecapa_loss=0.0001463, whisper_loss=0.08915, over 3861578.50 frames. ], batch size: 54, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:59:23,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3561420.0, ans=0.0 2024-08-17 22:59:24,433 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
18 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-17 22:59:31,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3561520.0, ans=0.025 2024-08-17 22:59:34,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3561520.0, ans=0.0 2024-08-17 22:59:36,527 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=15.0 2024-08-17 22:59:43,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3561620.0, ans=0.1 2024-08-17 23:00:01,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3561720.0, ans=0.125 2024-08-17 23:00:01,733 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.644e-01 2024-08-17 23:00:06,646 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.24 vs. limit=10.0 2024-08-17 23:00:15,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3561820.0, ans=0.125 2024-08-17 23:00:28,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3561920.0, ans=0.125 2024-08-17 23:00:28,998 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 13050, loss[loss=0.08577, beats_loss=0.01094, ecapa_loss=0.0001406, whisper_loss=0.07342, over 22548.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01065, ecapa_loss=0.0001462, whisper_loss=0.08949, over 3867949.00 frames. 
], batch size: 94, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:00:34,992 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.290e+01 2.565e+01 2.843e+01 4.740e+01, threshold=5.131e+01, percent-clipped=0.0 2024-08-17 23:00:47,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3562020.0, ans=0.125 2024-08-17 23:01:01,962 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08320149034261703, model_norm_threshold=51.3093376159668 2024-08-17 23:01:02,131 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.121e+04, grad_sumsq=1.818e+04, orig_rms_sq=3.366e+00 2024-08-17 23:01:19,749 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.99 vs. limit=15.0 2024-08-17 23:01:22,256 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 23 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-17 23:01:32,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3562320.0, ans=0.0 2024-08-17 23:01:41,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3562320.0, ans=0.125 2024-08-17 23:01:44,177 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 13100, loss[loss=0.09055, beats_loss=0.009292, ecapa_loss=0.0001502, whisper_loss=0.07975, over 19176.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01063, ecapa_loss=0.0001459, whisper_loss=0.08942, over 3869696.97 frames. ], batch size: 78, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:01:46,120 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
19 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-17 23:01:47,460 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-17 23:01:49,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3562420.0, ans=0.0 2024-08-17 23:02:04,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3562520.0, ans=0.2 2024-08-17 23:02:05,854 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 33 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-17 23:02:16,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3562620.0, ans=0.1 2024-08-17 23:02:20,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3562620.0, ans=0.2 2024-08-17 23:02:24,356 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.79 vs. limit=12.0 2024-08-17 23:02:27,419 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-17 23:02:36,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3562720.0, ans=0.125 2024-08-17 23:02:43,470 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
25 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-17 23:02:52,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3562820.0, ans=0.0 2024-08-17 23:02:54,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3562820.0, ans=0.125 2024-08-17 23:03:00,796 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 13150, loss[loss=0.08671, beats_loss=0.009982, ecapa_loss=0.0001731, whisper_loss=0.075, over 15845.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01061, ecapa_loss=0.0001454, whisper_loss=0.08963, over 3852577.38 frames. ], batch size: 68, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:03:02,106 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.85 vs. limit=15.0 2024-08-17 23:03:06,818 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.414e+01 2.698e+01 3.144e+01 6.167e+02, threshold=5.396e+01, percent-clipped=2.0 2024-08-17 23:03:18,295 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-17 23:03:31,903 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-17 23:03:34,987 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 42 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-17 23:03:39,448 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-17 23:03:41,131 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-08-17 23:03:44,949 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-17 23:03:51,966 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
25 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-17 23:03:52,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3563220.0, ans=0.125 2024-08-17 23:04:12,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3563320.0, ans=0.07 2024-08-17 23:04:15,579 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 13200, loss[loss=0.091, beats_loss=0.01074, ecapa_loss=0.0001503, whisper_loss=0.07875, over 16857.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01058, ecapa_loss=0.0001465, whisper_loss=0.08974, over 3853729.18 frames. ], batch size: 69, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:04:19,200 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0 2024-08-17 23:04:20,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3563420.0, ans=0.125 2024-08-17 23:04:32,169 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 29 from Vox, 26 fro AS 2024-08-17 23:04:36,655 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.50 vs. limit=22.5 2024-08-17 23:04:47,745 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2024-08-17 23:04:50,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3563620.0, ans=0.125 2024-08-17 23:05:03,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3563720.0, ans=0.2 2024-08-17 23:05:15,936 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
23 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-17 23:05:22,317 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.26 vs. limit=15.0 2024-08-17 23:05:23,286 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 41 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-17 23:05:28,622 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 13250, loss[loss=0.09367, beats_loss=0.01179, ecapa_loss=0.0001346, whisper_loss=0.08053, over 17954.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01059, ecapa_loss=0.0001475, whisper_loss=0.08967, over 3856016.32 frames. ], batch size: 72, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:05:34,393 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.345e+01 2.575e+01 2.975e+01 4.743e+02, threshold=5.149e+01, percent-clipped=2.0 2024-08-17 23:05:40,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3563920.0, ans=0.2 2024-08-17 23:05:48,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3564020.0, ans=0.07 2024-08-17 23:05:57,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3564120.0, ans=0.0 2024-08-17 23:05:58,553 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-17 23:06:03,116 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-17 23:06:16,461 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-17 23:06:19,209 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
17 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-17 23:06:33,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3564320.0, ans=0.04949747468305833 2024-08-17 23:06:34,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3564320.0, ans=0.125 2024-08-17 23:06:39,305 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 13300, loss[loss=0.07474, beats_loss=0.01123, ecapa_loss=0.0001584, whisper_loss=0.06193, over 16922.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01067, ecapa_loss=0.0001469, whisper_loss=0.08916, over 3888474.75 frames. ], batch size: 70, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:06:42,630 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 23:06:51,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3564420.0, ans=0.0 2024-08-17 23:06:55,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3564520.0, ans=15.0 2024-08-17 23:07:03,554 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 14 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-17 23:07:03,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3564520.0, ans=0.04949747468305833 2024-08-17 23:07:05,379 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.71 vs. 
limit=22.5 2024-08-17 23:07:12,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3564620.0, ans=0.0 2024-08-17 23:07:31,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3564720.0, ans=0.2 2024-08-17 23:07:32,760 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-17 23:07:36,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3564820.0, ans=0.0 2024-08-17 23:07:47,101 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-17 23:07:48,109 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 13350, loss[loss=0.1026, beats_loss=0.00885, ecapa_loss=0.000146, whisper_loss=0.09227, over 15091.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01064, ecapa_loss=0.0001473, whisper_loss=0.08908, over 3901434.99 frames. ], batch size: 58, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:07:53,622 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.397e+01 2.708e+01 2.975e+01 4.671e+01, threshold=5.415e+01, percent-clipped=0.0 2024-08-17 23:07:57,901 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-17 23:07:59,413 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-17 23:08:11,863 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-17 23:08:16,209 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
29 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-17 23:08:20,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3565120.0, ans=0.0 2024-08-17 23:08:29,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3565220.0, ans=0.1 2024-08-17 23:08:41,621 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-17 23:08:50,195 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.91 vs. limit=12.0 2024-08-17 23:08:50,262 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.90 vs. limit=22.5 2024-08-17 23:08:53,144 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.40 vs. limit=10.0 2024-08-17 23:08:55,780 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 13400, loss[loss=0.1224, beats_loss=0.009878, ecapa_loss=0.0001801, whisper_loss=0.1107, over 22566.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01064, ecapa_loss=0.0001478, whisper_loss=0.08899, over 3905095.04 frames. ], batch size: 92, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:08:57,247 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
24 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-17 23:08:57,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3565420.0, ans=0.125 2024-08-17 23:09:04,940 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.318e+05 2024-08-17 23:09:22,473 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.56 vs. limit=10.0 2024-08-17 23:09:29,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3565620.0, ans=0.125 2024-08-17 23:09:39,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3565720.0, ans=0.0 2024-08-17 23:09:43,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3565720.0, ans=0.1 2024-08-17 23:09:54,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3565820.0, ans=0.0 2024-08-17 23:09:58,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3565820.0, ans=0.035 2024-08-17 23:10:01,179 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-17 23:10:02,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3565820.0, ans=0.0 2024-08-17 23:10:05,688 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 13450, loss[loss=0.08824, beats_loss=0.01077, ecapa_loss=0.0001456, whisper_loss=0.07601, over 18404.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0107, ecapa_loss=0.0001467, whisper_loss=0.08908, over 3888502.85 frames. 
], batch size: 76, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:10:11,362 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.425e+01 2.663e+01 2.956e+01 3.669e+02, threshold=5.327e+01, percent-clipped=2.0 2024-08-17 23:10:18,903 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.20 vs. limit=15.0 2024-08-17 23:10:23,580 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-17 23:10:31,515 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-17 23:10:36,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3566120.0, ans=0.0 2024-08-17 23:10:48,467 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-17 23:10:55,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3566220.0, ans=0.0 2024-08-17 23:10:57,916 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-17 23:11:14,046 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 13500, loss[loss=0.09555, beats_loss=0.01146, ecapa_loss=0.0001303, whisper_loss=0.0828, over 22421.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01065, ecapa_loss=0.0001479, whisper_loss=0.08902, over 3891540.95 frames. ], batch size: 88, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:11:22,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3566420.0, ans=0.2 2024-08-17 23:11:31,585 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
16 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-17 23:12:16,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3566820.0, ans=0.05 2024-08-17 23:12:21,094 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 13550, loss[loss=0.09845, beats_loss=0.01013, ecapa_loss=0.0001741, whisper_loss=0.08658, over 21608.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01066, ecapa_loss=0.0001467, whisper_loss=0.0894, over 3879281.61 frames. ], batch size: 90, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:12:24,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3566920.0, ans=0.125 2024-08-17 23:12:25,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3566920.0, ans=0.125 2024-08-17 23:12:26,141 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.373e+01 2.638e+01 2.818e+01 4.102e+01, threshold=5.277e+01, percent-clipped=0.0 2024-08-17 23:12:29,186 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.15 vs. limit=6.0 2024-08-17 23:12:43,969 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=7.022e+01 2024-08-17 23:12:45,447 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-17 23:12:47,747 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.61 vs. limit=22.5 2024-08-17 23:13:09,813 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
26 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-17 23:13:11,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3567220.0, ans=0.125 2024-08-17 23:13:22,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3567320.0, ans=0.1 2024-08-17 23:13:25,772 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.69 vs. limit=15.0 2024-08-17 23:13:26,501 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-17 23:13:28,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3567420.0, ans=0.025 2024-08-17 23:13:28,637 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.78 vs. limit=22.5 2024-08-17 23:13:29,215 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 13600, loss[loss=0.09688, beats_loss=0.01325, ecapa_loss=9.824e-05, whisper_loss=0.08265, over 20100.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0107, ecapa_loss=0.0001448, whisper_loss=0.08979, over 3894628.05 frames. ], batch size: 76, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:13:51,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3567520.0, ans=0.125 2024-08-17 23:13:57,414 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
21 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-17 23:13:59,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3567620.0, ans=0.09899494936611666 2024-08-17 23:14:05,029 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 23:14:14,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3567720.0, ans=0.125 2024-08-17 23:14:25,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3567720.0, ans=0.1 2024-08-17 23:14:26,414 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-17 23:14:29,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3567820.0, ans=0.125 2024-08-17 23:14:40,972 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 13650, loss[loss=0.112, beats_loss=0.01082, ecapa_loss=0.0001482, whisper_loss=0.09972, over 23914.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01071, ecapa_loss=0.0001463, whisper_loss=0.09047, over 3913631.18 frames. ], batch size: 94, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:14:43,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3567920.0, ans=0.1 2024-08-17 23:14:44,320 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-17 23:14:45,922 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.25 vs. 
limit=6.0 2024-08-17 23:14:47,033 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.376e+01 2.690e+01 3.104e+01 4.136e+01, threshold=5.380e+01, percent-clipped=0.0 2024-08-17 23:15:15,554 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-17 23:15:53,903 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 13700, loss[loss=0.08267, beats_loss=0.0098, ecapa_loss=0.0001346, whisper_loss=0.07152, over 19819.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001476, whisper_loss=0.09085, over 3928984.89 frames. ], batch size: 79, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:15:58,880 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.87 vs. limit=6.0 2024-08-17 23:15:58,900 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.41 vs. limit=15.0 2024-08-17 23:16:00,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3568420.0, ans=0.1 2024-08-17 23:16:23,959 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-17 23:16:31,252 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 36 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-17 23:16:43,110 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 11 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-17 23:16:44,572 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
20 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-17 23:16:48,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3568820.0, ans=0.125 2024-08-17 23:16:54,419 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=12.0 2024-08-17 23:17:04,167 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 13750, loss[loss=0.08361, beats_loss=0.01104, ecapa_loss=9.636e-05, whisper_loss=0.07161, over 18791.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001462, whisper_loss=0.09102, over 3902694.89 frames. ], batch size: 72, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:17:10,314 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.371e+01 2.635e+01 2.890e+01 4.017e+01, threshold=5.270e+01, percent-clipped=0.0 2024-08-17 23:17:38,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3569120.0, ans=0.05 2024-08-17 23:17:40,826 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-17 23:17:43,819 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 10 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-17 23:17:46,098 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 26 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-17 23:17:48,782 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 31 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-17 23:17:49,914 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 32 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-17 23:18:01,762 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. 
limit=6.0 2024-08-17 23:18:13,899 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 13800, loss[loss=0.1049, beats_loss=0.009645, ecapa_loss=0.0001348, whisper_loss=0.09392, over 22136.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01054, ecapa_loss=0.0001461, whisper_loss=0.09126, over 3896626.10 frames. ], batch size: 88, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:18:18,431 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=15.0 2024-08-17 23:18:35,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3569520.0, ans=0.0 2024-08-17 23:18:44,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3569620.0, ans=0.0 2024-08-17 23:18:44,873 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.38 vs. limit=15.0 2024-08-17 23:18:55,254 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-17 23:19:00,908 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-17 23:19:15,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3569820.0, ans=0.0 2024-08-17 23:19:21,476 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 13850, loss[loss=0.08264, beats_loss=0.01416, ecapa_loss=0.0001328, whisper_loss=0.06715, over 14615.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01062, ecapa_loss=0.0001455, whisper_loss=0.09079, over 3907582.85 frames. 
], batch size: 59, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:19:26,981 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.402e+01 2.666e+01 2.960e+01 4.114e+01, threshold=5.333e+01, percent-clipped=0.0 2024-08-17 23:19:44,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3570020.0, ans=0.125 2024-08-17 23:19:57,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3570120.0, ans=0.0 2024-08-17 23:20:17,597 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-17 23:20:20,740 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2024-08-17 23:20:27,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3570320.0, ans=0.05 2024-08-17 23:20:29,634 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 13900, loss[loss=0.1203, beats_loss=0.00764, ecapa_loss=0.0001512, whisper_loss=0.1112, over 17950.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01055, ecapa_loss=0.000146, whisper_loss=0.09123, over 3907783.58 frames. ], batch size: 68, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:20:53,066 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-17 23:20:53,669 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.48 vs. 
limit=22.5 2024-08-17 23:20:54,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3570520.0, ans=0.0 2024-08-17 23:21:01,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=3570620.0, ans=0.1 2024-08-17 23:21:18,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3570720.0, ans=0.125 2024-08-17 23:21:31,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3570820.0, ans=0.0 2024-08-17 23:21:39,456 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 13950, loss[loss=0.06967, beats_loss=0.01294, ecapa_loss=0.0001415, whisper_loss=0.05531, over 21682.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01053, ecapa_loss=0.0001462, whisper_loss=0.0914, over 3900562.23 frames. ], batch size: 92, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:21:40,341 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.42 vs. 
limit=10.0 2024-08-17 23:21:45,227 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.372e+01 2.631e+01 3.013e+01 8.564e+01, threshold=5.263e+01, percent-clipped=2.0 2024-08-17 23:21:47,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3570920.0, ans=0.125 2024-08-17 23:21:53,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3571020.0, ans=0.125 2024-08-17 23:21:56,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3571020.0, ans=0.09899494936611666 2024-08-17 23:22:14,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3571120.0, ans=0.125 2024-08-17 23:22:19,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3571120.0, ans=0.125 2024-08-17 23:22:19,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3571120.0, ans=0.0 2024-08-17 23:22:20,121 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-17 23:22:29,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3571220.0, ans=0.025 2024-08-17 23:22:34,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3571220.0, ans=0.125 2024-08-17 23:22:43,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3571320.0, ans=0.0 2024-08-17 23:22:50,768 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 14000, loss[loss=0.1149, beats_loss=0.01006, ecapa_loss=0.0001208, whisper_loss=0.1036, over 22124.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01048, ecapa_loss=0.0001468, whisper_loss=0.09179, over 3910012.69 frames. ], batch size: 84, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:22:57,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3571420.0, ans=10.0 2024-08-17 23:23:21,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3571620.0, ans=0.125 2024-08-17 23:23:29,235 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 18 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-17 23:23:34,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3571720.0, ans=0.0 2024-08-17 23:23:39,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3571720.0, ans=0.0 2024-08-17 23:23:49,150 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 9 from Vox, 28 fro AS 2024-08-17 23:23:50,676 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.35 vs. limit=10.0 2024-08-17 23:23:56,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3571820.0, ans=0.1 2024-08-17 23:23:57,176 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-17 23:24:00,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3571920.0, ans=0.0 2024-08-17 23:24:01,024 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 14050, loss[loss=0.07715, beats_loss=0.01259, ecapa_loss=0.0001361, whisper_loss=0.0632, over 14198.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01054, ecapa_loss=0.000146, whisper_loss=0.09155, over 3913522.11 frames. ], batch size: 59, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:24:01,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3571920.0, ans=0.125 2024-08-17 23:24:05,370 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 23:24:06,128 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.345e+01 2.533e+01 2.843e+01 6.962e+01, threshold=5.065e+01, percent-clipped=1.0 2024-08-17 23:24:06,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3571920.0, ans=0.125 2024-08-17 23:24:09,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3571920.0, ans=0.0 2024-08-17 23:24:37,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3572120.0, ans=0.07 2024-08-17 23:24:47,147 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 30 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-17 23:24:49,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=3572220.0, ans=0.02 2024-08-17 23:25:04,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3572320.0, ans=0.125 2024-08-17 23:25:09,206 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 14100, loss[loss=0.1203, beats_loss=0.00775, ecapa_loss=0.0001712, whisper_loss=0.1109, over 21875.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01053, ecapa_loss=0.0001469, whisper_loss=0.0919, over 3953836.25 frames. 
], batch size: 88, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:25:09,879 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=12.0 2024-08-17 23:25:21,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3572520.0, ans=0.125 2024-08-17 23:25:42,795 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-17 23:25:57,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3572720.0, ans=0.125 2024-08-17 23:25:58,724 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-17 23:26:12,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3572820.0, ans=0.0 2024-08-17 23:26:12,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3572820.0, ans=0.0 2024-08-17 23:26:14,644 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-17 23:26:16,590 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-17 23:26:16,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3572920.0, ans=0.125 2024-08-17 23:26:17,690 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 14150, loss[loss=0.1079, beats_loss=0.009251, ecapa_loss=0.000163, whisper_loss=0.097, over 17212.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01047, ecapa_loss=0.0001472, whisper_loss=0.09179, over 3926510.57 frames. 
], batch size: 68, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:26:18,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3572920.0, ans=0.125 2024-08-17 23:26:22,839 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.750e+01 2.363e+01 2.612e+01 2.970e+01 1.774e+02, threshold=5.225e+01, percent-clipped=3.0 2024-08-17 23:26:30,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3573020.0, ans=0.2 2024-08-17 23:27:00,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3573220.0, ans=0.2 2024-08-17 23:27:06,315 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.77 vs. limit=22.5 2024-08-17 23:27:06,760 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-17 23:27:18,994 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 23:27:26,611 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 14200, loss[loss=0.08271, beats_loss=0.01278, ecapa_loss=0.0001125, whisper_loss=0.0688, over 16629.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01052, ecapa_loss=0.0001459, whisper_loss=0.09126, over 3943010.06 frames. ], batch size: 65, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:27:41,626 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-17 23:27:49,882 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
27 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-17 23:27:52,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3573620.0, ans=0.125 2024-08-17 23:27:55,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3573620.0, ans=0.125 2024-08-17 23:27:58,082 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 23:28:16,104 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-17 23:28:19,149 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.23 vs. limit=15.0 2024-08-17 23:28:33,350 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 14250, loss[loss=0.1089, beats_loss=0.01148, ecapa_loss=0.0001278, whisper_loss=0.09614, over 23445.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0105, ecapa_loss=0.0001464, whisper_loss=0.09124, over 3956224.62 frames. ], batch size: 93, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:28:38,814 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.386e+01 2.591e+01 2.986e+01 7.501e+01, threshold=5.182e+01, percent-clipped=1.0 2024-08-17 23:28:39,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3573920.0, ans=0.0 2024-08-17 23:28:59,443 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-17 23:29:17,528 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.81 vs. limit=15.0 2024-08-17 23:29:27,588 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
27 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-17 23:29:39,966 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-17 23:29:42,733 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 14300, loss[loss=0.1272, beats_loss=0.009719, ecapa_loss=0.0001372, whisper_loss=0.1161, over 23029.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01051, ecapa_loss=0.0001458, whisper_loss=0.09072, over 3941826.26 frames. ], batch size: 90, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:29:44,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3574420.0, ans=0.2 2024-08-17 23:30:06,994 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 37 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-17 23:30:24,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2024-08-17 23:30:28,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3574720.0, ans=0.2 2024-08-17 23:30:29,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3574720.0, ans=0.125 2024-08-17 23:30:30,243 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-17 23:30:52,348 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 14350, loss[loss=0.09821, beats_loss=0.009735, ecapa_loss=0.0001563, whisper_loss=0.08691, over 19208.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0105, ecapa_loss=0.000145, whisper_loss=0.09065, over 3967250.27 frames. 
], batch size: 75, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:30:54,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3574920.0, ans=22.5 2024-08-17 23:30:56,472 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-17 23:30:57,415 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.260e+01 2.482e+01 2.823e+01 6.177e+01, threshold=4.963e+01, percent-clipped=1.0 2024-08-17 23:30:59,567 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 23:31:04,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3575020.0, ans=0.125 2024-08-17 23:31:09,683 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-17 23:31:12,542 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 23:31:19,846 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=15.0 2024-08-17 23:31:28,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3575120.0, ans=0.0 2024-08-17 23:31:45,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3575220.0, ans=0.2 2024-08-17 23:31:56,408 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 19 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-17 23:32:01,448 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 14400, loss[loss=0.1032, beats_loss=0.008276, ecapa_loss=0.000182, whisper_loss=0.09307, over 13378.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001465, whisper_loss=0.09054, over 3946137.40 frames. 
], batch size: 56, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:32:01,598 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-17 23:32:04,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3575420.0, ans=0.0 2024-08-17 23:32:11,647 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-17 23:32:24,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3575520.0, ans=0.125 2024-08-17 23:32:25,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3575520.0, ans=0.025 2024-08-17 23:32:45,275 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-17 23:33:00,941 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-17 23:33:15,033 INFO [train_multi_KD3.py:1116] (2/4) Epoch 24, batch 14450, loss[loss=0.09308, beats_loss=0.0105, ecapa_loss=0.0001462, whisper_loss=0.08112, over 19115.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01061, ecapa_loss=0.0001459, whisper_loss=0.09054, over 3925787.28 frames. 
], batch size: 80, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:33:21,139 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.457e+01 2.673e+01 3.011e+01 5.974e+01, threshold=5.346e+01, percent-clipped=2.0 2024-08-17 23:33:34,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=3576020.0, ans=12.0 2024-08-17 23:33:36,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3576020.0, ans=0.125 2024-08-17 23:33:39,936 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-17 23:33:44,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3576120.0, ans=0.0 2024-08-17 23:33:52,501 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-17 23:33:58,432 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-17 23:34:02,471 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-17 23:34:55,194 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 0, loss[loss=0.09121, beats_loss=0.01028, ecapa_loss=0.0001534, whisper_loss=0.07939, over 18061.00 frames. ], tot_loss[loss=0.09121, beats_loss=0.01028, ecapa_loss=0.0001534, whisper_loss=0.07939, over 18061.00 frames. ], batch size: 74, lr: 2.48e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:34:55,194 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-17 23:35:34,956 INFO [train_multi_KD3.py:1149] (2/4) Epoch 25, validation on ASR_libri: loss=0.253, beats_loss=0, ecapa_loss=0.000529, whisper_loss=0.2477, over 922467.00 frames. 
2024-08-17 23:35:49,872 INFO [train_multi_KD3.py:1149] (2/4) Epoch 25, validation on SV_voxceleb1: loss=0.004106, beats_loss=0, ecapa_loss=0.0004106, whisper_loss=0, over 939242.00 frames. 2024-08-17 23:37:32,581 INFO [train_multi_KD3.py:1149] (2/4) Epoch 25, validation on AT_audioset: loss=0.02333, beats_loss=0.02333, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-17 23:37:32,584 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-17 23:37:57,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3576420.0, ans=0.0 2024-08-17 23:37:59,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3576420.0, ans=0.0 2024-08-17 23:38:21,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3576520.0, ans=0.125 2024-08-17 23:38:23,412 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-17 23:38:33,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3576520.0, ans=0.0 2024-08-17 23:38:39,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3576520.0, ans=0.125 2024-08-17 23:38:56,934 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.09 vs. limit=12.0 2024-08-17 23:39:30,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3576820.0, ans=0.125 2024-08-17 23:39:32,413 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 50, loss[loss=0.1182, beats_loss=0.009323, ecapa_loss=0.0001487, whisper_loss=0.1074, over 23302.00 frames. 
], tot_loss[loss=0.1009, beats_loss=0.009686, ecapa_loss=0.0001475, whisper_loss=0.08969, over 899378.79 frames. ], batch size: 91, lr: 2.48e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:40:00,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3576920.0, ans=0.125 2024-08-17 23:40:00,223 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 23:40:04,011 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.415e+01 2.699e+01 3.079e+01 5.308e+01, threshold=5.398e+01, percent-clipped=0.0 2024-08-17 23:40:20,548 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.09 vs. limit=22.5 2024-08-17 23:40:51,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3577120.0, ans=0.0 2024-08-17 23:40:51,314 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.26 vs. limit=15.0 2024-08-17 23:41:01,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3577220.0, ans=0.1 2024-08-17 23:41:01,375 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.05 vs. limit=15.0 2024-08-17 23:41:04,060 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.54 vs. limit=15.0 2024-08-17 23:41:16,552 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-17 23:41:20,756 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
18 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-17 23:41:21,772 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 100, loss[loss=0.0747, beats_loss=0.01164, ecapa_loss=0.0001338, whisper_loss=0.06173, over 20840.00 frames. ], tot_loss[loss=0.101, beats_loss=0.009632, ecapa_loss=0.000146, whisper_loss=0.08995, over 1546727.30 frames. ], batch size: 86, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:41:28,223 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-17 23:41:31,201 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.24 vs. limit=22.5 2024-08-17 23:41:41,087 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 28 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-17 23:41:41,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3577320.0, ans=0.125 2024-08-17 23:41:51,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3577420.0, ans=0.07 2024-08-17 23:42:06,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3577520.0, ans=0.125 2024-08-17 23:42:08,678 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 26 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-17 23:42:12,002 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-17 23:42:19,332 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 18 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-17 23:42:28,632 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.64 vs. 
limit=12.0 2024-08-17 23:42:35,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3577620.0, ans=0.125 2024-08-17 23:42:39,260 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.60 vs. limit=15.0 2024-08-17 23:42:49,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3577720.0, ans=0.0 2024-08-17 23:43:04,530 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 150, loss[loss=0.09642, beats_loss=0.009902, ecapa_loss=0.0001425, whisper_loss=0.0851, over 22102.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.009677, ecapa_loss=0.0001449, whisper_loss=0.09022, over 2048052.74 frames. ], batch size: 88, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:43:20,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3577920.0, ans=0.05 2024-08-17 23:43:22,090 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 23:43:27,248 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.543e+01 2.785e+01 3.052e+01 4.688e+01, threshold=5.571e+01, percent-clipped=0.0 2024-08-17 23:43:31,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3577920.0, ans=0.0 2024-08-17 23:43:43,535 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 16 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-17 23:43:50,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3578020.0, ans=0.0 2024-08-17 23:43:57,903 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
23 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-17 23:44:14,696 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.86 vs. limit=12.0 2024-08-17 23:44:18,740 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-17 23:44:23,170 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 200, loss[loss=0.09149, beats_loss=0.01021, ecapa_loss=0.0001391, whisper_loss=0.07989, over 14497.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.009841, ecapa_loss=0.0001469, whisper_loss=0.09022, over 2437708.92 frames. ], batch size: 55, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:44:24,812 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-17 23:44:41,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3578420.0, ans=0.1 2024-08-17 23:44:48,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3578420.0, ans=0.2 2024-08-17 23:44:50,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3578520.0, ans=0.125 2024-08-17 23:44:51,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3578520.0, ans=0.0 2024-08-17 23:44:54,575 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.19 vs. limit=10.0 2024-08-17 23:44:55,378 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 16 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-17 23:45:07,171 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 37 from LS+wenet, 29 from Vox, 23 fro AS 2024-08-17 23:45:15,662 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
28 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-17 23:45:18,643 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-17 23:45:23,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3578720.0, ans=0.125 2024-08-17 23:45:27,032 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-17 23:45:33,523 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 250, loss[loss=0.1089, beats_loss=0.008741, ecapa_loss=0.0001422, whisper_loss=0.09875, over 22699.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01004, ecapa_loss=0.0001472, whisper_loss=0.09059, over 2756088.98 frames. ], batch size: 87, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:45:35,551 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-17 23:45:53,630 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.355e+01 2.707e+01 2.999e+01 3.108e+02, threshold=5.414e+01, percent-clipped=1.0 2024-08-17 23:46:01,684 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-17 23:46:08,955 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-17 23:46:12,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3579020.0, ans=10.0 2024-08-17 23:46:31,644 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.94 vs. limit=15.0 2024-08-17 23:46:34,462 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 23:46:35,515 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
15 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-17 23:46:41,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3579320.0, ans=0.125 2024-08-17 23:46:41,976 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 300, loss[loss=0.09056, beats_loss=0.008363, ecapa_loss=0.000179, whisper_loss=0.08041, over 13668.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01014, ecapa_loss=0.0001478, whisper_loss=0.09022, over 2949427.71 frames. ], batch size: 54, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:46:46,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3579320.0, ans=0.1 2024-08-17 23:47:06,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3579420.0, ans=0.0 2024-08-17 23:47:10,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3579520.0, ans=0.1 2024-08-17 23:47:13,283 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-17 23:47:16,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3579520.0, ans=0.2 2024-08-17 23:47:21,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3579620.0, ans=0.0 2024-08-17 23:47:34,969 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 24 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-17 23:47:48,833 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 350, loss[loss=0.1047, beats_loss=0.01195, ecapa_loss=0.0001064, whisper_loss=0.09172, over 14609.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01018, ecapa_loss=0.0001474, whisper_loss=0.09008, over 3112241.74 frames. 
], batch size: 53, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:47:50,260 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-17 23:47:51,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3579820.0, ans=0.125 2024-08-17 23:48:07,192 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.271e+01 2.566e+01 2.908e+01 1.421e+02, threshold=5.133e+01, percent-clipped=1.0 2024-08-17 23:48:10,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3579920.0, ans=0.1 2024-08-17 23:48:15,869 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 16 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-17 23:48:20,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3580020.0, ans=0.125 2024-08-17 23:48:44,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3580220.0, ans=0.125 2024-08-17 23:48:53,407 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-17 23:48:56,045 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 400, loss[loss=0.09444, beats_loss=0.008649, ecapa_loss=0.0001891, whisper_loss=0.0839, over 16043.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01015, ecapa_loss=0.000148, whisper_loss=0.09005, over 3237816.21 frames. ], batch size: 66, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:49:09,984 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
22 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-17 23:49:10,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3580420.0, ans=0.125 2024-08-17 23:49:20,160 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.37 vs. limit=10.0 2024-08-17 23:49:21,813 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-17 23:49:37,087 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-17 23:49:41,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3580620.0, ans=0.0 2024-08-17 23:49:44,249 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.67 vs. limit=12.0 2024-08-17 23:49:51,510 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.05 vs. limit=10.0 2024-08-17 23:49:54,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3580720.0, ans=0.125 2024-08-17 23:49:59,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3580720.0, ans=0.125 2024-08-17 23:49:59,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3580720.0, ans=0.125 2024-08-17 23:50:04,336 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 450, loss[loss=0.105, beats_loss=0.009805, ecapa_loss=0.0001393, whisper_loss=0.09385, over 18272.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01029, ecapa_loss=0.0001473, whisper_loss=0.08877, over 3347546.55 frames. 
], batch size: 67, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:50:23,320 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.243e+01 2.550e+01 2.884e+01 5.686e+01, threshold=5.100e+01, percent-clipped=1.0 2024-08-17 23:50:26,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3580920.0, ans=0.1 2024-08-17 23:50:28,835 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-17 23:50:33,538 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-17 23:50:46,003 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-17 23:50:56,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3581120.0, ans=0.1 2024-08-17 23:50:57,514 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 10 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-17 23:51:07,348 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-17 23:51:11,934 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 500, loss[loss=0.108, beats_loss=0.006271, ecapa_loss=0.0001535, whisper_loss=0.1002, over 15378.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01033, ecapa_loss=0.000146, whisper_loss=0.08901, over 3469284.11 frames. ], batch size: 58, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:51:14,221 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=15.0 2024-08-17 23:51:17,458 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-17 23:51:24,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3581420.0, ans=0.125 2024-08-17 23:51:25,379 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 28 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-17 23:51:28,384 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 23 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-17 23:51:35,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3581420.0, ans=0.125 2024-08-17 23:51:36,132 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-17 23:51:39,014 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-17 23:51:57,317 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2024-08-17 23:52:06,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3581720.0, ans=0.0 2024-08-17 23:52:19,255 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 550, loss[loss=0.1156, beats_loss=0.007187, ecapa_loss=0.0001838, whisper_loss=0.1066, over 18671.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01039, ecapa_loss=0.0001461, whisper_loss=0.08901, over 3552020.57 frames. ], batch size: 69, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:52:19,400 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
24 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-17 23:52:38,327 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.286e+01 2.510e+01 2.772e+01 4.019e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-17 23:52:44,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3581920.0, ans=0.0 2024-08-17 23:52:54,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3582020.0, ans=0.025 2024-08-17 23:53:03,815 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-17 23:53:08,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3582120.0, ans=0.05 2024-08-17 23:53:22,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3582220.0, ans=0.0 2024-08-17 23:53:23,709 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-17 23:53:26,330 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-17 23:53:27,375 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 600, loss[loss=0.09729, beats_loss=0.009552, ecapa_loss=0.0001521, whisper_loss=0.08622, over 18546.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01038, ecapa_loss=0.0001457, whisper_loss=0.08894, over 3587650.11 frames. ], batch size: 74, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:53:29,100 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
18 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-17 23:53:43,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3582420.0, ans=0.1 2024-08-17 23:53:44,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3582420.0, ans=0.0 2024-08-17 23:53:45,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3582420.0, ans=0.125 2024-08-17 23:53:53,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3582520.0, ans=0.0 2024-08-17 23:54:17,489 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-17 23:54:35,277 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 650, loss[loss=0.1187, beats_loss=0.008012, ecapa_loss=0.0001694, whisper_loss=0.109, over 22693.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01023, ecapa_loss=0.0001461, whisper_loss=0.09021, over 3644496.89 frames. ], batch size: 90, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:54:39,679 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 29 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-17 23:54:43,956 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.03 vs. 
limit=6.0 2024-08-17 23:54:47,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3582920.0, ans=0.0 2024-08-17 23:54:53,625 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.263e+01 2.531e+01 2.852e+01 5.403e+01, threshold=5.063e+01, percent-clipped=1.0 2024-08-17 23:54:57,543 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=16.37 vs. limit=15.0 2024-08-17 23:54:58,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3582920.0, ans=0.1 2024-08-17 23:55:02,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3583020.0, ans=0.125 2024-08-17 23:55:25,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3583120.0, ans=0.125 2024-08-17 23:55:34,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3583220.0, ans=0.125 2024-08-17 23:55:42,597 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 700, loss[loss=0.1012, beats_loss=0.01096, ecapa_loss=0.0001372, whisper_loss=0.08888, over 14510.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0103, ecapa_loss=0.0001472, whisper_loss=0.09036, over 3678943.29 frames. 
], batch size: 57, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:55:51,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3583320.0, ans=0.125 2024-08-17 23:55:54,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3583420.0, ans=0.1 2024-08-17 23:55:56,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3583420.0, ans=0.0 2024-08-17 23:56:08,396 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 28 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-17 23:56:08,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3583520.0, ans=0.0 2024-08-17 23:56:18,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3583520.0, ans=0.125 2024-08-17 23:56:30,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3583620.0, ans=0.0 2024-08-17 23:56:34,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3583620.0, ans=0.2 2024-08-17 23:56:50,389 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 750, loss[loss=0.07766, beats_loss=0.01345, ecapa_loss=0.000132, whisper_loss=0.0629, over 20195.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01033, ecapa_loss=0.0001451, whisper_loss=0.0907, over 3737270.57 frames. ], batch size: 83, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:56:50,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3583820.0, ans=0.125 2024-08-17 23:57:09,142 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
21 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-17 23:57:10,154 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.300e+01 2.481e+01 2.765e+01 4.539e+01, threshold=4.963e+01, percent-clipped=0.0 2024-08-17 23:57:16,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3584020.0, ans=0.0 2024-08-17 23:57:26,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3584020.0, ans=0.125 2024-08-17 23:57:42,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3584120.0, ans=0.0 2024-08-17 23:57:43,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3584220.0, ans=0.2 2024-08-17 23:57:44,863 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-17 23:57:58,496 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 800, loss[loss=0.08543, beats_loss=0.01227, ecapa_loss=0.0001301, whisper_loss=0.07186, over 16431.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01035, ecapa_loss=0.0001447, whisper_loss=0.09029, over 3752095.92 frames. ], batch size: 64, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:58:01,891 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-17 23:58:02,731 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.18 vs. limit=15.0 2024-08-17 23:58:08,627 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
17 from LS+wenet, 20 from Vox, 23 from AS 2024-08-17 23:58:12,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3584420.0, ans=0.125 2024-08-17 23:58:15,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3584420.0, ans=0.125 2024-08-17 23:58:15,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3584420.0, ans=0.125 2024-08-17 23:58:32,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3584520.0, ans=0.125 2024-08-17 23:58:59,470 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 18 from Vox, 22 from AS 2024-08-17 23:59:04,945 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 850, loss[loss=0.1102, beats_loss=0.01088, ecapa_loss=0.0001418, whisper_loss=0.09791, over 24078.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01035, ecapa_loss=0.0001453, whisper_loss=0.08971, over 3719066.15 frames. ], batch size: 93, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:59:13,676 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 24 from LS+wenet, 13 from Vox, 28 from AS 2024-08-17 23:59:15,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3584820.0, ans=0.0 2024-08-17 23:59:17,198 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.64 vs. limit=8.0 2024-08-17 23:59:24,147 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.705e+01 2.267e+01 2.454e+01 2.766e+01 5.930e+01, threshold=4.908e+01, percent-clipped=1.0 2024-08-17 23:59:24,449 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts.
24 from LS+wenet, 9 from Vox, 25 from AS 2024-08-17 23:59:27,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3584920.0, ans=0.125 2024-08-17 23:59:31,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3585020.0, ans=0.2 2024-08-17 23:59:41,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3585020.0, ans=0.1 2024-08-17 23:59:42,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3585020.0, ans=0.125 2024-08-17 23:59:42,663 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.86 vs. limit=15.0 2024-08-18 00:00:00,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3585220.0, ans=0.125 2024-08-18 00:00:13,189 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 900, loss[loss=0.1099, beats_loss=0.01155, ecapa_loss=0.0001255, whisper_loss=0.09714, over 23186.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01038, ecapa_loss=0.0001445, whisper_loss=0.09031, over 3754443.13 frames. ], batch size: 92, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-18 00:00:24,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3585320.0, ans=0.035 2024-08-18 00:00:31,319 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 17 from Vox, 31 from AS 2024-08-18 00:00:40,522 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.77 vs.
limit=15.0 2024-08-18 00:00:41,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3585520.0, ans=0.0 2024-08-18 00:00:43,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3585520.0, ans=0.05 2024-08-18 00:00:47,241 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.90 vs. limit=15.0 2024-08-18 00:01:19,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3585820.0, ans=0.125 2024-08-18 00:01:19,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3585820.0, ans=0.125 2024-08-18 00:01:20,411 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 950, loss[loss=0.1101, beats_loss=0.01137, ecapa_loss=9.588e-05, whisper_loss=0.09775, over 22357.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01049, ecapa_loss=0.0001436, whisper_loss=0.08934, over 3753470.18 frames. 
], batch size: 82, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-18 00:01:22,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3585820.0, ans=0.125 2024-08-18 00:01:27,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3585820.0, ans=0.125 2024-08-18 00:01:41,291 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.321e+01 2.549e+01 2.773e+01 6.184e+01, threshold=5.098e+01, percent-clipped=1.0 2024-08-18 00:01:52,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3586020.0, ans=0.1 2024-08-18 00:02:22,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3586220.0, ans=0.125 2024-08-18 00:02:27,636 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.76 vs. limit=22.5 2024-08-18 00:02:28,153 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 1000, loss[loss=0.08204, beats_loss=0.01403, ecapa_loss=9.938e-05, whisper_loss=0.06701, over 21236.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0105, ecapa_loss=0.0001432, whisper_loss=0.08909, over 3764975.68 frames. ], batch size: 84, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:02:31,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3586320.0, ans=0.1 2024-08-18 00:02:37,392 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
21 from LS+wenet, 20 from Vox, 18 from AS 2024-08-18 00:02:37,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3586320.0, ans=0.125 2024-08-18 00:02:38,631 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 16 from Vox, 49 from AS 2024-08-18 00:02:57,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3586520.0, ans=0.1 2024-08-18 00:03:00,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3586520.0, ans=0.1 2024-08-18 00:03:02,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3586520.0, ans=0.125 2024-08-18 00:03:09,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3586620.0, ans=0.0 2024-08-18 00:03:36,344 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 1050, loss[loss=0.1011, beats_loss=0.009753, ecapa_loss=0.0001175, whisper_loss=0.09016, over 22444.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01042, ecapa_loss=0.0001432, whisper_loss=0.08943, over 3780806.15 frames.
], batch size: 84, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:03:57,046 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.345e+01 2.573e+01 2.782e+01 6.018e+01, threshold=5.145e+01, percent-clipped=1.0 2024-08-18 00:04:01,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3586920.0, ans=0.125 2024-08-18 00:04:18,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3587120.0, ans=0.125 2024-08-18 00:04:18,360 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0 2024-08-18 00:04:24,378 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 from AS 2024-08-18 00:04:32,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3587220.0, ans=0.125 2024-08-18 00:04:40,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3587220.0, ans=0.125 2024-08-18 00:04:43,203 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 1100, loss[loss=0.09942, beats_loss=0.009684, ecapa_loss=0.0001687, whisper_loss=0.08805, over 21298.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01045, ecapa_loss=0.0001431, whisper_loss=0.08997, over 3783114.54 frames.
], batch size: 88, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:04:46,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3587320.0, ans=0.125 2024-08-18 00:04:54,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3587320.0, ans=0.0 2024-08-18 00:05:26,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3587620.0, ans=0.125 2024-08-18 00:05:45,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3587720.0, ans=0.125 2024-08-18 00:05:46,832 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 17 from Vox, 22 from AS 2024-08-18 00:05:51,947 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 1150, loss[loss=0.07131, beats_loss=0.008958, ecapa_loss=0.0001806, whisper_loss=0.06055, over 14022.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001428, whisper_loss=0.09011, over 3820683.09 frames. ], batch size: 57, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:05:57,774 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 31 from LS+wenet, 18 from Vox, 27 from AS 2024-08-18 00:06:09,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3587920.0, ans=0.0 2024-08-18 00:06:12,109 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.421e+01 2.640e+01 3.029e+01 4.672e+01, threshold=5.280e+01, percent-clipped=0.0 2024-08-18 00:06:12,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3587920.0, ans=0.0 2024-08-18 00:06:47,406 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts.
18 from LS+wenet, 23 from Vox, 23 from AS 2024-08-18 00:06:49,183 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.09 vs. limit=22.5 2024-08-18 00:07:00,618 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 1200, loss[loss=0.09127, beats_loss=0.009486, ecapa_loss=0.0001392, whisper_loss=0.08039, over 15049.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0105, ecapa_loss=0.0001428, whisper_loss=0.0898, over 3794589.73 frames. ], batch size: 58, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:07:02,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3588320.0, ans=0.0 2024-08-18 00:07:06,161 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 20 from Vox, 46 from AS 2024-08-18 00:07:06,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3588320.0, ans=0.1 2024-08-18 00:07:25,994 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.24 vs. limit=22.5 2024-08-18 00:07:36,445 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 16 from Vox, 36 from AS 2024-08-18 00:07:38,319 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.44 vs. limit=22.5 2024-08-18 00:07:52,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3588620.0, ans=0.125 2024-08-18 00:07:59,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3588720.0, ans=0.0 2024-08-18 00:08:08,907 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts.
20 from LS+wenet, 30 from Vox, 42 from AS 2024-08-18 00:08:10,064 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 1250, loss[loss=0.07696, beats_loss=0.0121, ecapa_loss=0.000154, whisper_loss=0.06332, over 21806.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01058, ecapa_loss=0.0001429, whisper_loss=0.08858, over 3801516.62 frames. ], batch size: 92, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:08:17,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3588820.0, ans=0.0 2024-08-18 00:08:17,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3588820.0, ans=0.125 2024-08-18 00:08:23,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3588920.0, ans=0.0 2024-08-18 00:08:30,299 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.265e+01 2.496e+01 2.765e+01 1.417e+02, threshold=4.991e+01, percent-clipped=1.0 2024-08-18 00:09:00,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3589120.0, ans=0.125 2024-08-18 00:09:12,839 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 21 from LS+wenet, 26 from Vox, 46 from AS 2024-08-18 00:09:17,043 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 1300, loss[loss=0.06857, beats_loss=0.01476, ecapa_loss=0.0001182, whisper_loss=0.05263, over 14264.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.0106, ecapa_loss=0.0001428, whisper_loss=0.08804, over 3804708.31 frames. ], batch size: 57, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:09:17,203 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts.
24 from LS+wenet, 24 from Vox, 37 from AS 2024-08-18 00:09:32,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3589420.0, ans=0.1 2024-08-18 00:09:34,243 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 21 from Vox, 26 from AS 2024-08-18 00:09:43,041 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 19 from Vox, 38 from AS 2024-08-18 00:09:47,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3589520.0, ans=0.125 2024-08-18 00:09:47,963 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.13 vs. limit=6.0 2024-08-18 00:10:08,174 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 from AS 2024-08-18 00:10:29,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3589820.0, ans=0.5 2024-08-18 00:10:30,872 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 1350, loss[loss=0.0948, beats_loss=0.01187, ecapa_loss=0.000155, whisper_loss=0.08138, over 22147.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01051, ecapa_loss=0.0001434, whisper_loss=0.08818, over 3838541.65 frames.
], batch size: 93, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:10:50,433 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.253e+01 2.540e+01 2.866e+01 1.653e+02, threshold=5.079e+01, percent-clipped=1.0 2024-08-18 00:11:05,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3590020.0, ans=0.0 2024-08-18 00:11:40,128 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 1400, loss[loss=0.07383, beats_loss=0.01193, ecapa_loss=0.0001094, whisper_loss=0.0608, over 17091.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01051, ecapa_loss=0.0001438, whisper_loss=0.08894, over 3841023.08 frames. ], batch size: 66, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:11:47,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3590320.0, ans=0.0 2024-08-18 00:11:49,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3590320.0, ans=0.125 2024-08-18 00:11:51,655 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 15 from Vox, 35 from AS 2024-08-18 00:11:57,309 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 31 from LS+wenet, 24 from Vox, 26 from AS 2024-08-18 00:12:14,802 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 from AS 2024-08-18 00:12:20,658 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.11 vs.
limit=15.0 2024-08-18 00:12:23,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3590620.0, ans=0.125 2024-08-18 00:12:25,420 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.609e-02 2024-08-18 00:12:44,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3590720.0, ans=0.95 2024-08-18 00:12:45,817 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 14 from Vox, 32 from AS 2024-08-18 00:12:51,446 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 1450, loss[loss=0.1145, beats_loss=0.009333, ecapa_loss=0.0001386, whisper_loss=0.1038, over 16811.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01056, ecapa_loss=0.000143, whisper_loss=0.08858, over 3842083.99 frames. ], batch size: 64, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:13:16,334 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 13 from Vox, 25 from AS 2024-08-18 00:13:20,555 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.41 vs.
limit=15.0 2024-08-18 00:13:23,795 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.734e+01 2.250e+01 2.508e+01 2.687e+01 4.238e+01, threshold=5.016e+01, percent-clipped=0.0 2024-08-18 00:13:34,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3591020.0, ans=0.0 2024-08-18 00:13:38,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3591020.0, ans=0.0 2024-08-18 00:13:40,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3591020.0, ans=0.2 2024-08-18 00:13:44,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3591020.0, ans=0.1 2024-08-18 00:14:00,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3591120.0, ans=0.2 2024-08-18 00:14:05,150 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 11 from Vox, 36 from AS 2024-08-18 00:14:35,488 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 1500, loss[loss=0.1011, beats_loss=0.01103, ecapa_loss=0.0001192, whisper_loss=0.08892, over 23896.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01061, ecapa_loss=0.0001428, whisper_loss=0.08805, over 3820816.16 frames. ], batch size: 93, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:14:48,881 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 18 from LS+wenet, 21 from Vox, 33 from AS 2024-08-18 00:14:53,537 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts.
30 from LS+wenet, 25 from Vox, 34 from AS 2024-08-18 00:14:57,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3591420.0, ans=0.0 2024-08-18 00:15:23,185 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 00:15:42,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3591620.0, ans=0.125 2024-08-18 00:16:10,067 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 32 from LS+wenet, 15 from Vox, 26 from AS 2024-08-18 00:16:19,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3591720.0, ans=0.125 2024-08-18 00:16:27,731 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 1550, loss[loss=0.08428, beats_loss=0.009981, ecapa_loss=0.0001526, whisper_loss=0.07277, over 14216.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01056, ecapa_loss=0.0001424, whisper_loss=0.08851, over 3827075.72 frames. ], batch size: 54, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:16:54,310 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 14 from LS+wenet, 17 from Vox, 30 from AS 2024-08-18 00:17:02,555 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.260e+01 2.619e+01 2.888e+01 4.592e+01, threshold=5.239e+01, percent-clipped=0.0 2024-08-18 00:17:04,038 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.36 vs. limit=22.5 2024-08-18 00:17:39,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3592120.0, ans=0.125 2024-08-18 00:18:06,565 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts.
28 from LS+wenet, 8 from Vox, 20 from AS 2024-08-18 00:18:16,540 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 1600, loss[loss=0.0784, beats_loss=0.01257, ecapa_loss=0.0001261, whisper_loss=0.06457, over 13120.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01057, ecapa_loss=0.0001414, whisper_loss=0.08821, over 3838303.02 frames. ], batch size: 54, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:18:25,662 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 21 from LS+wenet, 13 from Vox, 26 from AS 2024-08-18 00:18:41,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3592420.0, ans=0.0 2024-08-18 00:18:47,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3592420.0, ans=0.125 2024-08-18 00:19:10,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3592620.0, ans=0.0 2024-08-18 00:19:13,399 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.08 vs. limit=15.0 2024-08-18 00:19:17,222 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.69 vs. limit=10.0 2024-08-18 00:19:38,403 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 1650, loss[loss=0.1008, beats_loss=0.00958, ecapa_loss=0.000156, whisper_loss=0.08963, over 14995.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01053, ecapa_loss=0.0001403, whisper_loss=0.08942, over 3862670.94 frames.
], batch size: 59, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:19:59,577 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.229e+01 2.477e+01 2.865e+01 9.039e+01, threshold=4.953e+01, percent-clipped=1.0 2024-08-18 00:20:19,456 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 15 from LS+wenet, 16 from Vox, 29 from AS 2024-08-18 00:20:37,287 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 from AS 2024-08-18 00:20:40,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3593220.0, ans=0.0 2024-08-18 00:20:45,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3593220.0, ans=0.125 2024-08-18 00:20:47,731 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 1700, loss[loss=0.1112, beats_loss=0.01038, ecapa_loss=0.0001259, whisper_loss=0.09958, over 20015.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01039, ecapa_loss=0.0001406, whisper_loss=0.09061, over 3871755.87 frames. ], batch size: 77, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:20:52,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3593320.0, ans=0.1 2024-08-18 00:20:56,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3593320.0, ans=0.125 2024-08-18 00:21:03,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3593420.0, ans=0.0 2024-08-18 00:21:04,140 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts.
32 from LS+wenet, 27 from Vox, 35 from AS 2024-08-18 00:21:12,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3593420.0, ans=0.1 2024-08-18 00:21:17,690 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 18 from Vox, 30 from AS 2024-08-18 00:21:18,811 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 13 from Vox, 24 from AS 2024-08-18 00:21:34,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3593620.0, ans=0.125 2024-08-18 00:21:50,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3593720.0, ans=0.0 2024-08-18 00:21:54,628 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 1750, loss[loss=0.09391, beats_loss=0.008679, ecapa_loss=0.0001799, whisper_loss=0.08343, over 14294.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01034, ecapa_loss=0.000143, whisper_loss=0.09147, over 3885412.55 frames. ], batch size: 57, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:22:14,028 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 16 from Vox, 33 from AS 2024-08-18 00:22:15,134 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.504e+01 2.766e+01 3.067e+01 1.559e+02, threshold=5.531e+01, percent-clipped=2.0 2024-08-18 00:22:15,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3593920.0, ans=0.0 2024-08-18 00:22:19,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3593920.0, ans=0.035 2024-08-18 00:22:22,941 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=20.64 vs.
limit=22.5 2024-08-18 00:22:33,643 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 21 from LS+wenet, 14 from Vox, 21 from AS 2024-08-18 00:22:45,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3594120.0, ans=0.125 2024-08-18 00:22:47,451 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 14 from Vox, 30 from AS 2024-08-18 00:23:02,373 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 1800, loss[loss=0.1069, beats_loss=0.009847, ecapa_loss=0.0001202, whisper_loss=0.09583, over 17279.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01033, ecapa_loss=0.0001433, whisper_loss=0.0913, over 3886156.18 frames. ], batch size: 65, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:23:15,756 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=12.0 2024-08-18 00:23:23,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3594420.0, ans=0.2 2024-08-18 00:23:30,090 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.10 vs. limit=15.0 2024-08-18 00:23:38,472 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 23 from Vox, 22 from AS 2024-08-18 00:24:05,917 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 from AS 2024-08-18 00:24:09,353 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 1850, loss[loss=0.1053, beats_loss=0.01011, ecapa_loss=0.0001659, whisper_loss=0.0935, over 19964.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01026, ecapa_loss=0.0001447, whisper_loss=0.09141, over 3850025.28 frames.
], batch size: 83, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:24:19,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3594820.0, ans=0.0 2024-08-18 00:24:29,360 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.383e+01 2.607e+01 2.993e+01 6.397e+01, threshold=5.213e+01, percent-clipped=1.0 2024-08-18 00:24:33,538 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 27 from LS+wenet, 24 from Vox, 27 from AS 2024-08-18 00:24:43,298 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 30 from LS+wenet, 15 from Vox, 35 from AS 2024-08-18 00:24:43,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3595020.0, ans=0.125 2024-08-18 00:25:04,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3595220.0, ans=0.125 2024-08-18 00:25:17,661 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 1900, loss[loss=0.1011, beats_loss=0.01, ecapa_loss=0.0001146, whisper_loss=0.08992, over 14412.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01033, ecapa_loss=0.0001448, whisper_loss=0.0908, over 3834081.76 frames. ], batch size: 54, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:25:21,482 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 17 from Vox, 30 from AS 2024-08-18 00:25:45,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3595520.0, ans=0.0 2024-08-18 00:25:47,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3595520.0, ans=0.0 2024-08-18 00:25:47,625 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.46 vs.
limit=6.0 2024-08-18 00:26:01,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3595620.0, ans=0.125 2024-08-18 00:26:02,127 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.07 vs. limit=15.0 2024-08-18 00:26:10,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3595720.0, ans=0.1 2024-08-18 00:26:23,768 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 1950, loss[loss=0.1203, beats_loss=0.01082, ecapa_loss=0.0001571, whisper_loss=0.1079, over 22334.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01038, ecapa_loss=0.0001454, whisper_loss=0.09034, over 3819469.98 frames. ], batch size: 90, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:26:24,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3595820.0, ans=0.0 2024-08-18 00:26:28,166 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-18 00:26:35,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3595820.0, ans=0.0 2024-08-18 00:26:35,350 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.21 vs. 
limit=15.0 2024-08-18 00:26:44,849 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.310e+01 2.471e+01 2.786e+01 3.161e+02, threshold=4.942e+01, percent-clipped=2.0 2024-08-18 00:26:58,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3596020.0, ans=0.1 2024-08-18 00:27:03,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3596120.0, ans=0.0 2024-08-18 00:27:06,324 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-18 00:27:08,193 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.49 vs. limit=10.0 2024-08-18 00:27:14,221 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-18 00:27:29,300 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 32 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-18 00:27:32,030 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 2000, loss[loss=0.09957, beats_loss=0.01107, ecapa_loss=9.876e-05, whisper_loss=0.08752, over 19938.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01033, ecapa_loss=0.0001441, whisper_loss=0.09119, over 3825677.05 frames. ], batch size: 77, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:27:45,692 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 21 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-18 00:27:53,477 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.02 vs. 
limit=10.0 2024-08-18 00:27:57,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3596420.0, ans=10.0 2024-08-18 00:28:07,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3596520.0, ans=0.1 2024-08-18 00:28:30,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3596720.0, ans=0.05 2024-08-18 00:28:40,591 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 2050, loss[loss=0.128, beats_loss=0.009868, ecapa_loss=0.0001438, whisper_loss=0.1166, over 16024.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01041, ecapa_loss=0.0001435, whisper_loss=0.09075, over 3803034.21 frames. ], batch size: 60, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:28:45,021 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 00:28:45,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3596820.0, ans=0.125 2024-08-18 00:28:45,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=3596820.0, ans=15.0 2024-08-18 00:29:00,280 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.408e+01 2.657e+01 2.912e+01 3.324e+02, threshold=5.315e+01, percent-clipped=5.0 2024-08-18 00:29:17,375 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 00:29:44,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3597220.0, ans=0.125 2024-08-18 00:29:46,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3597320.0, ans=0.125 2024-08-18 00:29:46,863 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 2100, loss[loss=0.09264, beats_loss=0.01115, ecapa_loss=0.0001604, whisper_loss=0.07988, over 21200.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01059, ecapa_loss=0.0001422, whisper_loss=0.08924, over 3796310.43 frames. ], batch size: 90, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:29:52,065 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-18 00:29:57,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3597320.0, ans=0.125 2024-08-18 00:30:14,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3597520.0, ans=10.0 2024-08-18 00:30:32,561 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.28 vs. limit=15.0 2024-08-18 00:30:34,087 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.85 vs. limit=22.5 2024-08-18 00:30:49,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3597720.0, ans=0.0 2024-08-18 00:30:50,988 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 2150, loss[loss=0.1003, beats_loss=0.01276, ecapa_loss=0.0001208, whisper_loss=0.08634, over 23471.00 frames. 
], tot_loss[loss=0.101, beats_loss=0.0106, ecapa_loss=0.0001417, whisper_loss=0.08901, over 3747052.66 frames. ], batch size: 92, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:30:57,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3597820.0, ans=0.125 2024-08-18 00:31:04,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3597920.0, ans=0.125 2024-08-18 00:31:09,705 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.640e+01 2.312e+01 2.569e+01 2.886e+01 3.776e+01, threshold=5.138e+01, percent-clipped=0.0 2024-08-18 00:31:21,273 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-18 00:31:40,050 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 29 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 00:31:53,920 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 2200, loss[loss=0.1066, beats_loss=0.01065, ecapa_loss=0.0001544, whisper_loss=0.09443, over 17054.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01057, ecapa_loss=0.0001417, whisper_loss=0.08978, over 3741300.48 frames. ], batch size: 68, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:32:03,909 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 00:32:12,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3598420.0, ans=0.125 2024-08-18 00:32:32,221 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.46 vs. 
limit=15.0 2024-08-18 00:32:35,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3598620.0, ans=0.125 2024-08-18 00:32:53,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3598720.0, ans=0.125 2024-08-18 00:32:53,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3598720.0, ans=0.05 2024-08-18 00:32:56,459 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 2250, loss[loss=0.09323, beats_loss=0.01314, ecapa_loss=0.0001667, whisper_loss=0.07842, over 21254.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01067, ecapa_loss=0.0001418, whisper_loss=0.08929, over 3772611.56 frames. ], batch size: 91, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:33:14,784 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.985e+01 2.384e+01 2.636e+01 2.985e+01 4.342e+01, threshold=5.271e+01, percent-clipped=0.0 2024-08-18 00:33:31,167 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-18 00:33:35,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3599120.0, ans=0.0 2024-08-18 00:33:35,871 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-18 00:33:36,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3599120.0, ans=0.1 2024-08-18 00:33:50,034 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2024-08-18 00:33:56,036 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
27 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-18 00:33:58,315 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 2300, loss[loss=0.1074, beats_loss=0.00942, ecapa_loss=0.0001574, whisper_loss=0.09639, over 17383.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01067, ecapa_loss=0.0001428, whisper_loss=0.08948, over 3819269.55 frames. ], batch size: 68, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:34:01,465 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-18 00:34:05,154 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 00:34:10,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3599420.0, ans=0.0 2024-08-18 00:34:11,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3599420.0, ans=0.125 2024-08-18 00:34:20,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3599420.0, ans=0.1 2024-08-18 00:34:22,214 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0 2024-08-18 00:34:24,023 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 00:34:45,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3599620.0, ans=0.2 2024-08-18 00:34:49,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3599720.0, ans=0.1 2024-08-18 00:34:54,405 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
32 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 00:34:54,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3599720.0, ans=0.1 2024-08-18 00:34:55,580 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 18 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-18 00:35:02,202 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 2350, loss[loss=0.09955, beats_loss=0.0102, ecapa_loss=0.0001624, whisper_loss=0.08773, over 22411.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.0001437, whisper_loss=0.09059, over 3837249.57 frames. ], batch size: 91, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:35:13,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3599920.0, ans=0.125 2024-08-18 00:35:21,239 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.661e+01 2.275e+01 2.503e+01 2.910e+01 3.749e+01, threshold=5.007e+01, percent-clipped=0.0 2024-08-18 00:35:21,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3599920.0, ans=0.125 2024-08-18 00:35:40,367 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 00:35:43,942 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2024-08-18 00:35:51,883 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
27 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 00:35:57,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3600220.0, ans=0.2 2024-08-18 00:36:07,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3600320.0, ans=0.125 2024-08-18 00:36:07,928 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 2400, loss[loss=0.1056, beats_loss=0.0122, ecapa_loss=9.476e-05, whisper_loss=0.09247, over 18959.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.0001449, whisper_loss=0.09051, over 3856721.08 frames. ], batch size: 71, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:36:24,633 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-18 00:36:44,316 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 18 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-18 00:36:45,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3600620.0, ans=0.125 2024-08-18 00:36:49,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3600620.0, ans=0.0 2024-08-18 00:36:50,689 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 17 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-18 00:36:55,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3600620.0, ans=0.2 2024-08-18 00:36:55,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3600620.0, ans=0.2 2024-08-18 00:37:10,452 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 2450, loss[loss=0.0945, beats_loss=0.01011, ecapa_loss=0.0001667, whisper_loss=0.08271, over 19670.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01052, ecapa_loss=0.0001437, whisper_loss=0.09042, over 3886704.68 frames. ], batch size: 81, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:37:21,912 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 28 from Vox, 25 fro AS 2024-08-18 00:37:23,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=3600920.0, ans=10.0 2024-08-18 00:37:28,846 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.316e+01 2.565e+01 2.962e+01 6.294e+01, threshold=5.130e+01, percent-clipped=1.0 2024-08-18 00:37:35,419 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 13 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-18 00:37:36,965 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-18 00:37:44,620 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 00:37:54,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3601120.0, ans=0.125 2024-08-18 00:38:07,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3601220.0, ans=0.1 2024-08-18 00:38:13,059 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 2500, loss[loss=0.1007, beats_loss=0.01048, ecapa_loss=0.0001408, whisper_loss=0.08885, over 22500.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01049, ecapa_loss=0.0001443, whisper_loss=0.09061, over 3867730.31 frames. 
], batch size: 91, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:38:13,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3601320.0, ans=0.2 2024-08-18 00:38:23,939 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.79 vs. limit=15.0 2024-08-18 00:38:27,090 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-18 00:38:29,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3601420.0, ans=0.0 2024-08-18 00:38:38,243 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-18 00:38:40,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3601520.0, ans=0.0 2024-08-18 00:38:48,989 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.14 vs. limit=22.5 2024-08-18 00:38:52,452 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=15.0 2024-08-18 00:38:54,597 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-18 00:39:04,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3601720.0, ans=0.125 2024-08-18 00:39:08,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3601720.0, ans=0.125 2024-08-18 00:39:11,660 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
21 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-18 00:39:13,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3601720.0, ans=0.125 2024-08-18 00:39:15,430 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 2550, loss[loss=0.09752, beats_loss=0.01154, ecapa_loss=0.0001376, whisper_loss=0.0846, over 22352.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01046, ecapa_loss=0.0001436, whisper_loss=0.09067, over 3868499.37 frames. ], batch size: 90, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:39:15,549 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 00:39:22,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3601820.0, ans=0.125 2024-08-18 00:39:24,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3601820.0, ans=0.1 2024-08-18 00:39:28,530 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.17 vs. limit=15.0 2024-08-18 00:39:33,100 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 00:39:34,104 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+01 2.261e+01 2.549e+01 2.807e+01 3.668e+01, threshold=5.099e+01, percent-clipped=0.0 2024-08-18 00:39:36,692 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-18 00:40:02,884 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.24 vs. limit=10.0 2024-08-18 00:40:07,195 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
22 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-18 00:40:17,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3602320.0, ans=0.125 2024-08-18 00:40:18,411 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 2600, loss[loss=0.129, beats_loss=0.009417, ecapa_loss=0.0001642, whisper_loss=0.1179, over 19942.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0105, ecapa_loss=0.0001441, whisper_loss=0.09097, over 3885012.98 frames. ], batch size: 80, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:40:33,788 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 18 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-18 00:40:43,581 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 41 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 00:40:44,083 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2024-08-18 00:40:52,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3602520.0, ans=0.125 2024-08-18 00:40:58,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3602620.0, ans=0.125 2024-08-18 00:41:05,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3602620.0, ans=0.125 2024-08-18 00:41:16,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3602720.0, ans=0.2 2024-08-18 00:41:21,183 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 2650, loss[loss=0.1042, beats_loss=0.009752, ecapa_loss=0.0001414, whisper_loss=0.09306, over 21253.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01053, ecapa_loss=0.0001442, whisper_loss=0.08994, over 3898956.72 frames. 
], batch size: 84, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:41:39,777 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.278e+01 2.614e+01 3.095e+01 1.399e+02, threshold=5.228e+01, percent-clipped=3.0 2024-08-18 00:41:57,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3603120.0, ans=0.125 2024-08-18 00:42:01,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3603120.0, ans=0.0 2024-08-18 00:42:18,516 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 00:42:21,955 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.68 vs. limit=22.5 2024-08-18 00:42:22,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3603320.0, ans=0.125 2024-08-18 00:42:23,493 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 2700, loss[loss=0.1046, beats_loss=0.01053, ecapa_loss=0.0001332, whisper_loss=0.09277, over 18753.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01048, ecapa_loss=0.0001454, whisper_loss=0.09047, over 3902065.34 frames. ], batch size: 73, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:42:24,625 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-08-18 00:42:36,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3603420.0, ans=0.07 2024-08-18 00:42:47,365 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.06 vs. 
limit=22.5 2024-08-18 00:42:52,810 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-18 00:42:59,756 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.50 vs. limit=15.0 2024-08-18 00:43:03,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3603620.0, ans=0.125 2024-08-18 00:43:26,133 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 2750, loss[loss=0.1061, beats_loss=0.01099, ecapa_loss=0.0001307, whisper_loss=0.09382, over 24214.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0105, ecapa_loss=0.0001447, whisper_loss=0.09055, over 3903201.73 frames. ], batch size: 95, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:43:31,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3603820.0, ans=0.125 2024-08-18 00:43:44,882 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.850e+01 2.315e+01 2.507e+01 2.815e+01 4.002e+01, threshold=5.013e+01, percent-clipped=0.0 2024-08-18 00:43:45,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3603920.0, ans=0.5 2024-08-18 00:43:55,918 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=22.5 2024-08-18 00:43:58,935 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 00:44:11,638 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-18 00:44:11,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3604120.0, ans=0.125 2024-08-18 00:44:14,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3604120.0, ans=0.0 2024-08-18 00:44:26,378 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-18 00:44:27,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3604220.0, ans=0.0 2024-08-18 00:44:29,959 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 2800, loss[loss=0.0824, beats_loss=0.01139, ecapa_loss=0.0001254, whisper_loss=0.06976, over 19404.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01056, ecapa_loss=0.0001441, whisper_loss=0.09012, over 3881419.96 frames. ], batch size: 77, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:44:43,550 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
26 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 00:44:52,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3604420.0, ans=0.2 2024-08-18 00:44:53,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3604420.0, ans=0.09899494936611666 2024-08-18 00:45:23,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3604720.0, ans=0.0 2024-08-18 00:45:32,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3604720.0, ans=0.1 2024-08-18 00:45:34,909 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 2850, loss[loss=0.09873, beats_loss=0.01035, ecapa_loss=0.0001384, whisper_loss=0.08699, over 22333.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01056, ecapa_loss=0.0001448, whisper_loss=0.09001, over 3855214.59 frames. ], batch size: 89, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:45:49,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3604920.0, ans=0.0 2024-08-18 00:45:53,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3604920.0, ans=0.2 2024-08-18 00:45:54,727 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.370e+01 2.583e+01 2.901e+01 2.742e+02, threshold=5.166e+01, percent-clipped=3.0 2024-08-18 00:45:59,540 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.86 vs. 
limit=22.5 2024-08-18 00:46:02,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3605020.0, ans=0.1 2024-08-18 00:46:06,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3605020.0, ans=0.95 2024-08-18 00:46:22,034 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2024-08-18 00:46:28,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3605220.0, ans=0.125 2024-08-18 00:46:29,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3605220.0, ans=0.125 2024-08-18 00:46:36,735 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-18 00:46:37,986 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-18 00:46:40,324 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 2900, loss[loss=0.0936, beats_loss=0.009746, ecapa_loss=0.0001547, whisper_loss=0.0823, over 15601.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01062, ecapa_loss=0.0001447, whisper_loss=0.08949, over 3871099.77 frames. ], batch size: 60, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:46:42,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3605320.0, ans=0.0 2024-08-18 00:46:43,173 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
21 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 00:46:48,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3605320.0, ans=0.1 2024-08-18 00:46:57,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3605420.0, ans=0.04949747468305833 2024-08-18 00:47:08,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3605520.0, ans=0.125 2024-08-18 00:47:19,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3605620.0, ans=0.2 2024-08-18 00:47:21,964 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.66 vs. limit=15.0 2024-08-18 00:47:28,303 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.55 vs. limit=15.0 2024-08-18 00:47:28,980 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-18 00:47:31,479 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-18 00:47:33,304 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=25.03 vs. limit=22.5 2024-08-18 00:47:42,368 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-18 00:47:45,555 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 2950, loss[loss=0.1135, beats_loss=0.009779, ecapa_loss=0.0001662, whisper_loss=0.1021, over 18924.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01065, ecapa_loss=0.0001452, whisper_loss=0.08964, over 3882049.57 frames. 
], batch size: 77, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:47:45,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3605820.0, ans=0.1 2024-08-18 00:48:04,354 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.406e+01 2.598e+01 2.862e+01 3.819e+01, threshold=5.197e+01, percent-clipped=0.0 2024-08-18 00:48:14,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3606020.0, ans=0.125 2024-08-18 00:48:23,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3606120.0, ans=0.1 2024-08-18 00:48:32,839 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-18 00:48:38,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3606220.0, ans=0.0 2024-08-18 00:48:41,746 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-18 00:48:47,607 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 3000, loss[loss=0.1109, beats_loss=0.01139, ecapa_loss=0.000133, whisper_loss=0.0982, over 23508.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01064, ecapa_loss=0.000145, whisper_loss=0.08975, over 3914205.61 frames. ], batch size: 94, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 00:48:47,608 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-18 00:49:07,724 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.9170, 1.7129, 1.7192, 1.4410], device='cuda:2') 2024-08-18 00:49:20,061 INFO [train_multi_KD3.py:1149] (2/4) Epoch 25, validation on ASR_libri: loss=0.2529, beats_loss=0, ecapa_loss=0.0005235, whisper_loss=0.2477, over 922467.00 frames. 
2024-08-18 00:49:35,436 INFO [train_multi_KD3.py:1149] (2/4) Epoch 25, validation on SV_voxceleb1: loss=0.004164, beats_loss=0, ecapa_loss=0.0004164, whisper_loss=0, over 939242.00 frames. 2024-08-18 00:50:11,031 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.4172, 2.7512, 2.4378, 2.3296], device='cuda:2') 2024-08-18 00:51:10,612 INFO [train_multi_KD3.py:1149] (2/4) Epoch 25, validation on AT_audioset: loss=0.02327, beats_loss=0.02327, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 00:51:10,616 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-18 00:51:11,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3606320.0, ans=0.1 2024-08-18 00:51:13,407 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 00:51:13,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3606320.0, ans=0.2 2024-08-18 00:51:15,682 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-18 00:51:15,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3606320.0, ans=0.1 2024-08-18 00:51:18,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3606320.0, ans=10.0 2024-08-18 00:51:29,599 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
22 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 00:51:31,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3606420.0, ans=0.0 2024-08-18 00:51:37,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3606520.0, ans=0.125 2024-08-18 00:51:39,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3606520.0, ans=0.125 2024-08-18 00:51:47,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3606620.0, ans=0.015 2024-08-18 00:52:03,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3606720.0, ans=0.125 2024-08-18 00:52:05,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3606720.0, ans=0.125 2024-08-18 00:52:14,895 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 3050, loss[loss=0.1288, beats_loss=0.007392, ecapa_loss=0.0001812, whisper_loss=0.1196, over 20181.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0106, ecapa_loss=0.0001461, whisper_loss=0.09076, over 3924839.99 frames. ], batch size: 83, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 00:52:21,057 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
19 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-18 00:52:25,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3606820.0, ans=0.125 2024-08-18 00:52:32,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3606920.0, ans=0.0 2024-08-18 00:52:33,628 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.357e+01 2.578e+01 2.867e+01 5.974e+01, threshold=5.156e+01, percent-clipped=1.0 2024-08-18 00:52:43,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3607020.0, ans=0.1 2024-08-18 00:52:46,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3607020.0, ans=0.0 2024-08-18 00:53:01,110 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 00:53:18,089 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 3100, loss[loss=0.08486, beats_loss=0.01044, ecapa_loss=0.0001329, whisper_loss=0.07309, over 14405.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01056, ecapa_loss=0.0001455, whisper_loss=0.0914, over 3923268.16 frames. ], batch size: 55, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:53:23,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3607320.0, ans=0.2 2024-08-18 00:53:31,926 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 00:53:43,373 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 00:53:46,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3607520.0, ans=0.0 2024-08-18 00:53:54,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3607620.0, ans=0.5 2024-08-18 00:53:55,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3607620.0, ans=0.1 2024-08-18 00:54:02,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3607620.0, ans=0.0 2024-08-18 00:54:20,768 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 3150, loss[loss=0.1079, beats_loss=0.008991, ecapa_loss=0.0001534, whisper_loss=0.09733, over 18296.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01056, ecapa_loss=0.0001457, whisper_loss=0.09157, over 3898304.91 frames. ], batch size: 71, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:54:31,561 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.094e-01 2024-08-18 00:54:41,091 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.301e+01 2.605e+01 2.918e+01 7.177e+01, threshold=5.210e+01, percent-clipped=2.0 2024-08-18 00:54:42,723 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 00:54:48,583 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.54 vs. limit=12.0 2024-08-18 00:54:59,389 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
26 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-18 00:55:18,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3608220.0, ans=0.0 2024-08-18 00:55:21,721 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.89 vs. limit=15.0 2024-08-18 00:55:24,618 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 3200, loss[loss=0.1094, beats_loss=0.009841, ecapa_loss=0.0001823, whisper_loss=0.09772, over 19811.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01059, ecapa_loss=0.0001455, whisper_loss=0.09136, over 3877525.88 frames. ], batch size: 84, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:55:28,791 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 29 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-18 00:55:38,728 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 26 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-18 00:55:39,920 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 00:55:40,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3608420.0, ans=0.125 2024-08-18 00:55:40,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3608420.0, ans=0.125 2024-08-18 00:55:46,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3608420.0, ans=0.95 2024-08-18 00:55:50,487 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
16 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-18 00:55:50,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3608520.0, ans=0.125 2024-08-18 00:55:54,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3608520.0, ans=0.125 2024-08-18 00:56:07,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3608620.0, ans=0.125 2024-08-18 00:56:20,906 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-18 00:56:22,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3608720.0, ans=0.125 2024-08-18 00:56:28,448 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 3250, loss[loss=0.1126, beats_loss=0.01004, ecapa_loss=0.0001573, whisper_loss=0.101, over 22643.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01057, ecapa_loss=0.0001464, whisper_loss=0.09167, over 3859458.14 frames. ], batch size: 91, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:56:33,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3608820.0, ans=0.125 2024-08-18 00:56:46,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3608920.0, ans=0.125 2024-08-18 00:56:48,231 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.222e+01 2.522e+01 2.821e+01 3.582e+01, threshold=5.044e+01, percent-clipped=0.0 2024-08-18 00:56:53,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3609020.0, ans=0.07 2024-08-18 00:56:54,563 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
19 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-18 00:57:21,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3609220.0, ans=0.1 2024-08-18 00:57:26,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3609220.0, ans=0.125 2024-08-18 00:57:29,734 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.87 vs. limit=15.0 2024-08-18 00:57:31,059 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 3300, loss[loss=0.1035, beats_loss=0.0122, ecapa_loss=0.0001343, whisper_loss=0.09001, over 22040.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01059, ecapa_loss=0.0001457, whisper_loss=0.09129, over 3868972.41 frames. ], batch size: 89, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:57:35,068 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 00:57:46,499 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 00:57:49,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3609420.0, ans=0.125 2024-08-18 00:57:53,350 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
32 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-18 00:58:20,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3609720.0, ans=0.125 2024-08-18 00:58:32,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3609820.0, ans=0.2 2024-08-18 00:58:33,276 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 3350, loss[loss=0.1012, beats_loss=0.01423, ecapa_loss=9.923e-05, whisper_loss=0.08593, over 14645.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01055, ecapa_loss=0.0001458, whisper_loss=0.09122, over 3861224.37 frames. ], batch size: 57, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:58:45,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3609920.0, ans=0.125 2024-08-18 00:58:53,216 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.316e+01 2.495e+01 2.843e+01 4.202e+01, threshold=4.990e+01, percent-clipped=0.0 2024-08-18 00:59:05,440 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-18 00:59:09,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3610020.0, ans=0.125 2024-08-18 00:59:22,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3610120.0, ans=0.125 2024-08-18 00:59:28,092 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 14 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-18 00:59:36,520 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 3400, loss[loss=0.1038, beats_loss=0.01202, ecapa_loss=0.000115, whisper_loss=0.09062, over 15456.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01058, ecapa_loss=0.0001448, whisper_loss=0.09101, over 3882790.30 frames. 
], batch size: 61, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:00:07,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3610520.0, ans=0.125 2024-08-18 01:00:10,114 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.00 vs. limit=12.0 2024-08-18 01:00:10,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3610520.0, ans=0.0 2024-08-18 01:00:20,325 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.17 vs. limit=15.0 2024-08-18 01:00:39,936 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 3450, loss[loss=0.0931, beats_loss=0.008861, ecapa_loss=0.0001624, whisper_loss=0.08261, over 14934.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01058, ecapa_loss=0.0001445, whisper_loss=0.0905, over 3881833.99 frames. ], batch size: 57, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:00:43,988 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 31 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 01:00:48,241 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=15.0 2024-08-18 01:00:53,089 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. 
limit=15.0 2024-08-18 01:00:54,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3610920.0, ans=0.0 2024-08-18 01:00:54,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3610920.0, ans=0.1 2024-08-18 01:00:55,373 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-18 01:01:00,145 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.316e+01 2.519e+01 2.832e+01 7.230e+01, threshold=5.039e+01, percent-clipped=2.0 2024-08-18 01:01:03,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3610920.0, ans=0.125 2024-08-18 01:01:43,285 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 3500, loss[loss=0.1051, beats_loss=0.01104, ecapa_loss=0.0001572, whisper_loss=0.09246, over 21417.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01056, ecapa_loss=0.000146, whisper_loss=0.09011, over 3878520.34 frames. ], batch size: 89, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:01:44,131 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 01:02:13,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3611520.0, ans=0.0 2024-08-18 01:02:41,556 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-18 01:02:47,592 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.37 vs. limit=15.0 2024-08-18 01:02:48,013 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 3550, loss[loss=0.09678, beats_loss=0.0114, ecapa_loss=0.0001015, whisper_loss=0.08437, over 22141.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.0001457, whisper_loss=0.09023, over 3895997.15 frames. ], batch size: 84, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:03:00,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3611920.0, ans=0.125 2024-08-18 01:03:02,089 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.07 vs. limit=15.0 2024-08-18 01:03:04,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3611920.0, ans=0.05 2024-08-18 01:03:08,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3611920.0, ans=0.125 2024-08-18 01:03:09,428 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.315e+01 2.541e+01 2.759e+01 3.730e+01, threshold=5.082e+01, percent-clipped=0.0 2024-08-18 01:03:12,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3611920.0, ans=0.0 2024-08-18 01:03:22,541 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 27 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 01:03:26,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3612120.0, ans=0.0 2024-08-18 01:03:35,810 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 01:03:46,563 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 01:03:54,333 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 3600, loss[loss=0.09653, beats_loss=0.01082, ecapa_loss=0.0001517, whisper_loss=0.08419, over 22276.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01045, ecapa_loss=0.0001463, whisper_loss=0.09068, over 3856616.11 frames. ], batch size: 92, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:03:58,097 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.50 vs. limit=5.0 2024-08-18 01:04:17,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3612420.0, ans=0.09899494936611666 2024-08-18 01:04:20,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3612520.0, ans=0.125 2024-08-18 01:04:24,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3612520.0, ans=0.125 2024-08-18 01:04:35,596 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2024-08-18 01:04:36,279 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 01:04:40,982 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.31 vs. limit=10.0 2024-08-18 01:04:52,222 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.57 vs. limit=15.0 2024-08-18 01:04:53,324 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-18 01:05:00,018 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 24 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-18 01:05:02,470 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 3650, loss[loss=0.1227, beats_loss=0.009249, ecapa_loss=0.0001296, whisper_loss=0.1121, over 23916.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01041, ecapa_loss=0.0001459, whisper_loss=0.0912, over 3862032.38 frames. ], batch size: 91, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:05:24,126 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.387e+01 2.671e+01 3.011e+01 4.673e+01, threshold=5.342e+01, percent-clipped=0.0 2024-08-18 01:05:25,571 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-18 01:05:40,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3613020.0, ans=0.125 2024-08-18 01:05:46,604 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 21 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-18 01:05:55,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3613120.0, ans=0.0 2024-08-18 01:06:00,141 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-18 01:06:10,507 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 3700, loss[loss=0.1161, beats_loss=0.01185, ecapa_loss=0.0001699, whisper_loss=0.1026, over 22386.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01035, ecapa_loss=0.0001464, whisper_loss=0.09159, over 3851309.98 frames. ], batch size: 91, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:06:17,542 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 01:06:35,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3613420.0, ans=0.125 2024-08-18 01:06:43,320 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
16 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-18 01:06:54,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3613620.0, ans=0.2 2024-08-18 01:07:03,621 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.22 vs. limit=22.5 2024-08-18 01:07:12,216 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-18 01:07:12,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3613720.0, ans=0.125 2024-08-18 01:07:12,722 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.17 vs. limit=15.0 2024-08-18 01:07:18,716 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 3750, loss[loss=0.09549, beats_loss=0.00994, ecapa_loss=0.0001446, whisper_loss=0.0841, over 18070.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01045, ecapa_loss=0.0001471, whisper_loss=0.09137, over 3877317.82 frames. ], batch size: 71, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:07:23,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3613820.0, ans=0.125 2024-08-18 01:07:29,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3613820.0, ans=0.0 2024-08-18 01:07:30,904 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
31 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-18 01:07:33,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3613920.0, ans=0.0 2024-08-18 01:07:40,269 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.244e+01 2.423e+01 2.765e+01 4.056e+01, threshold=4.845e+01, percent-clipped=0.0 2024-08-18 01:07:50,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3614020.0, ans=0.125 2024-08-18 01:07:51,224 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-18 01:07:55,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3614020.0, ans=0.125 2024-08-18 01:08:06,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3614120.0, ans=0.125 2024-08-18 01:08:09,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3614120.0, ans=0.1 2024-08-18 01:08:16,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3614220.0, ans=0.1 2024-08-18 01:08:22,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3614220.0, ans=0.2 2024-08-18 01:08:24,412 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-18 01:08:27,581 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 3800, loss[loss=0.09947, beats_loss=0.01078, ecapa_loss=0.0001162, whisper_loss=0.08753, over 20624.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01052, ecapa_loss=0.0001477, whisper_loss=0.09084, over 3895391.19 frames. 
], batch size: 81, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:08:31,372 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09450822323560715, model_norm_threshold=48.454036712646484 2024-08-18 01:08:31,542 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.766e+04, grad_sumsq=3.766e+04, orig_rms_sq=1.000e+00 2024-08-18 01:08:45,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3614420.0, ans=0.0 2024-08-18 01:08:46,191 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-18 01:08:52,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3614420.0, ans=0.1 2024-08-18 01:08:54,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3614520.0, ans=0.0 2024-08-18 01:09:06,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3614520.0, ans=0.5 2024-08-18 01:09:10,955 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 01:09:26,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3614720.0, ans=0.07 2024-08-18 01:09:28,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3614720.0, ans=0.2 2024-08-18 01:09:32,587 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.88 vs. 
limit=15.0 2024-08-18 01:09:36,999 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 3850, loss[loss=0.1087, beats_loss=0.009624, ecapa_loss=0.000157, whisper_loss=0.0975, over 22071.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01056, ecapa_loss=0.0001482, whisper_loss=0.0906, over 3878715.92 frames. ], batch size: 91, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:09:37,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3614820.0, ans=0.125 2024-08-18 01:09:41,825 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-18 01:09:53,638 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-18 01:09:59,336 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.260e+01 2.585e+01 2.853e+01 5.127e+02, threshold=5.170e+01, percent-clipped=2.0 2024-08-18 01:10:11,352 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-18 01:10:12,564 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.48 vs. limit=22.5 2024-08-18 01:10:15,448 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 01:10:16,344 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-18 01:10:18,862 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 37 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 01:10:46,185 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 3900, loss[loss=0.09445, beats_loss=0.01203, ecapa_loss=0.000184, whisper_loss=0.08057, over 16605.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01059, ecapa_loss=0.0001479, whisper_loss=0.0909, over 3898653.68 frames. 
], batch size: 72, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:10:46,347 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 from AS 2024-08-18 01:10:52,771 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.45 vs. limit=15.0 2024-08-18 01:11:13,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3615520.0, ans=0.0 2024-08-18 01:11:53,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3615720.0, ans=0.0 2024-08-18 01:11:55,244 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 3950, loss[loss=0.1171, beats_loss=0.009457, ecapa_loss=0.0001616, whisper_loss=0.1061, over 22609.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01058, ecapa_loss=0.0001478, whisper_loss=0.09109, over 3913018.44 frames. ], batch size: 90, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:12:17,819 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.347e+01 2.555e+01 2.976e+01 4.736e+01, threshold=5.111e+01, percent-clipped=0.0 2024-08-18 01:12:25,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3616020.0, ans=0.1 2024-08-18 01:12:55,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3616220.0, ans=0.125 2024-08-18 01:13:04,525 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 4000, loss[loss=0.09969, beats_loss=0.0114, ecapa_loss=0.0001116, whisper_loss=0.08717, over 17303.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01047, ecapa_loss=0.0001484, whisper_loss=0.09158, over 3921166.54 frames. 
], batch size: 64, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:13:08,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3616320.0, ans=0.2 2024-08-18 01:13:10,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3616320.0, ans=0.125 2024-08-18 01:13:23,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3616420.0, ans=0.125 2024-08-18 01:13:29,081 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 17 from Vox, 37 from AS 2024-08-18 01:13:56,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3616620.0, ans=0.125 2024-08-18 01:13:58,741 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 17 from Vox, 43 from AS 2024-08-18 01:14:01,680 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 18 from LS+wenet, 21 from Vox, 39 from AS 2024-08-18 01:14:13,793 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 4050, loss[loss=0.08016, beats_loss=0.01417, ecapa_loss=0.0001169, whisper_loss=0.06482, over 21554.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01051, ecapa_loss=0.0001493, whisper_loss=0.09106, over 3935051.55 frames. ], batch size: 88, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:14:14,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3616820.0, ans=0.125 2024-08-18 01:14:16,480 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 19 from Vox, 25 from AS 2024-08-18 01:14:29,708 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.48 vs. 
limit=12.0 2024-08-18 01:14:36,059 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.283e+01 2.581e+01 2.870e+01 1.380e+02, threshold=5.162e+01, percent-clipped=1.0 2024-08-18 01:14:37,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3616920.0, ans=0.2 2024-08-18 01:14:47,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3617020.0, ans=0.2 2024-08-18 01:14:47,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3617020.0, ans=0.125 2024-08-18 01:14:47,463 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.95 vs. limit=15.0 2024-08-18 01:14:56,453 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 22 from Vox, 28 from AS 2024-08-18 01:15:04,813 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 18 from Vox, 27 from AS 2024-08-18 01:15:13,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3617220.0, ans=0.0 2024-08-18 01:15:23,100 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 4100, loss[loss=0.104, beats_loss=0.01071, ecapa_loss=0.000143, whisper_loss=0.09189, over 14281.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01049, ecapa_loss=0.0001485, whisper_loss=0.09147, over 3933915.53 frames. ], batch size: 56, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:15:23,988 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.06 vs. 
limit=15.0 2024-08-18 01:15:27,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3617320.0, ans=0.1 2024-08-18 01:15:34,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3617320.0, ans=0.2 2024-08-18 01:15:36,800 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 21 from Vox, 22 from AS 2024-08-18 01:15:43,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3617420.0, ans=0.0 2024-08-18 01:15:43,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3617420.0, ans=0.1 2024-08-18 01:15:47,920 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-08-18 01:15:55,309 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 20 from Vox, 43 from AS 2024-08-18 01:15:55,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3617520.0, ans=0.125 2024-08-18 01:16:06,539 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.66 vs. limit=22.5 2024-08-18 01:16:25,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3617720.0, ans=0.125 2024-08-18 01:16:30,624 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 4150, loss[loss=0.1088, beats_loss=0.01171, ecapa_loss=0.0001695, whisper_loss=0.09539, over 21558.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01048, ecapa_loss=0.000149, whisper_loss=0.09144, over 3939095.60 frames. 
], batch size: 91, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:16:34,399 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 from AS 2024-08-18 01:16:42,509 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 from AS 2024-08-18 01:16:45,152 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 23 from Vox, 25 from AS 2024-08-18 01:16:51,737 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.642e+01 2.314e+01 2.519e+01 2.790e+01 4.414e+01, threshold=5.038e+01, percent-clipped=0.0 2024-08-18 01:17:14,297 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=22.5 2024-08-18 01:17:35,172 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.26 vs. limit=10.0 2024-08-18 01:17:37,121 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 4200, loss[loss=0.08982, beats_loss=0.01125, ecapa_loss=0.0001707, whisper_loss=0.07686, over 18010.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01047, ecapa_loss=0.0001481, whisper_loss=0.09159, over 3910365.94 frames. ], batch size: 70, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:17:44,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3618320.0, ans=0.125 2024-08-18 01:17:48,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3618320.0, ans=0.125 2024-08-18 01:17:51,941 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 17 from LS+wenet, 16 from Vox, 37 from AS 2024-08-18 01:18:05,091 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
29 from LS+wenet, 20 from Vox, 33 from AS 2024-08-18 01:18:12,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3618520.0, ans=0.1 2024-08-18 01:18:23,656 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 from AS 2024-08-18 01:18:33,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3618720.0, ans=0.0 2024-08-18 01:18:36,218 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.35 vs. limit=22.5 2024-08-18 01:18:38,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3618720.0, ans=0.2 2024-08-18 01:18:42,600 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 4250, loss[loss=0.07872, beats_loss=0.01048, ecapa_loss=0.0001403, whisper_loss=0.06683, over 22957.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01047, ecapa_loss=0.0001487, whisper_loss=0.09146, over 3910618.38 frames. ], batch size: 94, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:18:42,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3618820.0, ans=0.125 2024-08-18 01:18:58,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3618920.0, ans=0.125 2024-08-18 01:18:59,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3618920.0, ans=0.125 2024-08-18 01:19:02,364 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
18 from LS+wenet, 28 from Vox, 26 from AS 2024-08-18 01:19:03,388 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.262e+01 2.529e+01 2.791e+01 4.848e+01, threshold=5.057e+01, percent-clipped=0.0 2024-08-18 01:19:04,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3618920.0, ans=0.125 2024-08-18 01:19:29,346 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 34 from LS+wenet, 29 from Vox, 21 from AS 2024-08-18 01:19:36,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3619220.0, ans=0.125 2024-08-18 01:19:39,922 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 23 from Vox, 36 from AS 2024-08-18 01:19:44,775 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 15 from Vox, 27 from AS 2024-08-18 01:19:48,223 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 4300, loss[loss=0.1097, beats_loss=0.009228, ecapa_loss=0.0001555, whisper_loss=0.09887, over 19976.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01048, ecapa_loss=0.0001478, whisper_loss=0.0907, over 3871190.01 frames. ], batch size: 80, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:19:50,978 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 23 from LS+wenet, 25 from Vox, 37 from AS 2024-08-18 01:19:56,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3619320.0, ans=0.125 2024-08-18 01:19:57,429 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 24 from Vox, 44 from AS 2024-08-18 01:20:11,532 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
27 from LS+wenet, 28 from Vox, 36 from AS 2024-08-18 01:20:28,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3619620.0, ans=0.125 2024-08-18 01:20:31,950 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 21 from Vox, 39 from AS 2024-08-18 01:20:33,077 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 23 from Vox, 29 from AS 2024-08-18 01:20:39,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3619720.0, ans=0.0 2024-08-18 01:20:42,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3619720.0, ans=0.2 2024-08-18 01:20:45,569 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 from AS 2024-08-18 01:20:49,499 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 19 from Vox, 36 from AS 2024-08-18 01:20:52,152 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 4350, loss[loss=0.109, beats_loss=0.009407, ecapa_loss=0.0001255, whisper_loss=0.09829, over 19604.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01046, ecapa_loss=0.000149, whisper_loss=0.09066, over 3885103.96 frames. ], batch size: 76, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:21:12,854 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.260e+01 2.529e+01 2.654e+01 4.063e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-18 01:21:22,635 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.72 vs. limit=15.0 2024-08-18 01:21:33,323 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.46 vs. 
limit=10.0 2024-08-18 01:21:39,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3620120.0, ans=0.2 2024-08-18 01:21:54,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3620220.0, ans=0.07 2024-08-18 01:21:54,479 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.46 vs. limit=10.0 2024-08-18 01:21:57,553 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 4400, loss[loss=0.09167, beats_loss=0.008792, ecapa_loss=0.0001538, whisper_loss=0.08134, over 15272.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01054, ecapa_loss=0.0001479, whisper_loss=0.09022, over 3883001.79 frames. ], batch size: 59, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:22:10,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3620420.0, ans=0.1 2024-08-18 01:22:12,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3620420.0, ans=0.1 2024-08-18 01:22:16,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3620420.0, ans=0.125 2024-08-18 01:22:45,016 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 13 from Vox, 31 from AS 2024-08-18 01:22:50,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3620620.0, ans=0.0 2024-08-18 01:23:08,308 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 4450, loss[loss=0.1107, beats_loss=0.0105, ecapa_loss=0.0001284, whisper_loss=0.09887, over 22558.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01058, ecapa_loss=0.0001474, whisper_loss=0.09, over 3899644.20 frames. 
], batch size: 87, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:23:17,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3620820.0, ans=0.125 2024-08-18 01:23:22,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3620920.0, ans=0.125 2024-08-18 01:23:27,483 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 20 from Vox, 34 from AS 2024-08-18 01:23:28,930 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 from AS 2024-08-18 01:23:31,461 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.280e+01 2.529e+01 2.933e+01 4.954e+01, threshold=5.058e+01, percent-clipped=0.0 2024-08-18 01:23:35,172 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.10 vs. limit=15.0 2024-08-18 01:23:48,619 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 18 from LS+wenet, 19 from Vox, 37 from AS 2024-08-18 01:23:57,185 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 21 from Vox, 45 from AS 2024-08-18 01:24:12,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3621220.0, ans=0.0 2024-08-18 01:24:16,556 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 25 from Vox, 43 from AS 2024-08-18 01:24:20,855 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 4500, loss[loss=0.1185, beats_loss=0.009771, ecapa_loss=0.0001267, whisper_loss=0.1075, over 14084.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.0001476, whisper_loss=0.09092, over 3944465.45 frames. 
], batch size: 53, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:24:31,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3621320.0, ans=0.125 2024-08-18 01:24:33,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3621320.0, ans=0.125 2024-08-18 01:24:36,835 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 9 from LS+wenet, 23 from Vox, 28 from AS 2024-08-18 01:24:38,932 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 from AS 2024-08-18 01:24:41,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3621420.0, ans=0.0 2024-08-18 01:24:43,705 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 30 from LS+wenet, 21 from Vox, 32 from AS 2024-08-18 01:24:47,557 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 25 from Vox, 26 from AS 2024-08-18 01:25:01,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3621520.0, ans=0.125 2024-08-18 01:25:07,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3621620.0, ans=0.04949747468305833 2024-08-18 01:25:24,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3621720.0, ans=0.0 2024-08-18 01:25:36,859 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 4550, loss[loss=0.1005, beats_loss=0.01095, ecapa_loss=0.0001774, whisper_loss=0.08782, over 20899.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01044, ecapa_loss=0.0001477, whisper_loss=0.09167, over 3958969.05 frames. 
], batch size: 88, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:25:47,080 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.44 vs. limit=22.5 2024-08-18 01:26:02,121 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.248e+01 2.517e+01 2.867e+01 6.444e+01, threshold=5.035e+01, percent-clipped=1.0 2024-08-18 01:26:05,403 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 20 from LS+wenet, 27 from Vox, 34 from AS 2024-08-18 01:26:20,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3622020.0, ans=0.1 2024-08-18 01:26:31,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3622120.0, ans=0.125 2024-08-18 01:27:11,268 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 4600, loss[loss=0.1103, beats_loss=0.0102, ecapa_loss=0.0001422, whisper_loss=0.0987, over 15623.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01042, ecapa_loss=0.0001467, whisper_loss=0.09184, over 3942759.08 frames. ], batch size: 61, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:27:29,146 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.62 vs. limit=15.0 2024-08-18 01:27:38,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3622420.0, ans=0.125 2024-08-18 01:27:57,903 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
18 from LS+wenet, 17 from Vox, 27 from AS 2024-08-18 01:28:08,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3622520.0, ans=0.2 2024-08-18 01:28:18,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3622620.0, ans=0.125 2024-08-18 01:28:28,926 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.04 vs. limit=10.0 2024-08-18 01:28:42,534 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 4650, loss[loss=0.1123, beats_loss=0.01072, ecapa_loss=0.0001327, whisper_loss=0.1002, over 19421.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01043, ecapa_loss=0.0001474, whisper_loss=0.09127, over 3930982.35 frames. ], batch size: 77, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:28:43,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3622820.0, ans=0.2 2024-08-18 01:28:47,730 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 28 from Vox, 39 from AS 2024-08-18 01:28:48,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3622820.0, ans=0.125 2024-08-18 01:29:05,784 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.266e+01 2.454e+01 2.690e+01 4.860e+01, threshold=4.909e+01, percent-clipped=0.0 2024-08-18 01:29:08,247 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.08 vs. 
limit=15.0 2024-08-18 01:29:16,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3623020.0, ans=0.95 2024-08-18 01:29:32,579 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2024-08-18 01:29:42,676 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 21 from Vox, 37 from AS 2024-08-18 01:29:47,317 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.46 vs. limit=15.0 2024-08-18 01:29:49,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3623220.0, ans=0.125 2024-08-18 01:29:49,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3623220.0, ans=0.2 2024-08-18 01:29:55,231 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 4700, loss[loss=0.1026, beats_loss=0.01089, ecapa_loss=0.0001402, whisper_loss=0.09027, over 22695.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01045, ecapa_loss=0.0001465, whisper_loss=0.0911, over 3897966.69 frames. 
], batch size: 94, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:30:01,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3623320.0, ans=0.125 2024-08-18 01:30:32,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3623520.0, ans=0.1 2024-08-18 01:30:51,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3623720.0, ans=0.2 2024-08-18 01:31:08,295 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 4750, loss[loss=0.1167, beats_loss=0.007734, ecapa_loss=0.0001466, whisper_loss=0.1075, over 15109.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0105, ecapa_loss=0.0001467, whisper_loss=0.09089, over 3897389.93 frames. ], batch size: 57, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:31:17,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3623820.0, ans=0.125 2024-08-18 01:31:35,828 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.329e+01 2.503e+01 2.951e+01 3.896e+01, threshold=5.007e+01, percent-clipped=0.0 2024-08-18 01:31:53,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3624020.0, ans=0.125 2024-08-18 01:32:16,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3624120.0, ans=0.125 2024-08-18 01:32:19,921 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 from AS 2024-08-18 01:32:37,361 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 4800, loss[loss=0.07981, beats_loss=0.01463, ecapa_loss=0.0001382, whisper_loss=0.0638, over 21883.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01061, ecapa_loss=0.0001466, whisper_loss=0.09056, over 3935395.57 frames. ], batch size: 94, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:32:38,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3624320.0, ans=0.2 2024-08-18 01:32:58,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3624420.0, ans=0.125 2024-08-18 01:33:01,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3624420.0, ans=0.05 2024-08-18 01:33:19,176 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.40 vs. limit=15.0 2024-08-18 01:34:10,451 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 4850, loss[loss=0.1173, beats_loss=0.01097, ecapa_loss=0.000118, whisper_loss=0.1051, over 16872.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.0001478, whisper_loss=0.09109, over 3931458.17 frames. ], batch size: 65, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:34:25,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3624920.0, ans=0.125 2024-08-18 01:34:31,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3624920.0, ans=0.125 2024-08-18 01:34:34,382 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+01 2.307e+01 2.618e+01 2.911e+01 3.524e+01, threshold=5.235e+01, percent-clipped=0.0 2024-08-18 01:34:51,965 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
25 from LS+wenet, 21 from Vox, 30 from AS 2024-08-18 01:34:53,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3625120.0, ans=0.0 2024-08-18 01:34:56,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3625120.0, ans=0.2 2024-08-18 01:34:58,890 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 34 from LS+wenet, 18 from Vox, 26 from AS 2024-08-18 01:35:24,641 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 4900, loss[loss=0.1118, beats_loss=0.009517, ecapa_loss=0.0001321, whisper_loss=0.101, over 23872.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.0001474, whisper_loss=0.09096, over 3909360.44 frames. ], batch size: 92, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:35:29,704 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 21 from Vox, 30 from AS 2024-08-18 01:35:41,250 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 22 from Vox, 37 from AS 2024-08-18 01:35:51,159 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 18 from Vox, 30 from AS 2024-08-18 01:35:58,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3625520.0, ans=0.125 2024-08-18 01:35:59,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3625520.0, ans=0.2 2024-08-18 01:36:00,147 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.38 vs. 
limit=12.0 2024-08-18 01:36:08,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3625520.0, ans=0.125 2024-08-18 01:36:17,535 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.51 vs. limit=15.0 2024-08-18 01:36:27,144 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 01:36:51,395 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 4950, loss[loss=0.106, beats_loss=0.01085, ecapa_loss=0.0001805, whisper_loss=0.09337, over 21286.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01056, ecapa_loss=0.0001463, whisper_loss=0.09018, over 3880032.22 frames. ], batch size: 90, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:36:55,336 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 20 from Vox, 34 from AS 2024-08-18 01:37:01,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3625820.0, ans=0.125 2024-08-18 01:37:20,311 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.328e+01 2.566e+01 3.021e+01 2.036e+02, threshold=5.131e+01, percent-clipped=1.0 2024-08-18 01:37:31,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3626020.0, ans=0.0 2024-08-18 01:37:38,352 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
29 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-18 01:37:50,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3626120.0, ans=0.0 2024-08-18 01:37:51,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3626120.0, ans=0.2 2024-08-18 01:37:51,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3626120.0, ans=0.0 2024-08-18 01:37:55,232 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-18 01:37:57,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3626120.0, ans=0.0 2024-08-18 01:38:08,685 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-18 01:38:12,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3626220.0, ans=0.0 2024-08-18 01:38:24,920 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 5000, loss[loss=0.1077, beats_loss=0.008975, ecapa_loss=0.0001711, whisper_loss=0.09697, over 17145.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001456, whisper_loss=0.09054, over 3887440.26 frames. ], batch size: 68, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:38:28,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3626320.0, ans=0.125 2024-08-18 01:38:29,379 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 01:38:33,377 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
23 from LS+wenet, 22 from Vox, 50 fro AS 2024-08-18 01:38:34,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3626320.0, ans=0.125 2024-08-18 01:38:42,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3626420.0, ans=0.5 2024-08-18 01:38:48,496 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.34 vs. limit=15.0 2024-08-18 01:38:53,251 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-18 01:39:05,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3626520.0, ans=0.1 2024-08-18 01:39:08,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3626520.0, ans=0.125 2024-08-18 01:39:13,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3626620.0, ans=0.125 2024-08-18 01:39:22,522 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-18 01:39:24,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3626720.0, ans=0.2 2024-08-18 01:39:25,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3626720.0, ans=0.04949747468305833 2024-08-18 01:39:38,862 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 5050, loss[loss=0.1028, beats_loss=0.01104, ecapa_loss=0.0001278, whisper_loss=0.09044, over 22965.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01063, ecapa_loss=0.0001449, whisper_loss=0.09046, over 3894257.21 frames. 
], batch size: 90, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:39:52,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3626920.0, ans=0.1 2024-08-18 01:40:01,603 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.612e+01 2.360e+01 2.516e+01 2.885e+01 7.843e+01, threshold=5.031e+01, percent-clipped=2.0 2024-08-18 01:40:02,762 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0 2024-08-18 01:40:32,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3627120.0, ans=0.125 2024-08-18 01:40:42,704 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.80 vs. limit=15.0 2024-08-18 01:40:54,607 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 5100, loss[loss=0.09673, beats_loss=0.008512, ecapa_loss=0.0001428, whisper_loss=0.08679, over 16095.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.0001446, whisper_loss=0.09077, over 3897588.18 frames. ], batch size: 62, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:41:19,697 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 33 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-18 01:41:23,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3627420.0, ans=0.125 2024-08-18 01:41:24,791 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 01:41:30,118 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
33 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-18 01:41:55,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3627620.0, ans=0.1 2024-08-18 01:42:08,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3627720.0, ans=0.0 2024-08-18 01:42:20,123 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 5150, loss[loss=0.1074, beats_loss=0.009501, ecapa_loss=0.0001929, whisper_loss=0.09594, over 13461.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01072, ecapa_loss=0.0001433, whisper_loss=0.09115, over 3885660.71 frames. ], batch size: 57, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:42:22,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3627820.0, ans=0.125 2024-08-18 01:42:36,937 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2024-08-18 01:42:46,839 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.463e+01 2.680e+01 3.103e+01 6.728e+01, threshold=5.360e+01, percent-clipped=1.0 2024-08-18 01:42:50,334 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-18 01:43:10,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3628120.0, ans=0.1 2024-08-18 01:43:11,810 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-18 01:43:32,058 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 5200, loss[loss=0.1163, beats_loss=0.008402, ecapa_loss=0.0001347, whisper_loss=0.1066, over 23330.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01069, ecapa_loss=0.0001436, whisper_loss=0.09083, over 3872860.81 frames. 
], batch size: 88, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:43:37,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3628320.0, ans=0.1 2024-08-18 01:43:49,165 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.35 vs. limit=15.0 2024-08-18 01:43:50,413 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.90 vs. limit=10.0 2024-08-18 01:43:54,496 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.37 vs. limit=15.0 2024-08-18 01:43:58,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3628520.0, ans=0.125 2024-08-18 01:44:15,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3628620.0, ans=0.125 2024-08-18 01:44:20,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3628620.0, ans=0.125 2024-08-18 01:44:22,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3628720.0, ans=0.125 2024-08-18 01:44:29,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3628720.0, ans=0.125 2024-08-18 01:44:35,679 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 5250, loss[loss=0.1022, beats_loss=0.01196, ecapa_loss=0.0001454, whisper_loss=0.08874, over 16597.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01065, ecapa_loss=0.0001453, whisper_loss=0.09083, over 3865281.67 frames. 
], batch size: 64, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:44:45,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3628820.0, ans=0.125 2024-08-18 01:44:56,617 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.248e+01 2.479e+01 2.831e+01 4.103e+01, threshold=4.957e+01, percent-clipped=0.0 2024-08-18 01:44:56,745 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 35 from Vox, 35 fro AS 2024-08-18 01:45:15,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3629120.0, ans=0.125 2024-08-18 01:45:27,637 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-18 01:45:40,403 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 5300, loss[loss=0.1286, beats_loss=0.009289, ecapa_loss=0.0001736, whisper_loss=0.1176, over 21523.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001455, whisper_loss=0.09085, over 3885043.31 frames. ], batch size: 87, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:46:01,919 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-18 01:46:05,400 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.91 vs. 
limit=15.0 2024-08-18 01:46:16,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3629520.0, ans=0.05 2024-08-18 01:46:30,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3629620.0, ans=0.0 2024-08-18 01:46:38,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3629620.0, ans=0.125 2024-08-18 01:46:48,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3629720.0, ans=0.125 2024-08-18 01:46:53,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3629820.0, ans=0.125 2024-08-18 01:46:54,268 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 5350, loss[loss=0.08959, beats_loss=0.01104, ecapa_loss=0.0001441, whisper_loss=0.07711, over 17151.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01058, ecapa_loss=0.0001459, whisper_loss=0.09061, over 3887076.13 frames. ], batch size: 68, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:46:57,002 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
27 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-18 01:47:15,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3629920.0, ans=0.0 2024-08-18 01:47:20,944 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.419e+01 2.661e+01 2.975e+01 6.061e+01, threshold=5.322e+01, percent-clipped=1.0 2024-08-18 01:47:27,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3630020.0, ans=0.125 2024-08-18 01:47:32,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3630020.0, ans=0.125 2024-08-18 01:47:33,375 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.99 vs. limit=15.0 2024-08-18 01:47:38,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3630020.0, ans=0.2 2024-08-18 01:47:52,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3630120.0, ans=0.125 2024-08-18 01:48:08,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3630220.0, ans=0.2 2024-08-18 01:48:19,659 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 5400, loss[loss=0.1031, beats_loss=0.01065, ecapa_loss=0.0001547, whisper_loss=0.09094, over 23545.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.0001455, whisper_loss=0.09039, over 3897930.82 frames. ], batch size: 93, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:48:32,181 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 01:48:37,461 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
37 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 01:48:45,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3630420.0, ans=0.125 2024-08-18 01:48:45,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3630420.0, ans=0.1 2024-08-18 01:48:51,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3630420.0, ans=0.2 2024-08-18 01:49:12,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3630520.0, ans=0.125 2024-08-18 01:49:39,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3630720.0, ans=0.0 2024-08-18 01:50:00,819 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 5450, loss[loss=0.1023, beats_loss=0.01195, ecapa_loss=0.0001276, whisper_loss=0.08903, over 22016.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01055, ecapa_loss=0.0001457, whisper_loss=0.09109, over 3872633.21 frames. ], batch size: 88, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:50:05,395 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2024-08-18 01:50:41,908 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.356e+01 2.667e+01 2.958e+01 1.634e+02, threshold=5.334e+01, percent-clipped=1.0 2024-08-18 01:50:58,100 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
34 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-18 01:50:58,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3631020.0, ans=0.125 2024-08-18 01:51:08,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3631020.0, ans=0.125 2024-08-18 01:51:31,531 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-18 01:51:45,308 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-18 01:51:49,849 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 24 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-18 01:52:01,703 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 5500, loss[loss=0.112, beats_loss=0.01093, ecapa_loss=0.0001387, whisper_loss=0.09971, over 23515.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01053, ecapa_loss=0.0001447, whisper_loss=0.09084, over 3892868.67 frames. ], batch size: 92, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:52:12,186 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 13 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-18 01:52:26,254 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 01:52:30,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3631420.0, ans=0.125 2024-08-18 01:52:35,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3631420.0, ans=0.0 2024-08-18 01:52:38,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3631520.0, ans=0.0 2024-08-18 01:52:52,675 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=15.0 2024-08-18 01:53:50,401 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-18 01:53:51,781 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 5550, loss[loss=0.08521, beats_loss=0.01144, ecapa_loss=0.0001602, whisper_loss=0.07217, over 17172.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.0001456, whisper_loss=0.09056, over 3925502.29 frames. 
], batch size: 72, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:54:01,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3631820.0, ans=0.125 2024-08-18 01:54:05,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3631820.0, ans=0.1 2024-08-18 01:54:20,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3631920.0, ans=0.0 2024-08-18 01:54:25,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3631920.0, ans=0.0 2024-08-18 01:54:33,487 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.313e+01 2.540e+01 2.843e+01 1.440e+02, threshold=5.080e+01, percent-clipped=2.0 2024-08-18 01:54:34,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3631920.0, ans=0.125 2024-08-18 01:55:00,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3632020.0, ans=0.1 2024-08-18 01:55:01,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3632020.0, ans=0.0 2024-08-18 01:55:52,278 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 5600, loss[loss=0.0951, beats_loss=0.01054, ecapa_loss=0.0001675, whisper_loss=0.08288, over 18646.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.0001457, whisper_loss=0.09019, over 3924582.88 frames. 
], batch size: 79, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:55:54,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3632320.0, ans=0.125 2024-08-18 01:56:03,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3632320.0, ans=0.0 2024-08-18 01:56:41,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3632620.0, ans=0.125 2024-08-18 01:56:54,064 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.13 vs. limit=10.0 2024-08-18 01:57:06,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3632720.0, ans=0.125 2024-08-18 01:57:09,269 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-18 01:57:10,355 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 5650, loss[loss=0.09428, beats_loss=0.01245, ecapa_loss=0.0001506, whisper_loss=0.08033, over 21103.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01059, ecapa_loss=0.0001464, whisper_loss=0.09011, over 3913854.49 frames. ], batch size: 87, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:57:12,497 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=22.5 2024-08-18 01:57:26,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3632920.0, ans=0.125 2024-08-18 01:57:33,008 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 37 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-18 01:57:34,257 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
25 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-18 01:57:35,818 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.365e+01 2.651e+01 2.943e+01 2.212e+02, threshold=5.303e+01, percent-clipped=3.0 2024-08-18 01:57:42,366 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-18 01:57:47,047 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 24 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-18 01:57:48,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3633020.0, ans=0.025 2024-08-18 01:58:25,783 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 01:58:30,351 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 5700, loss[loss=0.1116, beats_loss=0.01175, ecapa_loss=0.0001553, whisper_loss=0.09828, over 14493.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01057, ecapa_loss=0.0001467, whisper_loss=0.09068, over 3916925.07 frames. ], batch size: 58, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:58:30,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3633320.0, ans=0.0 2024-08-18 01:58:38,664 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-18 01:58:39,215 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.66 vs. limit=15.0 2024-08-18 01:58:51,514 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-18 01:58:51,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3633420.0, ans=0.0 2024-08-18 01:59:02,771 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.02 vs. limit=15.0 2024-08-18 01:59:08,354 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 01:59:18,093 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-08-18 01:59:26,451 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 33 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-18 01:59:30,406 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.893e-02 2024-08-18 01:59:33,297 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-18 01:59:36,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3633720.0, ans=10.0 2024-08-18 01:59:38,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3633720.0, ans=0.0 2024-08-18 01:59:48,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3633720.0, ans=0.2 2024-08-18 01:59:53,171 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 5750, loss[loss=0.136, beats_loss=0.008193, ecapa_loss=0.0001597, whisper_loss=0.1262, over 21703.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01053, ecapa_loss=0.0001469, whisper_loss=0.09135, over 3924112.81 frames. 
], batch size: 84, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:00:19,680 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.317e+01 2.629e+01 2.885e+01 4.138e+01, threshold=5.257e+01, percent-clipped=0.0 2024-08-18 02:00:53,054 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-18 02:00:57,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3634220.0, ans=15.0 2024-08-18 02:01:01,604 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.89 vs. limit=6.0 2024-08-18 02:01:08,535 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 5800, loss[loss=0.08324, beats_loss=0.01206, ecapa_loss=0.0001706, whisper_loss=0.06948, over 18738.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0105, ecapa_loss=0.0001472, whisper_loss=0.09124, over 3892698.80 frames. ], batch size: 81, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:01:10,247 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-18 02:01:25,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3634420.0, ans=0.2 2024-08-18 02:01:40,107 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.13 vs. limit=15.0 2024-08-18 02:01:52,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3634520.0, ans=0.0 2024-08-18 02:01:58,643 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.40 vs. 
limit=10.0 2024-08-18 02:02:08,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3634620.0, ans=15.0 2024-08-18 02:02:13,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3634720.0, ans=0.1 2024-08-18 02:02:27,860 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 5850, loss[loss=0.09513, beats_loss=0.01067, ecapa_loss=0.000149, whisper_loss=0.08297, over 21044.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01048, ecapa_loss=0.000147, whisper_loss=0.09191, over 3894260.30 frames. ], batch size: 88, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:02:38,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3634820.0, ans=0.1 2024-08-18 02:02:42,981 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-18 02:02:55,246 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.380e+01 2.623e+01 2.964e+01 4.271e+01, threshold=5.246e+01, percent-clipped=0.0 2024-08-18 02:03:04,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3635020.0, ans=0.04949747468305833 2024-08-18 02:03:08,251 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.24 vs. 
limit=15.0 2024-08-18 02:03:25,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3635120.0, ans=0.125 2024-08-18 02:03:29,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3635220.0, ans=0.05 2024-08-18 02:03:41,473 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 5900, loss[loss=0.09323, beats_loss=0.01227, ecapa_loss=0.0001521, whisper_loss=0.07944, over 20247.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.0001468, whisper_loss=0.09064, over 3881747.97 frames. ], batch size: 84, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:03:42,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3635320.0, ans=22.5 2024-08-18 02:04:07,043 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-18 02:04:31,164 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-18 02:04:32,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3635620.0, ans=0.0 2024-08-18 02:04:34,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3635620.0, ans=0.125 2024-08-18 02:04:38,999 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-18 02:04:40,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3635720.0, ans=0.125 2024-08-18 02:04:41,876 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
15 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-18 02:04:55,745 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 5950, loss[loss=0.1124, beats_loss=0.008284, ecapa_loss=0.0001641, whisper_loss=0.1024, over 16012.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001473, whisper_loss=0.09019, over 3869155.68 frames. ], batch size: 61, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:05:17,462 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-18 02:05:17,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3635920.0, ans=0.0 2024-08-18 02:05:18,106 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.11 vs. limit=15.0 2024-08-18 02:05:21,851 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.282e+01 2.567e+01 2.844e+01 4.690e+01, threshold=5.134e+01, percent-clipped=0.0 2024-08-18 02:05:41,076 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-18 02:06:10,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3636220.0, ans=0.125 2024-08-18 02:06:13,467 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 6000, loss[loss=0.1033, beats_loss=0.01148, ecapa_loss=0.0001716, whisper_loss=0.09014, over 23418.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01064, ecapa_loss=0.0001467, whisper_loss=0.09059, over 3878259.67 frames. ], batch size: 94, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:06:13,468 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-18 02:06:47,651 INFO [train_multi_KD3.py:1149] (2/4) Epoch 25, validation on ASR_libri: loss=0.2524, beats_loss=0, ecapa_loss=0.0005148, whisper_loss=0.2472, over 922467.00 frames. 
2024-08-18 02:07:03,264 INFO [train_multi_KD3.py:1149] (2/4) Epoch 25, validation on SV_voxceleb1: loss=0.004108, beats_loss=0, ecapa_loss=0.0004108, whisper_loss=0, over 939242.00 frames. 2024-08-18 02:08:43,123 INFO [train_multi_KD3.py:1149] (2/4) Epoch 25, validation on AT_audioset: loss=0.02328, beats_loss=0.02328, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 02:08:43,126 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-18 02:08:51,251 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 19 from Vox, 52 fro AS 2024-08-18 02:08:53,152 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.02 vs. limit=15.0 2024-08-18 02:08:54,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3636320.0, ans=22.5 2024-08-18 02:09:06,417 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-18 02:09:23,316 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 02:09:36,490 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 02:09:37,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3636720.0, ans=0.04949747468305833 2024-08-18 02:09:47,641 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 6050, loss[loss=0.09841, beats_loss=0.01142, ecapa_loss=0.0001335, whisper_loss=0.08565, over 16033.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01061, ecapa_loss=0.000147, whisper_loss=0.09054, over 3879674.41 frames. 
], batch size: 63, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:09:50,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3636820.0, ans=0.2 2024-08-18 02:09:53,255 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 38 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-18 02:10:02,414 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2024-08-18 02:10:02,432 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2024-08-18 02:10:08,392 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 02:10:09,396 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.386e+01 2.689e+01 3.046e+01 1.667e+02, threshold=5.379e+01, percent-clipped=1.0 2024-08-18 02:10:13,450 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 02:10:19,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3637020.0, ans=0.09899494936611666 2024-08-18 02:10:42,457 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 32 from Vox, 26 fro AS 2024-08-18 02:10:46,063 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 02:10:52,445 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 6100, loss[loss=0.11, beats_loss=0.01138, ecapa_loss=0.0001388, whisper_loss=0.09724, over 23021.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01055, ecapa_loss=0.0001478, whisper_loss=0.09115, over 3912164.78 frames. 
], batch size: 92, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:11:07,690 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 19 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 02:11:19,403 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 02:11:34,343 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 22 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-18 02:11:34,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3637620.0, ans=0.05 2024-08-18 02:11:40,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3637620.0, ans=0.125 2024-08-18 02:11:55,593 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 6150, loss[loss=0.08277, beats_loss=0.0119, ecapa_loss=0.0001229, whisper_loss=0.06964, over 15279.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0106, ecapa_loss=0.0001478, whisper_loss=0.09063, over 3915086.07 frames. ], batch size: 57, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:12:07,756 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.86 vs. limit=15.0 2024-08-18 02:12:09,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3637920.0, ans=0.125 2024-08-18 02:12:14,581 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
15 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-18 02:12:16,890 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.229e+01 2.428e+01 2.694e+01 3.816e+01, threshold=4.856e+01, percent-clipped=0.0 2024-08-18 02:12:17,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3637920.0, ans=0.1 2024-08-18 02:12:20,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3638020.0, ans=0.125 2024-08-18 02:12:37,547 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 30 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 02:12:46,653 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-18 02:12:55,298 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.33 vs. limit=10.0 2024-08-18 02:12:58,556 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 02:12:59,356 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 6200, loss[loss=0.09871, beats_loss=0.01214, ecapa_loss=0.000137, whisper_loss=0.0852, over 22123.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01056, ecapa_loss=0.0001472, whisper_loss=0.09113, over 3899617.92 frames. 
], batch size: 90, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:13:01,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3638320.0, ans=0.125 2024-08-18 02:13:21,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3638420.0, ans=0.0 2024-08-18 02:13:44,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3638620.0, ans=0.2 2024-08-18 02:13:44,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3638620.0, ans=0.2 2024-08-18 02:13:53,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3638720.0, ans=0.125 2024-08-18 02:13:56,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3638720.0, ans=0.125 2024-08-18 02:14:02,842 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 6250, loss[loss=0.1253, beats_loss=0.006661, ecapa_loss=0.0001733, whisper_loss=0.1169, over 17459.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001472, whisper_loss=0.09081, over 3891012.46 frames. ], batch size: 68, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:14:06,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3638820.0, ans=0.125 2024-08-18 02:14:07,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3638820.0, ans=0.125 2024-08-18 02:14:08,468 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
26 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-18 02:14:09,139 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.58 vs. limit=15.0 2024-08-18 02:14:20,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3638920.0, ans=0.09899494936611666 2024-08-18 02:14:24,935 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.392e+01 2.630e+01 2.962e+01 1.833e+02, threshold=5.259e+01, percent-clipped=3.0 2024-08-18 02:14:34,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3639020.0, ans=0.125 2024-08-18 02:14:36,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3639020.0, ans=0.2 2024-08-18 02:14:40,616 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.44 vs. limit=15.0 2024-08-18 02:14:45,242 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 02:14:53,289 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.78 vs. limit=22.5 2024-08-18 02:15:06,352 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 6300, loss[loss=0.1138, beats_loss=0.008026, ecapa_loss=0.0001771, whisper_loss=0.104, over 21138.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01051, ecapa_loss=0.0001471, whisper_loss=0.0912, over 3896055.98 frames. 
], batch size: 86, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:15:52,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3639620.0, ans=0.0 2024-08-18 02:16:10,548 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 6350, loss[loss=0.09694, beats_loss=0.008577, ecapa_loss=0.0001502, whisper_loss=0.08686, over 14654.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01054, ecapa_loss=0.0001476, whisper_loss=0.09014, over 3856421.14 frames. ], batch size: 54, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:16:11,440 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.98 vs. limit=15.0 2024-08-18 02:16:14,570 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-18 02:16:21,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3639820.0, ans=0.1 2024-08-18 02:16:21,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3639820.0, ans=0.2 2024-08-18 02:16:22,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3639920.0, ans=0.0 2024-08-18 02:16:32,120 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.636e+01 2.370e+01 2.587e+01 3.086e+01 3.331e+02, threshold=5.174e+01, percent-clipped=2.0 2024-08-18 02:16:36,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3639920.0, ans=0.0 2024-08-18 02:16:40,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3640020.0, ans=0.0 2024-08-18 02:16:40,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, 
batch_count=3640020.0, ans=0.0 2024-08-18 02:16:43,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3640020.0, ans=0.0 2024-08-18 02:16:46,788 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 02:17:10,185 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.83 vs. limit=10.0 2024-08-18 02:17:17,029 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 6400, loss[loss=0.1218, beats_loss=0.009989, ecapa_loss=0.000158, whisper_loss=0.1103, over 23957.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01054, ecapa_loss=0.0001469, whisper_loss=0.09027, over 3869223.22 frames. ], batch size: 93, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:17:22,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3640320.0, ans=0.125 2024-08-18 02:17:26,629 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.45 vs. limit=12.0 2024-08-18 02:17:29,956 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 18 from LS+wenet, 14 from Vox, 47 fro AS 2024-08-18 02:17:33,484 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 17 from LS+wenet, 19 from Vox, 57 fro AS 2024-08-18 02:17:34,518 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 17 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-18 02:17:48,435 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-18 02:17:48,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3640520.0, ans=0.125 2024-08-18 02:17:51,640 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.38 vs. 
limit=12.0 2024-08-18 02:17:56,896 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-18 02:18:00,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3640620.0, ans=0.1 2024-08-18 02:18:08,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3640720.0, ans=0.09899494936611666 2024-08-18 02:18:13,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3640720.0, ans=0.2 2024-08-18 02:18:19,975 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 6450, loss[loss=0.1206, beats_loss=0.008993, ecapa_loss=0.0001624, whisper_loss=0.11, over 23009.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01062, ecapa_loss=0.0001468, whisper_loss=0.08981, over 3893347.60 frames. ], batch size: 92, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:18:24,009 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 17 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-18 02:18:34,095 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-18 02:18:41,561 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.367e+01 2.593e+01 3.031e+01 7.766e+01, threshold=5.185e+01, percent-clipped=4.0 2024-08-18 02:18:52,913 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-18 02:19:03,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3641120.0, ans=0.125 2024-08-18 02:19:12,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3641220.0, ans=0.125 2024-08-18 02:19:12,967 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 02:19:22,899 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 6500, loss[loss=0.09883, beats_loss=0.0122, ecapa_loss=0.0001724, whisper_loss=0.0849, over 21759.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01061, ecapa_loss=0.0001457, whisper_loss=0.09061, over 3897004.51 frames. ], batch size: 91, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:19:23,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3641320.0, ans=0.0 2024-08-18 02:19:25,752 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-18 02:19:38,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3641420.0, ans=0.0 2024-08-18 02:19:39,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3641420.0, ans=0.1 2024-08-18 02:19:49,483 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 17 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 02:19:50,606 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09095240384340286, model_norm_threshold=51.85380935668945 2024-08-18 02:19:50,775 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.norm.log_scale with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.024e+04, grad_sumsq=4.024e+04, orig_rms_sq=1.000e+00 2024-08-18 02:20:08,707 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-18 02:20:10,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3641620.0, ans=0.125 2024-08-18 02:20:24,285 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.24 vs. 
limit=15.0 2024-08-18 02:20:26,008 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 6550, loss[loss=0.1013, beats_loss=0.01147, ecapa_loss=0.0001444, whisper_loss=0.0884, over 21757.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01059, ecapa_loss=0.000147, whisper_loss=0.09086, over 3899681.95 frames. ], batch size: 90, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:20:29,959 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 14 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-18 02:20:47,246 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.617e+01 2.352e+01 2.620e+01 3.012e+01 5.701e+02, threshold=5.240e+01, percent-clipped=4.0 2024-08-18 02:20:49,048 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 18 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-18 02:20:55,921 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.80 vs. limit=22.5 2024-08-18 02:21:10,329 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 02:21:13,352 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.42 vs. limit=15.0 2024-08-18 02:21:29,281 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 6600, loss[loss=0.1114, beats_loss=0.01206, ecapa_loss=0.0001222, whisper_loss=0.09814, over 23042.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01051, ecapa_loss=0.0001471, whisper_loss=0.09185, over 3933378.62 frames. 
], batch size: 89, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:21:38,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3642320.0, ans=0.125 2024-08-18 02:21:42,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3642420.0, ans=0.125 2024-08-18 02:21:49,348 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 02:21:50,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3642420.0, ans=0.125 2024-08-18 02:21:50,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3642420.0, ans=0.2 2024-08-18 02:22:00,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3642520.0, ans=10.0 2024-08-18 02:22:14,500 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 27 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-18 02:22:17,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3642620.0, ans=0.2 2024-08-18 02:22:23,179 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-18 02:22:27,303 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 30 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-18 02:22:32,352 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 6650, loss[loss=0.09858, beats_loss=0.0103, ecapa_loss=0.0001453, whisper_loss=0.08682, over 18503.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01051, ecapa_loss=0.0001475, whisper_loss=0.09157, over 3962130.25 frames. 
], batch size: 70, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:22:35,840 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.93 vs. limit=15.0 2024-08-18 02:22:36,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3642820.0, ans=0.125 2024-08-18 02:22:37,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3642820.0, ans=0.125 2024-08-18 02:22:42,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3642820.0, ans=0.0 2024-08-18 02:22:51,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3642920.0, ans=0.1 2024-08-18 02:22:53,773 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.312e+01 2.682e+01 2.929e+01 4.632e+01, threshold=5.364e+01, percent-clipped=0.0 2024-08-18 02:23:12,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=3643120.0, ans=0.2 2024-08-18 02:23:23,828 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.48 vs. limit=10.0 2024-08-18 02:23:28,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3643220.0, ans=0.125 2024-08-18 02:23:29,194 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.02 vs. limit=15.0 2024-08-18 02:23:36,041 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 6700, loss[loss=0.1074, beats_loss=0.01045, ecapa_loss=0.0001373, whisper_loss=0.09562, over 18907.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01048, ecapa_loss=0.0001478, whisper_loss=0.0916, over 3957541.90 frames. ], batch size: 74, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:23:44,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3643320.0, ans=0.125 2024-08-18 02:23:49,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3643420.0, ans=0.0 2024-08-18 02:24:09,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3643520.0, ans=0.1 2024-08-18 02:24:23,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3643620.0, ans=0.0 2024-08-18 02:24:27,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3643720.0, ans=0.1 2024-08-18 02:24:31,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=3643720.0, ans=0.2 2024-08-18 02:24:36,261 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.518e-03 2024-08-18 02:24:39,521 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 6750, loss[loss=0.07957, beats_loss=0.0101, ecapa_loss=0.0001306, whisper_loss=0.06816, over 17177.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01049, ecapa_loss=0.0001469, whisper_loss=0.09128, over 3931542.03 frames. 
], batch size: 65, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:24:46,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3643820.0, ans=0.0 2024-08-18 02:24:58,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3643920.0, ans=0.125 2024-08-18 02:25:01,356 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.309e+01 2.478e+01 2.753e+01 4.386e+01, threshold=4.956e+01, percent-clipped=0.0 2024-08-18 02:25:07,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3644020.0, ans=0.125 2024-08-18 02:25:07,479 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.87 vs. limit=22.5 2024-08-18 02:25:10,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3644020.0, ans=0.0 2024-08-18 02:25:18,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3644120.0, ans=0.0 2024-08-18 02:25:22,243 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-08-18 02:25:26,774 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 31 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-18 02:25:37,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3644220.0, ans=0.0 2024-08-18 02:25:39,495 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
20 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-18 02:25:43,311 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 6800, loss[loss=0.08782, beats_loss=0.009331, ecapa_loss=0.0002147, whisper_loss=0.07634, over 21213.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01041, ecapa_loss=0.0001472, whisper_loss=0.092, over 3950027.66 frames. ], batch size: 93, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:25:48,283 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 02:25:49,600 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-18 02:25:53,071 WARNING [optim.py:496] (2/4) Scaling gradients by 0.054543543606996536, model_norm_threshold=49.555973052978516 2024-08-18 02:25:53,245 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.396e+05, grad_sumsq=4.165e+04, orig_rms_sq=3.352e+00 2024-08-18 02:26:01,423 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-18 02:26:04,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3644420.0, ans=0.07 2024-08-18 02:26:22,119 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 15 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 02:26:22,674 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.69 vs. limit=22.5 2024-08-18 02:26:33,538 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 23 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-18 02:26:40,119 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
21 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-18 02:26:47,874 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 6850, loss[loss=0.1093, beats_loss=0.01006, ecapa_loss=0.000144, whisper_loss=0.09784, over 22847.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01038, ecapa_loss=0.0001479, whisper_loss=0.09134, over 3890021.16 frames. ], batch size: 93, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:27:07,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3644920.0, ans=0.0 2024-08-18 02:27:09,751 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.331e+01 2.507e+01 2.800e+01 9.086e+02, threshold=5.014e+01, percent-clipped=3.0 2024-08-18 02:27:20,119 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-18 02:27:21,582 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-18 02:27:23,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3645020.0, ans=0.0 2024-08-18 02:27:51,595 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 6900, loss[loss=0.09898, beats_loss=0.009424, ecapa_loss=0.0001716, whisper_loss=0.08784, over 17690.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0105, ecapa_loss=0.0001473, whisper_loss=0.09006, over 3888342.17 frames. ], batch size: 74, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:28:21,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3645520.0, ans=0.125 2024-08-18 02:28:40,898 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-18 02:28:43,448 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
33 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-18 02:28:49,246 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.90 vs. limit=22.5 2024-08-18 02:28:54,603 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 6950, loss[loss=0.09627, beats_loss=0.0106, ecapa_loss=0.0001518, whisper_loss=0.08414, over 20039.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0106, ecapa_loss=0.0001456, whisper_loss=0.09048, over 3904157.51 frames. ], batch size: 81, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:29:04,314 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-18 02:29:16,088 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.274e+01 2.573e+01 2.776e+01 4.485e+01, threshold=5.146e+01, percent-clipped=0.0 2024-08-18 02:29:16,506 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-18 02:29:28,113 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 02:29:30,969 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-18 02:29:59,583 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 7000, loss[loss=0.1088, beats_loss=0.009099, ecapa_loss=0.0001503, whisper_loss=0.09817, over 17666.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01058, ecapa_loss=0.0001451, whisper_loss=0.09082, over 3896994.62 frames. ], batch size: 68, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:30:02,256 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
17 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-18 02:30:31,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3646520.0, ans=0.0 2024-08-18 02:30:40,732 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 34 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 02:30:44,533 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 18 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 02:30:45,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3646620.0, ans=0.125 2024-08-18 02:30:57,089 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2024-08-18 02:31:01,848 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-18 02:31:02,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3646720.0, ans=0.125 2024-08-18 02:31:09,216 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 7050, loss[loss=0.1201, beats_loss=0.01069, ecapa_loss=0.0001425, whisper_loss=0.108, over 23635.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01054, ecapa_loss=0.000146, whisper_loss=0.0911, over 3913385.64 frames. ], batch size: 93, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:31:11,849 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
19 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-18 02:31:22,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3646920.0, ans=0.125 2024-08-18 02:31:32,738 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.611e+01 2.265e+01 2.505e+01 2.734e+01 3.913e+01, threshold=5.009e+01, percent-clipped=0.0 2024-08-18 02:31:41,895 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 02:31:44,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3647020.0, ans=0.2 2024-08-18 02:32:06,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3647220.0, ans=0.125 2024-08-18 02:32:07,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3647220.0, ans=0.05 2024-08-18 02:32:13,621 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 7100, loss[loss=0.09208, beats_loss=0.01039, ecapa_loss=0.0001093, whisper_loss=0.0806, over 16633.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01057, ecapa_loss=0.0001435, whisper_loss=0.09119, over 3899659.75 frames. ], batch size: 62, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:32:15,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3647320.0, ans=0.0 2024-08-18 02:32:31,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3647420.0, ans=0.125 2024-08-18 02:32:36,710 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.86 vs. limit=22.5 2024-08-18 02:32:37,197 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 02:32:37,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3647520.0, ans=0.125 2024-08-18 02:32:45,500 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 36 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-18 02:32:49,537 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0 2024-08-18 02:33:04,681 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.18 vs. limit=15.0 2024-08-18 02:33:05,323 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 02:33:05,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3647720.0, ans=0.125 2024-08-18 02:33:11,223 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-18 02:33:15,157 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 7150, loss[loss=0.1004, beats_loss=0.009687, ecapa_loss=0.0001467, whisper_loss=0.08927, over 15347.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01057, ecapa_loss=0.0001433, whisper_loss=0.09124, over 3935965.13 frames. ], batch size: 59, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:33:30,223 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
28 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-18 02:33:34,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3647920.0, ans=0.2 2024-08-18 02:33:36,649 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.260e+01 2.515e+01 2.745e+01 4.282e+01, threshold=5.030e+01, percent-clipped=0.0 2024-08-18 02:33:56,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3648120.0, ans=0.0 2024-08-18 02:34:18,935 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 02:34:18,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3648320.0, ans=0.1 2024-08-18 02:34:19,717 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 7200, loss[loss=0.1037, beats_loss=0.01099, ecapa_loss=0.0001595, whisper_loss=0.09114, over 22322.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01053, ecapa_loss=0.0001428, whisper_loss=0.09166, over 3942520.81 frames. ], batch size: 93, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:34:28,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3648320.0, ans=0.125 2024-08-18 02:34:40,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3648420.0, ans=0.0 2024-08-18 02:34:40,663 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.29 vs. 
limit=12.0 2024-08-18 02:34:49,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3648520.0, ans=0.0 2024-08-18 02:34:50,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3648520.0, ans=0.0 2024-08-18 02:35:10,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3648720.0, ans=0.0 2024-08-18 02:35:17,180 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 25 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-18 02:35:21,971 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 7250, loss[loss=0.1223, beats_loss=0.00828, ecapa_loss=0.0001639, whisper_loss=0.1124, over 15264.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01048, ecapa_loss=0.0001437, whisper_loss=0.09189, over 3959774.45 frames. ], batch size: 60, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:35:25,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3648820.0, ans=0.0 2024-08-18 02:35:30,731 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-18 02:35:43,429 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.335e+01 2.544e+01 2.816e+01 3.698e+01, threshold=5.087e+01, percent-clipped=0.0 2024-08-18 02:35:55,985 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-18 02:35:57,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3649020.0, ans=0.1 2024-08-18 02:36:08,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3649120.0, ans=0.0 2024-08-18 02:36:10,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3649120.0, ans=0.0 2024-08-18 02:36:24,100 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.64 vs. limit=15.0 2024-08-18 02:36:24,295 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 7300, loss[loss=0.06449, beats_loss=0.01158, ecapa_loss=0.0001869, whisper_loss=0.05103, over 13010.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01044, ecapa_loss=0.0001446, whisper_loss=0.09227, over 3940731.60 frames. ], batch size: 53, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:36:24,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3649320.0, ans=0.0 2024-08-18 02:36:40,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3649420.0, ans=0.125 2024-08-18 02:36:41,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3649420.0, ans=0.1 2024-08-18 02:36:53,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3649520.0, ans=0.0 2024-08-18 02:36:55,570 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.98 vs. 
limit=8.0 2024-08-18 02:36:56,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3649520.0, ans=0.2 2024-08-18 02:36:59,790 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-18 02:37:01,244 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 02:37:08,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3649620.0, ans=0.04949747468305833 2024-08-18 02:37:27,379 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 7350, loss[loss=0.1054, beats_loss=0.01079, ecapa_loss=0.0001676, whisper_loss=0.09295, over 19944.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01042, ecapa_loss=0.0001462, whisper_loss=0.09154, over 3909480.79 frames. ], batch size: 80, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:37:36,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3649820.0, ans=0.2 2024-08-18 02:37:38,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3649920.0, ans=0.0 2024-08-18 02:37:46,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3649920.0, ans=0.125 2024-08-18 02:37:48,685 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.729e+01 2.224e+01 2.419e+01 2.653e+01 3.642e+01, threshold=4.838e+01, percent-clipped=0.0 2024-08-18 02:38:21,001 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
25 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-18 02:38:22,466 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 02:38:22,892 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.33 vs. limit=15.0 2024-08-18 02:38:23,463 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-18 02:38:29,263 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 7400, loss[loss=0.08082, beats_loss=0.01266, ecapa_loss=0.000158, whisper_loss=0.06658, over 16992.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01054, ecapa_loss=0.0001454, whisper_loss=0.09066, over 3891689.55 frames. ], batch size: 68, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:38:43,220 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-18 02:38:47,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3650420.0, ans=0.035 2024-08-18 02:39:09,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3650620.0, ans=0.1 2024-08-18 02:39:30,934 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 7450, loss[loss=0.08545, beats_loss=0.01046, ecapa_loss=0.0001556, whisper_loss=0.07343, over 13762.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01052, ecapa_loss=0.0001446, whisper_loss=0.09098, over 3867743.92 frames. 
], batch size: 55, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:39:52,035 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.351e+01 2.534e+01 2.762e+01 3.772e+01, threshold=5.068e+01, percent-clipped=0.0 2024-08-18 02:39:52,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3650920.0, ans=0.125 2024-08-18 02:39:52,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3650920.0, ans=0.125 2024-08-18 02:39:52,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3650920.0, ans=0.07 2024-08-18 02:39:56,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3651020.0, ans=0.2 2024-08-18 02:39:57,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3651020.0, ans=0.125 2024-08-18 02:40:04,596 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 02:40:05,794 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-18 02:40:07,685 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.36 vs. limit=12.0 2024-08-18 02:40:09,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3651120.0, ans=0.125 2024-08-18 02:40:19,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3651220.0, ans=0.125 2024-08-18 02:40:24,195 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
19 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-18 02:40:32,762 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 7500, loss[loss=0.1035, beats_loss=0.01114, ecapa_loss=0.0001042, whisper_loss=0.09128, over 23360.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001438, whisper_loss=0.09082, over 3874947.84 frames. ], batch size: 91, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:40:38,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3651320.0, ans=0.125 2024-08-18 02:40:58,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3651520.0, ans=0.0 2024-08-18 02:41:12,151 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0 2024-08-18 02:41:26,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3651720.0, ans=0.0 2024-08-18 02:41:35,019 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 7550, loss[loss=0.1208, beats_loss=0.008869, ecapa_loss=0.0001337, whisper_loss=0.1106, over 24259.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01052, ecapa_loss=0.000144, whisper_loss=0.09116, over 3849071.55 frames. ], batch size: 92, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:41:37,705 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-18 02:41:39,089 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-18 02:41:40,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3651820.0, ans=0.125 2024-08-18 02:41:50,020 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
21 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-18 02:41:51,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3651920.0, ans=0.125 2024-08-18 02:41:56,106 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.283e+01 2.523e+01 2.754e+01 3.706e+01, threshold=5.046e+01, percent-clipped=0.0 2024-08-18 02:42:01,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3652020.0, ans=0.0 2024-08-18 02:42:02,108 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.18 vs. limit=6.0 2024-08-18 02:42:04,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3652020.0, ans=0.2 2024-08-18 02:42:07,523 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-18 02:42:08,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3652020.0, ans=0.125 2024-08-18 02:42:12,696 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-18 02:42:14,452 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.41 vs. limit=15.0 2024-08-18 02:42:15,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3652120.0, ans=0.0 2024-08-18 02:42:15,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3652120.0, ans=0.125 2024-08-18 02:42:17,723 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
27 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-18 02:42:37,830 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 7600, loss[loss=0.1167, beats_loss=0.008827, ecapa_loss=0.0001515, whisper_loss=0.1063, over 17731.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01052, ecapa_loss=0.0001456, whisper_loss=0.09103, over 3864479.89 frames. ], batch size: 69, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:42:49,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3652420.0, ans=0.125 2024-08-18 02:42:56,058 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.44 vs. limit=15.0 2024-08-18 02:43:01,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3652520.0, ans=0.125 2024-08-18 02:43:04,088 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-18 02:43:07,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3652520.0, ans=0.125 2024-08-18 02:43:09,605 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.54 vs. limit=12.0 2024-08-18 02:43:20,271 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
18 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-18 02:43:25,461 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 02:43:26,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3652720.0, ans=0.0 2024-08-18 02:43:36,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3652720.0, ans=0.0 2024-08-18 02:43:39,552 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.50 vs. limit=12.0 2024-08-18 02:43:40,161 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 7650, loss[loss=0.1343, beats_loss=0.007674, ecapa_loss=0.0001401, whisper_loss=0.1252, over 19823.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01044, ecapa_loss=0.0001459, whisper_loss=0.09113, over 3875877.71 frames. ], batch size: 76, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:43:52,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3652920.0, ans=0.5 2024-08-18 02:43:55,441 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
19 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 02:44:01,635 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.389e+01 2.635e+01 3.049e+01 5.266e+01, threshold=5.270e+01, percent-clipped=1.0 2024-08-18 02:44:03,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3652920.0, ans=0.025 2024-08-18 02:44:11,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3653020.0, ans=0.0 2024-08-18 02:44:18,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3653120.0, ans=0.1 2024-08-18 02:44:22,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3653120.0, ans=0.2 2024-08-18 02:44:23,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3653120.0, ans=0.2 2024-08-18 02:44:34,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3653220.0, ans=0.05 2024-08-18 02:44:43,051 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 7700, loss[loss=0.08045, beats_loss=0.01098, ecapa_loss=0.0001468, whisper_loss=0.06801, over 13495.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01043, ecapa_loss=0.0001468, whisper_loss=0.09084, over 3862700.14 frames. 
], batch size: 54, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:44:43,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3653320.0, ans=0.125 2024-08-18 02:45:13,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3653520.0, ans=0.125 2024-08-18 02:45:24,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3653620.0, ans=0.07 2024-08-18 02:45:26,075 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-08-18 02:45:33,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3653720.0, ans=0.125 2024-08-18 02:45:34,782 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-18 02:45:34,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3653720.0, ans=0.0 2024-08-18 02:45:44,576 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 7750, loss[loss=0.1101, beats_loss=0.01229, ecapa_loss=0.0001492, whisper_loss=0.09634, over 22893.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01052, ecapa_loss=0.0001461, whisper_loss=0.09038, over 3872138.42 frames. ], batch size: 92, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:45:45,841 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
24 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-18 02:45:49,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3653820.0, ans=0.125 2024-08-18 02:46:06,082 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.345e+01 2.664e+01 3.102e+01 5.489e+01, threshold=5.327e+01, percent-clipped=1.0 2024-08-18 02:46:19,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3654020.0, ans=0.1 2024-08-18 02:46:26,728 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-18 02:46:29,946 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.08 vs. limit=12.0 2024-08-18 02:46:30,923 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=15.0 2024-08-18 02:46:37,532 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 31 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-18 02:46:37,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3654220.0, ans=0.125 2024-08-18 02:46:46,560 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 02:46:47,619 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 7800, loss[loss=0.1091, beats_loss=0.01012, ecapa_loss=0.0001178, whisper_loss=0.09778, over 19356.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01047, ecapa_loss=0.0001466, whisper_loss=0.09056, over 3892915.25 frames. ], batch size: 74, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:47:05,185 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
23 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-18 02:47:09,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3654420.0, ans=0.1 2024-08-18 02:47:09,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3654420.0, ans=0.125 2024-08-18 02:47:10,085 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 20 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-18 02:47:20,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3654520.0, ans=0.1 2024-08-18 02:47:20,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3654520.0, ans=0.0 2024-08-18 02:47:22,688 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 26 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-18 02:47:22,939 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 02:47:26,188 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-18 02:47:30,390 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.75 vs. limit=15.0 2024-08-18 02:47:37,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=3654720.0, ans=15.0 2024-08-18 02:47:40,064 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 17 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 02:47:46,711 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. 
limit=15.0 2024-08-18 02:47:49,620 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 7850, loss[loss=0.09712, beats_loss=0.01205, ecapa_loss=0.0001294, whisper_loss=0.08378, over 17694.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0105, ecapa_loss=0.0001469, whisper_loss=0.09047, over 3894253.12 frames. ], batch size: 71, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:47:51,622 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.16 vs. limit=15.0 2024-08-18 02:48:08,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3654920.0, ans=0.05 2024-08-18 02:48:10,560 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.350e+01 2.619e+01 2.873e+01 2.983e+02, threshold=5.237e+01, percent-clipped=1.0 2024-08-18 02:48:13,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3655020.0, ans=0.1 2024-08-18 02:48:16,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3655020.0, ans=0.2 2024-08-18 02:48:19,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3655020.0, ans=0.125 2024-08-18 02:48:21,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3655020.0, ans=0.035 2024-08-18 02:48:24,051 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. 
limit=6.0 2024-08-18 02:48:26,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3655120.0, ans=0.125 2024-08-18 02:48:44,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3655220.0, ans=0.1 2024-08-18 02:48:51,567 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 7900, loss[loss=0.09555, beats_loss=0.009587, ecapa_loss=0.0001246, whisper_loss=0.08472, over 16889.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001459, whisper_loss=0.09015, over 3867364.41 frames. ], batch size: 65, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:48:53,856 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.75 vs. limit=15.0 2024-08-18 02:48:55,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3655320.0, ans=0.125 2024-08-18 02:48:55,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3655320.0, ans=0.0 2024-08-18 02:49:09,278 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.41 vs. limit=12.0 2024-08-18 02:49:09,930 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-18 02:49:16,252 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-18 02:49:22,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3655520.0, ans=0.0 2024-08-18 02:49:43,788 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
27 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-18 02:49:46,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3655720.0, ans=0.125 2024-08-18 02:49:50,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3655720.0, ans=0.125 2024-08-18 02:49:53,548 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 7950, loss[loss=0.09953, beats_loss=0.01041, ecapa_loss=0.0001722, whisper_loss=0.0874, over 17814.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0106, ecapa_loss=0.0001462, whisper_loss=0.09014, over 3891494.98 frames. ], batch size: 73, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:49:57,605 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2024-08-18 02:50:02,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3655820.0, ans=0.0 2024-08-18 02:50:07,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3655920.0, ans=0.0 2024-08-18 02:50:08,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3655920.0, ans=0.125 2024-08-18 02:50:14,425 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.39 vs. 
limit=15.0 2024-08-18 02:50:14,862 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.294e+01 2.620e+01 2.907e+01 9.033e+01, threshold=5.239e+01, percent-clipped=1.0 2024-08-18 02:50:17,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3656020.0, ans=0.0 2024-08-18 02:50:37,154 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 27 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 02:50:46,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3656220.0, ans=0.125 2024-08-18 02:50:48,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3656220.0, ans=0.0 2024-08-18 02:50:55,543 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 8000, loss[loss=0.09413, beats_loss=0.01057, ecapa_loss=0.0001338, whisper_loss=0.08223, over 21364.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01062, ecapa_loss=0.0001459, whisper_loss=0.08993, over 3892288.07 frames. ], batch size: 81, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:50:55,768 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 02:50:56,269 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.86 vs. limit=22.5 2024-08-18 02:51:35,400 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-18 02:51:43,909 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
15 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-18 02:51:44,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3656720.0, ans=0.0 2024-08-18 02:51:57,240 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 8050, loss[loss=0.1046, beats_loss=0.008672, ecapa_loss=0.0001586, whisper_loss=0.09429, over 19527.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01055, ecapa_loss=0.0001445, whisper_loss=0.09022, over 3870633.40 frames. ], batch size: 78, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:52:11,057 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-18 02:52:15,273 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0 2024-08-18 02:52:19,600 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.387e+01 2.609e+01 2.994e+01 4.076e+01, threshold=5.217e+01, percent-clipped=0.0 2024-08-18 02:52:19,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3656920.0, ans=0.125 2024-08-18 02:52:35,910 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 24 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-18 02:52:50,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3657220.0, ans=0.0 2024-08-18 02:52:58,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3657320.0, ans=0.2 2024-08-18 02:52:59,773 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 8100, loss[loss=0.1025, beats_loss=0.008947, ecapa_loss=0.000155, whisper_loss=0.092, over 19431.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01059, ecapa_loss=0.0001444, whisper_loss=0.0893, over 3828281.31 frames. 
], batch size: 78, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:53:04,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3657320.0, ans=0.1 2024-08-18 02:53:50,668 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-18 02:54:02,043 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 8150, loss[loss=0.1133, beats_loss=0.009475, ecapa_loss=0.0001711, whisper_loss=0.1021, over 21250.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01047, ecapa_loss=0.0001457, whisper_loss=0.09025, over 3833123.90 frames. ], batch size: 88, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:54:24,341 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.796e+01 2.183e+01 2.473e+01 2.825e+01 3.767e+01, threshold=4.945e+01, percent-clipped=0.0 2024-08-18 02:54:35,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3658020.0, ans=0.1 2024-08-18 02:54:52,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3658220.0, ans=0.0 2024-08-18 02:54:55,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3658220.0, ans=0.1 2024-08-18 02:55:03,648 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 8200, loss[loss=0.05838, beats_loss=0.01424, ecapa_loss=0.0001152, whisper_loss=0.04299, over 14674.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01045, ecapa_loss=0.0001477, whisper_loss=0.08969, over 3836251.59 frames. 
], batch size: 58, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:55:06,617 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.927e+05 2024-08-18 02:55:09,331 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0 2024-08-18 02:55:10,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3658320.0, ans=10.0 2024-08-18 02:55:14,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3658320.0, ans=0.0 2024-08-18 02:55:15,297 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.31 vs. limit=22.5 2024-08-18 02:55:28,270 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-18 02:55:30,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3658520.0, ans=0.0 2024-08-18 02:55:44,249 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-18 02:55:51,683 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-18 02:55:54,302 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 02:56:04,687 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.79 vs. limit=12.0 2024-08-18 02:56:05,250 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 8250, loss[loss=0.09634, beats_loss=0.01131, ecapa_loss=0.0001361, whisper_loss=0.08367, over 22783.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01045, ecapa_loss=0.000147, whisper_loss=0.08986, over 3854150.84 frames. ], batch size: 91, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:56:05,991 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.97 vs. limit=22.5 2024-08-18 02:56:27,176 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.331e+01 2.566e+01 2.988e+01 3.927e+01, threshold=5.131e+01, percent-clipped=0.0 2024-08-18 02:56:29,962 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-18 02:56:30,209 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.431e-01 2024-08-18 02:56:31,045 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 02:56:41,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3659120.0, ans=0.07 2024-08-18 02:56:48,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3659120.0, ans=0.125 2024-08-18 02:56:50,063 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-18 02:56:56,182 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 19 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-18 02:56:59,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3659220.0, ans=0.2 2024-08-18 02:57:00,712 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.29 vs. 
limit=22.5 2024-08-18 02:57:05,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3659220.0, ans=0.1 2024-08-18 02:57:05,473 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.72 vs. limit=10.0 2024-08-18 02:57:07,218 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 8300, loss[loss=0.1096, beats_loss=0.008252, ecapa_loss=0.0001144, whisper_loss=0.1002, over 18894.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01047, ecapa_loss=0.0001468, whisper_loss=0.08932, over 3831697.01 frames. ], batch size: 70, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:57:15,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3659320.0, ans=0.125 2024-08-18 02:57:38,425 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-18 02:57:46,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3659620.0, ans=0.0 2024-08-18 02:57:57,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3659720.0, ans=0.0 2024-08-18 02:58:03,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3659720.0, ans=0.125 2024-08-18 02:58:07,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3659720.0, ans=0.125 2024-08-18 02:58:09,755 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 8350, loss[loss=0.085, beats_loss=0.009949, ecapa_loss=0.0001324, whisper_loss=0.07373, over 16187.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.0001468, whisper_loss=0.09004, over 3850340.86 frames. ], batch size: 60, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:58:09,949 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-18 02:58:10,213 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.518e-03 2024-08-18 02:58:14,628 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-18 02:58:25,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3659920.0, ans=0.125 2024-08-18 02:58:29,723 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 02:58:32,225 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.239e+01 2.555e+01 2.867e+01 4.693e+01, threshold=5.109e+01, percent-clipped=0.0 2024-08-18 02:58:36,480 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.734e+00 2024-08-18 02:58:41,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3660020.0, ans=0.0 2024-08-18 02:58:53,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3660120.0, ans=0.05 2024-08-18 02:58:56,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3660120.0, ans=0.125 2024-08-18 02:58:57,423 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 02:59:00,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3660220.0, ans=0.125 2024-08-18 02:59:12,361 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 8400, loss[loss=0.1013, beats_loss=0.009344, ecapa_loss=0.0001412, whisper_loss=0.09058, over 22925.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01045, ecapa_loss=0.0001459, whisper_loss=0.08997, over 3853193.89 frames. ], batch size: 90, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:59:15,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3660320.0, ans=0.125 2024-08-18 02:59:24,086 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 29 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 02:59:25,193 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 02:59:25,925 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2024-08-18 02:59:27,706 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-18 02:59:56,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3660620.0, ans=0.125 2024-08-18 03:00:02,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3660720.0, ans=0.125 2024-08-18 03:00:10,879 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-18 03:00:14,077 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 8450, loss[loss=0.1009, beats_loss=0.00934, ecapa_loss=0.0001815, whisper_loss=0.08977, over 21920.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.01044, ecapa_loss=0.0001462, whisper_loss=0.0895, over 3849053.73 frames. ], batch size: 89, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:00:14,184 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 03:00:26,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3660920.0, ans=0.2 2024-08-18 03:00:30,568 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2024-08-18 03:00:34,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3660920.0, ans=0.125 2024-08-18 03:00:36,000 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.643e+01 2.348e+01 2.564e+01 2.815e+01 4.784e+01, threshold=5.128e+01, percent-clipped=0.0 2024-08-18 03:00:36,183 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 03:00:54,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3661120.0, ans=0.2 2024-08-18 03:01:02,635 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 03:01:07,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3661220.0, ans=0.05 2024-08-18 03:01:16,237 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 8500, loss[loss=0.1032, beats_loss=0.00957, ecapa_loss=0.0001465, whisper_loss=0.09217, over 22788.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001464, whisper_loss=0.09041, over 3872608.80 frames. 
], batch size: 93, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:01:18,007 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.30 vs. limit=15.0 2024-08-18 03:01:33,002 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.57 vs. limit=6.0 2024-08-18 03:01:37,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3661420.0, ans=0.1 2024-08-18 03:01:40,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3661520.0, ans=0.125 2024-08-18 03:01:47,527 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-18 03:01:57,394 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-18 03:02:13,547 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-18 03:02:15,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3661720.0, ans=0.125 2024-08-18 03:02:17,869 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.35 vs. limit=15.0 2024-08-18 03:02:18,319 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 8550, loss[loss=0.113, beats_loss=0.01077, ecapa_loss=0.0001146, whisper_loss=0.1011, over 17908.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0104, ecapa_loss=0.0001465, whisper_loss=0.09059, over 3880541.97 frames. 
], batch size: 68, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:02:28,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3661820.0, ans=0.0 2024-08-18 03:02:28,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3661820.0, ans=0.04949747468305833 2024-08-18 03:02:31,933 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-18 03:02:40,829 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.327e+01 2.493e+01 2.871e+01 1.525e+02, threshold=4.987e+01, percent-clipped=2.0 2024-08-18 03:02:41,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3661920.0, ans=0.125 2024-08-18 03:03:09,402 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-18 03:03:20,359 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 8600, loss[loss=0.1036, beats_loss=0.009804, ecapa_loss=0.0001582, whisper_loss=0.0922, over 21818.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0104, ecapa_loss=0.0001467, whisper_loss=0.09048, over 3871717.88 frames. ], batch size: 90, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:03:26,481 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-18 03:03:35,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3662420.0, ans=0.125 2024-08-18 03:03:44,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3662520.0, ans=0.0 2024-08-18 03:03:44,704 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.53 vs. 
limit=22.5 2024-08-18 03:03:46,317 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 36 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-18 03:03:51,798 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=12.0 2024-08-18 03:03:55,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3662520.0, ans=0.1 2024-08-18 03:04:00,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3662620.0, ans=0.125 2024-08-18 03:04:21,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3662820.0, ans=0.1 2024-08-18 03:04:22,293 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 8650, loss[loss=0.09446, beats_loss=0.01247, ecapa_loss=0.0001545, whisper_loss=0.08045, over 16726.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01041, ecapa_loss=0.000147, whisper_loss=0.09045, over 3858924.05 frames. ], batch size: 70, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:04:44,737 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.275e+01 2.600e+01 2.961e+01 1.282e+02, threshold=5.200e+01, percent-clipped=4.0 2024-08-18 03:04:50,111 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.14 vs. limit=6.0 2024-08-18 03:04:54,107 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.87 vs. limit=6.0 2024-08-18 03:05:07,188 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
26 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-18 03:05:13,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3663220.0, ans=0.0 2024-08-18 03:05:14,894 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-18 03:05:25,013 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 8700, loss[loss=0.081, beats_loss=0.01243, ecapa_loss=0.0001363, whisper_loss=0.06721, over 16533.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001471, whisper_loss=0.09026, over 3845445.73 frames. ], batch size: 66, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:05:43,173 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 28 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-18 03:05:45,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3663420.0, ans=0.05 2024-08-18 03:05:50,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3663520.0, ans=0.125 2024-08-18 03:06:03,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3663620.0, ans=0.2 2024-08-18 03:06:13,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3663620.0, ans=0.0 2024-08-18 03:06:14,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3663720.0, ans=0.125 2024-08-18 03:06:28,356 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 8750, loss[loss=0.07827, beats_loss=0.01485, ecapa_loss=0.0001041, whisper_loss=0.06238, over 18726.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01045, ecapa_loss=0.0001473, whisper_loss=0.09018, over 3840776.60 frames. 
], batch size: 75, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:06:28,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3663820.0, ans=0.1 2024-08-18 03:06:46,183 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-18 03:06:46,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3663920.0, ans=0.0 2024-08-18 03:06:47,813 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.01 vs. limit=12.0 2024-08-18 03:06:51,114 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.333e+01 2.575e+01 2.893e+01 4.359e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-18 03:06:54,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3664020.0, ans=0.1 2024-08-18 03:07:13,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3664120.0, ans=0.125 2024-08-18 03:07:31,105 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 8800, loss[loss=0.1297, beats_loss=0.009611, ecapa_loss=0.0001326, whisper_loss=0.1188, over 23368.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01053, ecapa_loss=0.0001465, whisper_loss=0.09024, over 3808942.35 frames. 
], batch size: 90, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:07:35,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3664320.0, ans=0.125 2024-08-18 03:07:51,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3664420.0, ans=0.0 2024-08-18 03:07:58,277 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.42 vs. limit=22.5 2024-08-18 03:08:01,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3664520.0, ans=0.125 2024-08-18 03:08:10,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3664620.0, ans=0.2 2024-08-18 03:08:21,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3664720.0, ans=0.125 2024-08-18 03:08:27,771 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 03:08:32,075 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 03:08:35,182 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 8850, loss[loss=0.1168, beats_loss=0.008416, ecapa_loss=0.0001371, whisper_loss=0.107, over 18837.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01056, ecapa_loss=0.0001451, whisper_loss=0.09044, over 3809000.65 frames. ], batch size: 70, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:08:57,245 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
16 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-18 03:08:58,345 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.273e+01 2.462e+01 2.757e+01 3.654e+01, threshold=4.925e+01, percent-clipped=0.0 2024-08-18 03:09:00,093 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 13 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-18 03:09:06,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3665020.0, ans=0.125 2024-08-18 03:09:06,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3665020.0, ans=0.125 2024-08-18 03:09:14,323 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-18 03:09:14,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3665120.0, ans=0.2 2024-08-18 03:09:31,373 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 36 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 03:09:40,646 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 8900, loss[loss=0.1257, beats_loss=0.01058, ecapa_loss=0.0001086, whisper_loss=0.114, over 23627.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01053, ecapa_loss=0.0001457, whisper_loss=0.09036, over 3783477.05 frames. ], batch size: 88, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:09:50,611 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-18 03:10:00,651 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.10 vs. limit=15.0 2024-08-18 03:10:07,624 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
13 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-18 03:10:16,116 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.40 vs. limit=15.0 2024-08-18 03:10:27,822 WARNING [optim.py:496] (2/4) Scaling gradients by 0.04924134910106659, model_norm_threshold=49.24854278564453 2024-08-18 03:10:27,996 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.092e+05, grad_sumsq=2.092e+05, orig_rms_sq=1.000e+00 2024-08-18 03:10:28,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3665620.0, ans=0.125 2024-08-18 03:10:36,770 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-18 03:10:48,306 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 8950, loss[loss=0.1071, beats_loss=0.01192, ecapa_loss=0.0001544, whisper_loss=0.09361, over 22064.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01059, ecapa_loss=0.0001447, whisper_loss=0.08966, over 3814015.22 frames. ], batch size: 91, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:11:00,782 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.18 vs. limit=22.5 2024-08-18 03:11:04,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3665920.0, ans=0.125 2024-08-18 03:11:12,379 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.306e+01 2.588e+01 2.937e+01 1.000e+03, threshold=5.176e+01, percent-clipped=1.0 2024-08-18 03:11:29,820 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.92 vs. 
limit=15.0 2024-08-18 03:11:49,340 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-18 03:11:52,021 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 26 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-18 03:11:53,255 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 03:11:53,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3666320.0, ans=0.125 2024-08-18 03:11:54,328 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 9000, loss[loss=0.1195, beats_loss=0.01055, ecapa_loss=0.0001445, whisper_loss=0.1075, over 22301.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01059, ecapa_loss=0.0001443, whisper_loss=0.0903, over 3838763.31 frames. ], batch size: 90, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:11:54,328 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-18 03:12:27,776 INFO [train_multi_KD3.py:1149] (2/4) Epoch 25, validation on ASR_libri: loss=0.254, beats_loss=0, ecapa_loss=0.0005275, whisper_loss=0.2487, over 922467.00 frames. 2024-08-18 03:12:44,052 INFO [train_multi_KD3.py:1149] (2/4) Epoch 25, validation on SV_voxceleb1: loss=0.004142, beats_loss=0, ecapa_loss=0.0004142, whisper_loss=0, over 939242.00 frames. 
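The `tot_loss` entries above decompose into three distillation losses. A minimal sketch of how the printed total appears to relate to its parts, assuming the weights from the header config (`beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`, `whisper_loss_scale=1.0`) — the function name `total_loss` is illustrative, not the script's actual code:

```python
def total_loss(beats_loss, ecapa_loss, whisper_loss,
               beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    # Weighted sum of the three KD losses; scales taken from the
    # run config in the log header (assumption, not verified code).
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)

# Numbers from the "Epoch 25, batch 8900" tot_loss line:
# 0.01053 + 10 * 0.0001457 + 0.09036 ~= 0.1024, matching the logged loss.
```

The small magnitude of `ecapa_loss` in the logs is consistent with it carrying a 10x scale before summation.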
2024-08-18 03:13:46,296 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.4431, 1.8331, 1.5793, 1.2970, 1.4550, 1.4437, 1.6479, 1.6046], device='cuda:2') 2024-08-18 03:13:53,009 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.5743, 1.8637, 1.8416, 1.7202, 2.1632, 1.7883, 1.8449, 1.8071], device='cuda:2') 2024-08-18 03:14:18,533 INFO [train_multi_KD3.py:1149] (2/4) Epoch 25, validation on AT_audioset: loss=0.02319, beats_loss=0.02319, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 03:14:18,537 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-18 03:14:28,316 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-18 03:14:31,306 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 32 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-18 03:14:54,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3666520.0, ans=0.0 2024-08-18 03:15:12,230 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0 2024-08-18 03:15:21,537 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-18 03:15:22,328 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.61 vs. limit=10.0 2024-08-18 03:15:32,506 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 9050, loss[loss=0.1057, beats_loss=0.01092, ecapa_loss=0.0001066, whisper_loss=0.09369, over 20316.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01062, ecapa_loss=0.0001447, whisper_loss=0.08958, over 3817017.04 frames. 
], batch size: 77, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:15:32,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3666820.0, ans=0.1 2024-08-18 03:15:34,133 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 03:15:51,681 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 17 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 03:15:59,335 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.276e+01 2.505e+01 2.759e+01 3.701e+01, threshold=5.009e+01, percent-clipped=0.0 2024-08-18 03:16:09,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3667020.0, ans=0.1 2024-08-18 03:16:15,470 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 03:16:18,265 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 24 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-18 03:16:18,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3667120.0, ans=0.0 2024-08-18 03:16:29,265 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-18 03:16:44,229 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 9100, loss[loss=0.1107, beats_loss=0.009563, ecapa_loss=0.0001619, whisper_loss=0.09953, over 22284.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01054, ecapa_loss=0.0001456, whisper_loss=0.09005, over 3838969.31 frames. ], batch size: 90, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:17:30,565 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.27 vs. 
limit=22.5 2024-08-18 03:17:33,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3667620.0, ans=0.2 2024-08-18 03:17:53,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3667720.0, ans=0.125 2024-08-18 03:17:54,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3667720.0, ans=0.1 2024-08-18 03:17:56,920 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 9150, loss[loss=0.122, beats_loss=0.009934, ecapa_loss=0.0001367, whisper_loss=0.1107, over 22989.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01056, ecapa_loss=0.0001455, whisper_loss=0.08954, over 3832846.62 frames. ], batch size: 89, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:17:57,977 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=15.0 2024-08-18 03:17:59,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3667820.0, ans=0.125 2024-08-18 03:18:14,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3667920.0, ans=0.0 2024-08-18 03:18:17,232 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-18 03:18:22,954 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.335e+01 2.594e+01 2.955e+01 5.294e+01, threshold=5.187e+01, percent-clipped=1.0 2024-08-18 03:18:37,257 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
28 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-18 03:18:47,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3668120.0, ans=0.05 2024-08-18 03:18:48,132 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.22 vs. limit=5.0 2024-08-18 03:18:55,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3668220.0, ans=0.0 2024-08-18 03:18:58,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3668220.0, ans=0.125 2024-08-18 03:19:09,847 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 9200, loss[loss=0.08115, beats_loss=0.01346, ecapa_loss=0.0001611, whisper_loss=0.06608, over 21703.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01067, ecapa_loss=0.0001458, whisper_loss=0.08877, over 3853174.22 frames. ], batch size: 93, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:19:10,598 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.77 vs. limit=22.5 2024-08-18 03:19:18,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3668320.0, ans=0.1 2024-08-18 03:19:39,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3668520.0, ans=0.1 2024-08-18 03:19:42,083 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-18 03:19:46,742 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
20 from LS+wenet, 24 from Vox, 51 fro AS 2024-08-18 03:19:50,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3668520.0, ans=0.2 2024-08-18 03:19:50,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3668520.0, ans=0.025 2024-08-18 03:20:17,530 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 03:20:18,285 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.64 vs. limit=15.0 2024-08-18 03:20:18,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3668720.0, ans=0.125 2024-08-18 03:20:23,728 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 9250, loss[loss=0.1049, beats_loss=0.008848, ecapa_loss=0.0001402, whisper_loss=0.0947, over 19155.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01065, ecapa_loss=0.0001467, whisper_loss=0.08834, over 3869637.55 frames. ], batch size: 72, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:20:28,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=3668820.0, ans=0.2 2024-08-18 03:20:34,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3668820.0, ans=0.0 2024-08-18 03:20:39,403 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.28 vs. 
limit=22.5 2024-08-18 03:20:43,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3668920.0, ans=0.125 2024-08-18 03:20:51,779 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.259e+01 2.533e+01 2.843e+01 4.399e+01, threshold=5.065e+01, percent-clipped=0.0 2024-08-18 03:20:55,309 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3669020.0, ans=0.04949747468305833 2024-08-18 03:21:01,949 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=15.0 2024-08-18 03:21:25,021 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=15.0 2024-08-18 03:21:40,416 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 9300, loss[loss=0.09154, beats_loss=0.0109, ecapa_loss=0.0001716, whisper_loss=0.07892, over 19940.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01065, ecapa_loss=0.0001458, whisper_loss=0.08855, over 3895206.17 frames. ], batch size: 84, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:21:40,608 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-18 03:21:40,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3669320.0, ans=0.125 2024-08-18 03:22:02,241 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
31 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-18 03:22:17,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3669520.0, ans=0.0 2024-08-18 03:22:27,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3669620.0, ans=0.2 2024-08-18 03:22:30,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3669620.0, ans=0.125 2024-08-18 03:22:32,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3669620.0, ans=0.125 2024-08-18 03:22:45,133 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.94 vs. limit=22.5 2024-08-18 03:22:48,249 WARNING [optim.py:496] (2/4) Scaling gradients by 0.053159911185503006, model_norm_threshold=50.65474319458008 2024-08-18 03:22:48,421 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.966e+05, grad_sumsq=1.966e+05, orig_rms_sq=1.000e+00 2024-08-18 03:22:54,346 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-18 03:22:56,867 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 9350, loss[loss=0.1132, beats_loss=0.009788, ecapa_loss=0.0001442, whisper_loss=0.102, over 22739.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01058, ecapa_loss=0.0001453, whisper_loss=0.08934, over 3868996.32 frames. 
], batch size: 90, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:23:02,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3669820.0, ans=0.125 2024-08-18 03:23:08,198 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2024-08-18 03:23:08,925 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 03:23:13,589 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 36 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-18 03:23:16,933 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 14 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-18 03:23:22,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3669920.0, ans=0.04949747468305833 2024-08-18 03:23:24,285 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.389e+01 2.563e+01 2.950e+01 9.529e+02, threshold=5.125e+01, percent-clipped=2.0 2024-08-18 03:23:24,930 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.117e-02 2024-08-18 03:23:55,001 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0 2024-08-18 03:24:11,760 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 9400, loss[loss=0.09673, beats_loss=0.01207, ecapa_loss=0.0001473, whisper_loss=0.08319, over 21973.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01053, ecapa_loss=0.0001465, whisper_loss=0.08992, over 3898359.60 frames. ], batch size: 90, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:24:25,918 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
18 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 03:24:34,447 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 03:24:37,757 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 15 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-18 03:24:38,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3670420.0, ans=0.1 2024-08-18 03:24:46,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3670520.0, ans=0.125 2024-08-18 03:24:51,665 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.22 vs. limit=10.0 2024-08-18 03:24:53,002 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=15.0 2024-08-18 03:25:03,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3670620.0, ans=0.2 2024-08-18 03:25:11,245 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 03:25:26,771 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 9450, loss[loss=0.08659, beats_loss=0.01288, ecapa_loss=0.0001467, whisper_loss=0.07225, over 16888.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01058, ecapa_loss=0.0001463, whisper_loss=0.08943, over 3908968.15 frames. ], batch size: 69, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:25:31,637 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-18 03:25:41,321 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
33 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-18 03:25:55,276 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.676e+01 2.213e+01 2.450e+01 2.826e+01 4.775e+01, threshold=4.900e+01, percent-clipped=0.0 2024-08-18 03:26:05,596 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-18 03:26:26,581 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.60 vs. limit=15.0 2024-08-18 03:26:28,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3671120.0, ans=0.125 2024-08-18 03:26:29,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3671220.0, ans=0.0 2024-08-18 03:26:45,633 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 9500, loss[loss=0.08061, beats_loss=0.01183, ecapa_loss=0.0001224, whisper_loss=0.06756, over 17908.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01061, ecapa_loss=0.0001467, whisper_loss=0.08944, over 3928978.94 frames. ], batch size: 69, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:26:50,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3671320.0, ans=0.0 2024-08-18 03:27:03,159 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.26 vs. limit=15.0 2024-08-18 03:27:03,806 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-18 03:27:06,127 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
23 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-18 03:27:14,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3671520.0, ans=0.1 2024-08-18 03:27:21,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3671520.0, ans=0.0 2024-08-18 03:27:47,644 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-18 03:27:51,805 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-18 03:27:56,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3671720.0, ans=0.125 2024-08-18 03:27:59,178 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 9550, loss[loss=0.07234, beats_loss=0.01231, ecapa_loss=0.0001458, whisper_loss=0.05857, over 15130.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01061, ecapa_loss=0.0001472, whisper_loss=0.0889, over 3953497.74 frames. ], batch size: 63, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:28:08,033 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-18 03:28:08,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3671820.0, ans=0.125 2024-08-18 03:28:17,151 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-18 03:28:22,066 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 22 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-18 03:28:25,963 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.340e+01 2.613e+01 3.002e+01 5.321e+01, threshold=5.225e+01, percent-clipped=2.0 2024-08-18 03:28:26,124 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
22 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 03:28:28,037 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=22.5 2024-08-18 03:28:40,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3672020.0, ans=0.125 2024-08-18 03:28:54,272 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-18 03:28:55,662 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-18 03:29:04,250 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 35 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-18 03:29:11,736 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 9600, loss[loss=0.1156, beats_loss=0.008525, ecapa_loss=0.0001622, whisper_loss=0.1055, over 22474.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01055, ecapa_loss=0.0001479, whisper_loss=0.08933, over 3920632.91 frames. ], batch size: 91, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:29:27,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3672420.0, ans=0.0 2024-08-18 03:29:28,368 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.76 vs. limit=15.0 2024-08-18 03:29:30,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3672420.0, ans=0.2 2024-08-18 03:29:36,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3672420.0, ans=0.1 2024-08-18 03:29:39,517 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-18 03:29:42,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3672520.0, ans=0.025 2024-08-18 03:29:43,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3672520.0, ans=0.1 2024-08-18 03:29:59,690 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=16.48 vs. limit=15.0 2024-08-18 03:30:07,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3672620.0, ans=0.0 2024-08-18 03:30:08,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3672620.0, ans=0.125 2024-08-18 03:30:15,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3672720.0, ans=0.125 2024-08-18 03:30:18,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3672720.0, ans=0.125 2024-08-18 03:30:24,978 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 9650, loss[loss=0.08744, beats_loss=0.01275, ecapa_loss=0.0001533, whisper_loss=0.07316, over 15977.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01051, ecapa_loss=0.0001481, whisper_loss=0.08947, over 3887693.43 frames. ], batch size: 64, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:30:50,766 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.281e+01 2.485e+01 2.819e+01 4.590e+01, threshold=4.970e+01, percent-clipped=0.0 2024-08-18 03:31:00,663 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
22 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-18 03:31:03,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3673020.0, ans=0.0 2024-08-18 03:31:18,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3673120.0, ans=0.0 2024-08-18 03:31:28,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3673220.0, ans=0.0 2024-08-18 03:31:29,175 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 27 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-18 03:31:41,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3673220.0, ans=0.0 2024-08-18 03:31:46,269 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 9700, loss[loss=0.113, beats_loss=0.009549, ecapa_loss=0.0001623, whisper_loss=0.1019, over 22836.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01045, ecapa_loss=0.0001486, whisper_loss=0.09035, over 3893255.04 frames. ], batch size: 91, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:31:46,357 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 03:31:47,750 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.09 vs. limit=15.0 2024-08-18 03:31:49,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3673320.0, ans=0.05 2024-08-18 03:31:58,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3673320.0, ans=0.125 2024-08-18 03:32:10,154 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
28 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-18 03:32:51,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3673620.0, ans=0.125 2024-08-18 03:33:12,934 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 9750, loss[loss=0.1125, beats_loss=0.008623, ecapa_loss=0.0001685, whisper_loss=0.1022, over 19730.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001484, whisper_loss=0.09023, over 3866133.84 frames. ], batch size: 79, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:33:20,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3673820.0, ans=0.2 2024-08-18 03:33:47,831 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.256e+01 2.570e+01 2.973e+01 3.927e+01, threshold=5.141e+01, percent-clipped=0.0 2024-08-18 03:33:53,683 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.52 vs. limit=12.0 2024-08-18 03:33:54,966 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.41 vs. limit=15.0 2024-08-18 03:33:59,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3674020.0, ans=0.05 2024-08-18 03:34:07,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3674020.0, ans=0.125 2024-08-18 03:34:28,159 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.28 vs. limit=10.0 2024-08-18 03:34:35,242 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
36 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-18 03:34:55,687 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 9800, loss[loss=0.07936, beats_loss=0.0124, ecapa_loss=0.0001462, whisper_loss=0.0655, over 14106.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001478, whisper_loss=0.09009, over 3875739.35 frames. ], batch size: 60, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:34:56,326 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2024-08-18 03:35:11,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3674320.0, ans=0.0 2024-08-18 03:35:13,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3674420.0, ans=0.125 2024-08-18 03:35:26,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3674420.0, ans=0.1 2024-08-18 03:35:33,131 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
23 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 03:35:39,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3674520.0, ans=0.09899494936611666 2024-08-18 03:35:40,640 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09654395282268524, model_norm_threshold=51.40534973144531 2024-08-18 03:35:40,807 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.0.self_attn_weights.in_proj.bias with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.176e+04, grad_sumsq=6.832e+03, orig_rms_sq=9.039e+00 2024-08-18 03:35:44,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3674520.0, ans=0.125 2024-08-18 03:36:12,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2024-08-18 03:36:30,666 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.23 vs. limit=15.0 2024-08-18 03:36:36,468 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 9850, loss[loss=0.1029, beats_loss=0.01146, ecapa_loss=0.0001873, whisper_loss=0.08957, over 17695.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01051, ecapa_loss=0.0001482, whisper_loss=0.09031, over 3864471.91 frames. ], batch size: 73, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:36:49,239 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
25 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-18 03:36:57,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3674920.0, ans=0.1 2024-08-18 03:37:12,184 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.310e+01 2.559e+01 2.794e+01 5.325e+02, threshold=5.118e+01, percent-clipped=2.0 2024-08-18 03:37:57,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3675220.0, ans=0.125 2024-08-18 03:37:57,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3675220.0, ans=0.125 2024-08-18 03:38:04,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3675220.0, ans=0.1 2024-08-18 03:38:21,314 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 9900, loss[loss=0.08294, beats_loss=0.01161, ecapa_loss=0.0001434, whisper_loss=0.06989, over 21464.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.0001472, whisper_loss=0.09045, over 3893344.66 frames. 
], batch size: 90, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:38:25,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3675320.0, ans=0.125 2024-08-18 03:38:33,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3675320.0, ans=0.0 2024-08-18 03:38:46,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3675420.0, ans=0.125 2024-08-18 03:39:02,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3675520.0, ans=0.05 2024-08-18 03:39:05,088 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-18 03:39:07,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3675620.0, ans=0.0 2024-08-18 03:39:12,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3675620.0, ans=0.125 2024-08-18 03:39:22,465 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0 2024-08-18 03:39:25,143 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 18 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-18 03:39:34,817 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 9950, loss[loss=0.1086, beats_loss=0.007158, ecapa_loss=0.0001358, whisper_loss=0.1001, over 15431.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01061, ecapa_loss=0.0001459, whisper_loss=0.09016, over 3909626.70 frames. 
], batch size: 57, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:39:44,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3675820.0, ans=0.1 2024-08-18 03:39:47,490 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.73 vs. limit=22.5 2024-08-18 03:40:00,875 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.255e+01 2.488e+01 2.824e+01 3.882e+01, threshold=4.975e+01, percent-clipped=0.0 2024-08-18 03:40:14,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3676020.0, ans=0.2 2024-08-18 03:40:33,822 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.24 vs. limit=22.5 2024-08-18 03:40:39,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3676220.0, ans=0.0 2024-08-18 03:40:40,428 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 03:40:42,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3676220.0, ans=0.1 2024-08-18 03:40:48,482 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 10000, loss[loss=0.07344, beats_loss=0.01247, ecapa_loss=0.0001427, whisper_loss=0.05954, over 14612.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01064, ecapa_loss=0.0001454, whisper_loss=0.08992, over 3906479.76 frames. ], batch size: 60, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:40:59,333 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
20 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-18 03:41:01,153 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.30 vs. limit=10.0 2024-08-18 03:41:03,580 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 12 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-18 03:41:07,833 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-18 03:41:08,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3676420.0, ans=0.0 2024-08-18 03:41:33,290 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.00 vs. limit=22.5 2024-08-18 03:41:37,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3676620.0, ans=0.5 2024-08-18 03:41:40,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3676620.0, ans=0.1 2024-08-18 03:41:49,140 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2024-08-18 03:41:50,348 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-18 03:42:05,020 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 10050, loss[loss=0.1098, beats_loss=0.008599, ecapa_loss=0.0001574, whisper_loss=0.09967, over 22816.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01059, ecapa_loss=0.0001466, whisper_loss=0.08994, over 3907437.06 frames. 
], batch size: 93, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:42:06,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3676820.0, ans=0.125 2024-08-18 03:42:21,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3676920.0, ans=0.125 2024-08-18 03:42:27,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3676920.0, ans=0.125 2024-08-18 03:42:31,485 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.717e+01 2.309e+01 2.495e+01 2.743e+01 6.109e+01, threshold=4.990e+01, percent-clipped=1.0 2024-08-18 03:42:48,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3677120.0, ans=0.125 2024-08-18 03:42:51,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3677120.0, ans=0.1 2024-08-18 03:42:54,823 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-18 03:43:04,674 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.21 vs. limit=15.0 2024-08-18 03:43:08,118 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-18 03:43:08,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3677220.0, ans=0.125 2024-08-18 03:43:15,108 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-18 03:43:16,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3677320.0, ans=0.95 2024-08-18 03:43:17,838 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 10100, loss[loss=0.142, beats_loss=0.007729, ecapa_loss=0.000122, whisper_loss=0.1331, over 17373.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.0001475, whisper_loss=0.09046, over 3875727.29 frames. ], batch size: 63, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:43:29,789 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 8 from Vox, 35 fro AS 2024-08-18 03:43:41,509 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 33 from Vox, 30 fro AS 2024-08-18 03:43:41,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3677420.0, ans=0.0 2024-08-18 03:43:55,098 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 19 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-18 03:43:59,331 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.02 vs. limit=15.0 2024-08-18 03:44:04,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3677620.0, ans=0.125 2024-08-18 03:44:09,551 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.93 vs. 
limit=15.0 2024-08-18 03:44:13,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3677620.0, ans=0.1 2024-08-18 03:44:13,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3677620.0, ans=0.0 2024-08-18 03:44:16,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3677720.0, ans=0.1 2024-08-18 03:44:22,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3677720.0, ans=0.0 2024-08-18 03:44:24,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3677720.0, ans=0.1 2024-08-18 03:44:34,272 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 10150, loss[loss=0.09865, beats_loss=0.008872, ecapa_loss=0.0001416, whisper_loss=0.08836, over 18987.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01059, ecapa_loss=0.000147, whisper_loss=0.09018, over 3914151.37 frames. ], batch size: 73, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:44:37,161 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-18 03:44:47,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3677920.0, ans=0.0 2024-08-18 03:44:47,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3677920.0, ans=0.09899494936611666 2024-08-18 03:44:47,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3677920.0, ans=0.1 2024-08-18 03:44:59,053 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.02 vs. 
limit=12.0 2024-08-18 03:44:59,668 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.325e+01 2.524e+01 2.849e+01 6.626e+01, threshold=5.048e+01, percent-clipped=1.0 2024-08-18 03:45:02,892 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-18 03:45:08,553 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 35 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-18 03:45:11,802 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 17 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 03:45:15,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3678020.0, ans=0.125 2024-08-18 03:45:19,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3678120.0, ans=0.1 2024-08-18 03:45:47,065 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 10200, loss[loss=0.1194, beats_loss=0.008041, ecapa_loss=0.0001403, whisper_loss=0.11, over 17102.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01049, ecapa_loss=0.0001477, whisper_loss=0.09069, over 3888001.36 frames. ], batch size: 65, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:45:50,199 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 16 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-18 03:45:54,573 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
10 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 03:45:54,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3678320.0, ans=0.0 2024-08-18 03:46:11,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3678420.0, ans=0.125 2024-08-18 03:46:25,639 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.41 vs. limit=10.0 2024-08-18 03:46:41,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3678620.0, ans=0.125 2024-08-18 03:47:01,132 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 10250, loss[loss=0.104, beats_loss=0.009624, ecapa_loss=0.0001441, whisper_loss=0.09298, over 15846.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01042, ecapa_loss=0.0001487, whisper_loss=0.09067, over 3887218.30 frames. ], batch size: 63, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:47:03,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3678820.0, ans=0.025 2024-08-18 03:47:07,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3678820.0, ans=0.2 2024-08-18 03:47:07,392 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.81 vs. limit=15.0 2024-08-18 03:47:27,725 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.259e+01 2.500e+01 2.782e+01 3.829e+01, threshold=5.000e+01, percent-clipped=0.0 2024-08-18 03:47:28,866 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.06 vs. 
limit=22.5 2024-08-18 03:47:53,077 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.64 vs. limit=15.0 2024-08-18 03:48:15,378 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 10300, loss[loss=0.1043, beats_loss=0.008914, ecapa_loss=0.0001756, whisper_loss=0.09358, over 19573.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01058, ecapa_loss=0.0001473, whisper_loss=0.09001, over 3915956.52 frames. ], batch size: 79, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:48:26,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3679320.0, ans=0.0 2024-08-18 03:48:43,218 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 03:48:50,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3679520.0, ans=0.125 2024-08-18 03:48:56,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3679520.0, ans=0.125 2024-08-18 03:49:00,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3679620.0, ans=0.0 2024-08-18 03:49:19,951 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 25 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-18 03:49:27,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3679720.0, ans=0.125 2024-08-18 03:49:30,239 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 10350, loss[loss=0.09644, beats_loss=0.01035, ecapa_loss=0.0001141, whisper_loss=0.08495, over 16678.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01058, ecapa_loss=0.0001476, whisper_loss=0.09006, over 3889699.74 frames. 
], batch size: 63, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:49:32,715 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.22 vs. limit=15.0 2024-08-18 03:49:33,720 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.66 vs. limit=22.5 2024-08-18 03:49:48,731 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-18 03:49:50,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3679920.0, ans=0.95 2024-08-18 03:49:59,401 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.361e+01 2.642e+01 2.935e+01 4.206e+01, threshold=5.285e+01, percent-clipped=0.0 2024-08-18 03:50:23,937 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 11 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 03:50:33,488 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.93 vs. limit=15.0 2024-08-18 03:50:36,215 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 03:50:39,698 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 26 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-18 03:50:43,308 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.18 vs. limit=12.0 2024-08-18 03:50:47,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3680320.0, ans=0.2 2024-08-18 03:50:48,603 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 10400, loss[loss=0.08506, beats_loss=0.01082, ecapa_loss=0.0001629, whisper_loss=0.07261, over 17717.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01046, ecapa_loss=0.000148, whisper_loss=0.08993, over 3870075.61 frames. ], batch size: 73, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:51:01,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3680320.0, ans=0.2 2024-08-18 03:51:16,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3680520.0, ans=0.0 2024-08-18 03:51:21,329 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 03:51:21,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3680520.0, ans=0.5 2024-08-18 03:51:38,337 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-18 03:51:40,248 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-18 03:51:57,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3680720.0, ans=0.125 2024-08-18 03:51:57,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3680720.0, ans=0.2 2024-08-18 03:51:58,224 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 03:52:02,460 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 10450, loss[loss=0.08102, beats_loss=0.01331, ecapa_loss=0.0001265, whisper_loss=0.06644, over 20553.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001479, whisper_loss=0.09031, over 3863349.85 frames. 
], batch size: 86, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:52:13,207 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2024-08-18 03:52:19,785 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 31 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-18 03:52:28,740 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.400e+01 2.629e+01 2.967e+01 1.519e+02, threshold=5.258e+01, percent-clipped=2.0 2024-08-18 03:52:47,207 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2024-08-18 03:52:53,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3681120.0, ans=0.1 2024-08-18 03:52:55,564 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 18 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-18 03:53:02,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3681220.0, ans=0.125 2024-08-18 03:53:12,697 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.17 vs. limit=12.0 2024-08-18 03:53:16,365 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 10500, loss[loss=0.09245, beats_loss=0.01145, ecapa_loss=0.0001289, whisper_loss=0.07972, over 19153.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0104, ecapa_loss=0.0001487, whisper_loss=0.09045, over 3835909.26 frames. ], batch size: 77, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:53:19,415 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
25 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-18 03:53:20,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3681320.0, ans=0.2 2024-08-18 03:53:24,225 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.37 vs. limit=15.0 2024-08-18 03:53:24,802 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-18 03:53:47,742 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-18 03:53:49,103 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-18 03:53:56,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3681520.0, ans=0.025 2024-08-18 03:53:59,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3681620.0, ans=0.0 2024-08-18 03:54:13,758 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 36 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 03:54:31,624 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 10550, loss[loss=0.08664, beats_loss=0.01204, ecapa_loss=0.0001684, whisper_loss=0.07292, over 21010.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01038, ecapa_loss=0.0001473, whisper_loss=0.09083, over 3839165.72 frames. ], batch size: 89, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:54:43,277 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
22 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-18 03:54:44,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3681920.0, ans=0.0 2024-08-18 03:54:48,080 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0 2024-08-18 03:54:49,533 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 03:54:51,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=3681920.0, ans=0.02 2024-08-18 03:54:52,225 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-18 03:54:52,933 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.34 vs. limit=15.0 2024-08-18 03:54:57,721 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.705e+01 2.276e+01 2.579e+01 2.977e+01 5.501e+01, threshold=5.159e+01, percent-clipped=1.0 2024-08-18 03:54:59,179 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 03:55:17,882 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-18 03:55:25,876 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.79 vs. limit=15.0 2024-08-18 03:55:33,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3682220.0, ans=0.125 2024-08-18 03:55:51,063 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 10600, loss[loss=0.08326, beats_loss=0.009901, ecapa_loss=0.0001564, whisper_loss=0.07179, over 17743.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01042, ecapa_loss=0.0001471, whisper_loss=0.09018, over 3833658.26 frames. ], batch size: 69, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:55:54,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3682320.0, ans=0.07 2024-08-18 03:56:21,074 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.07 vs. limit=22.5 2024-08-18 03:56:50,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3682620.0, ans=0.0 2024-08-18 03:56:51,046 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.05 vs. limit=10.0 2024-08-18 03:57:07,001 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 10650, loss[loss=0.0851, beats_loss=0.01464, ecapa_loss=0.0001368, whisper_loss=0.0691, over 16809.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.0001456, whisper_loss=0.09032, over 3863328.26 frames. ], batch size: 65, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:57:33,977 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.262e+01 2.551e+01 2.791e+01 4.688e+01, threshold=5.101e+01, percent-clipped=0.0 2024-08-18 03:57:47,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3683020.0, ans=0.125 2024-08-18 03:57:48,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3683020.0, ans=0.125 2024-08-18 03:57:50,494 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
22 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-18 03:57:50,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3683020.0, ans=0.125 2024-08-18 03:57:54,925 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-18 03:58:09,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3683220.0, ans=0.0 2024-08-18 03:58:13,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3683220.0, ans=0.125 2024-08-18 03:58:23,109 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 10700, loss[loss=0.09956, beats_loss=0.009912, ecapa_loss=0.0001186, whisper_loss=0.08846, over 15609.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01053, ecapa_loss=0.0001437, whisper_loss=0.09049, over 3862697.56 frames. ], batch size: 57, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:58:37,333 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-18 03:58:37,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3683420.0, ans=0.1 2024-08-18 03:58:40,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3683420.0, ans=0.04949747468305833 2024-08-18 03:58:46,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3683420.0, ans=0.125 2024-08-18 03:58:50,856 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
17 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-18 03:58:55,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3683520.0, ans=0.5 2024-08-18 03:59:07,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3683620.0, ans=0.1 2024-08-18 03:59:27,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3683720.0, ans=10.0 2024-08-18 03:59:28,800 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.16 vs. limit=10.0 2024-08-18 03:59:37,204 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.36 vs. limit=10.0 2024-08-18 03:59:38,899 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 10750, loss[loss=0.1074, beats_loss=0.01214, ecapa_loss=0.0001426, whisper_loss=0.09382, over 21720.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01053, ecapa_loss=0.0001443, whisper_loss=0.09057, over 3874635.32 frames. ], batch size: 89, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:59:52,484 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 14 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 03:59:52,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3683920.0, ans=0.1 2024-08-18 03:59:59,357 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
19 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-18 04:00:01,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3683920.0, ans=0.2 2024-08-18 04:00:03,923 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.722e+01 2.308e+01 2.481e+01 2.750e+01 3.346e+01, threshold=4.962e+01, percent-clipped=0.0 2024-08-18 04:00:35,427 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-18 04:00:37,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3684220.0, ans=0.125 2024-08-18 04:00:41,096 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 34 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-18 04:00:41,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3684220.0, ans=0.125 2024-08-18 04:00:46,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3684220.0, ans=0.125 2024-08-18 04:00:49,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3684220.0, ans=0.2 2024-08-18 04:00:53,195 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 10800, loss[loss=0.07404, beats_loss=0.01254, ecapa_loss=0.0001656, whisper_loss=0.05985, over 16932.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01058, ecapa_loss=0.0001446, whisper_loss=0.09065, over 3880510.52 frames. ], batch size: 70, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:01:00,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3684320.0, ans=0.2 2024-08-18 04:01:18,205 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
19 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-18 04:01:20,768 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.33 vs. limit=15.0 2024-08-18 04:01:34,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3684520.0, ans=0.125 2024-08-18 04:01:49,631 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 27 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 04:01:56,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3684720.0, ans=0.0 2024-08-18 04:02:05,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3684820.0, ans=0.125 2024-08-18 04:02:06,249 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 10850, loss[loss=0.1153, beats_loss=0.01177, ecapa_loss=9.197e-05, whisper_loss=0.1026, over 24149.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01059, ecapa_loss=0.0001445, whisper_loss=0.09119, over 3895820.18 frames. ], batch size: 90, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:02:07,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3684820.0, ans=0.1 2024-08-18 04:02:13,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3684820.0, ans=0.2 2024-08-18 04:02:13,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3684820.0, ans=0.0 2024-08-18 04:02:14,520 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 20 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-18 04:02:17,286 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
19 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-18 04:02:17,700 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-08-18 04:02:22,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3684920.0, ans=0.2 2024-08-18 04:02:28,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3684920.0, ans=0.125 2024-08-18 04:02:29,580 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-18 04:02:34,359 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.724e+01 2.394e+01 2.623e+01 3.020e+01 4.318e+02, threshold=5.247e+01, percent-clipped=1.0 2024-08-18 04:02:49,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3685120.0, ans=0.0 2024-08-18 04:03:16,769 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-18 04:03:19,491 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 10900, loss[loss=0.09654, beats_loss=0.01213, ecapa_loss=0.0001501, whisper_loss=0.08291, over 18758.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.0001443, whisper_loss=0.09111, over 3887553.64 frames. ], batch size: 79, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:03:22,823 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.87 vs. limit=15.0 2024-08-18 04:03:50,705 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
34 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 04:04:09,821 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.046e+00 2024-08-18 04:04:10,794 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-18 04:04:26,400 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-18 04:04:28,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3685720.0, ans=0.0 2024-08-18 04:04:31,609 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 10950, loss[loss=0.1218, beats_loss=0.00665, ecapa_loss=0.0001491, whisper_loss=0.1136, over 24547.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01051, ecapa_loss=0.0001455, whisper_loss=0.09129, over 3902756.55 frames. ], batch size: 92, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:04:58,726 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.317e+01 2.614e+01 2.946e+01 3.732e+01, threshold=5.227e+01, percent-clipped=0.0 2024-08-18 04:05:10,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3686020.0, ans=0.125 2024-08-18 04:05:19,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3686120.0, ans=0.1 2024-08-18 04:05:29,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3686220.0, ans=0.0 2024-08-18 04:05:43,813 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 11000, loss[loss=0.1225, beats_loss=0.008173, ecapa_loss=0.000135, whisper_loss=0.1129, over 19814.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01045, ecapa_loss=0.0001471, whisper_loss=0.0913, over 3927735.29 frames. 
], batch size: 75, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:06:17,918 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-18 04:06:59,625 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 11050, loss[loss=0.1, beats_loss=0.009497, ecapa_loss=0.0001764, whisper_loss=0.08876, over 16487.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01053, ecapa_loss=0.0001472, whisper_loss=0.09085, over 3942686.64 frames. ], batch size: 69, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:07:01,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3686820.0, ans=0.125 2024-08-18 04:07:08,909 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-18 04:07:13,444 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.99 vs. limit=15.0 2024-08-18 04:07:18,721 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-18 04:07:21,783 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-18 04:07:25,548 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.316e+01 2.430e+01 2.735e+01 3.688e+01, threshold=4.860e+01, percent-clipped=0.0 2024-08-18 04:07:26,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3687020.0, ans=0.125 2024-08-18 04:07:39,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3687120.0, ans=0.125 2024-08-18 04:07:44,991 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
23 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-18 04:07:54,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3687220.0, ans=0.1 2024-08-18 04:07:56,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3687220.0, ans=0.0 2024-08-18 04:08:03,020 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-18 04:08:07,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3687320.0, ans=0.2 2024-08-18 04:08:08,674 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 11100, loss[loss=0.09325, beats_loss=0.01148, ecapa_loss=0.0001527, whisper_loss=0.08024, over 22199.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01056, ecapa_loss=0.0001469, whisper_loss=0.09074, over 3953946.21 frames. ], batch size: 91, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:08:10,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3687320.0, ans=0.0 2024-08-18 04:08:37,022 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 04:08:44,829 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-18 04:08:59,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3687620.0, ans=0.125 2024-08-18 04:09:06,440 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 26 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 04:09:24,552 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 11150, loss[loss=0.1015, beats_loss=0.01109, ecapa_loss=0.0001565, whisper_loss=0.08883, over 21919.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.0001462, whisper_loss=0.09098, over 3927016.97 frames. 
], batch size: 90, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:09:32,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3687820.0, ans=0.1 2024-08-18 04:09:43,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3687920.0, ans=0.1 2024-08-18 04:09:53,788 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.407e+01 2.639e+01 3.028e+01 3.278e+02, threshold=5.278e+01, percent-clipped=1.0 2024-08-18 04:09:57,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3688020.0, ans=0.05 2024-08-18 04:09:57,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3688020.0, ans=0.1 2024-08-18 04:10:01,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3688020.0, ans=0.1 2024-08-18 04:10:30,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=3688220.0, ans=0.1 2024-08-18 04:10:43,185 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 11200, loss[loss=0.1122, beats_loss=0.01157, ecapa_loss=0.0001759, whisper_loss=0.09885, over 21928.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01038, ecapa_loss=0.0001467, whisper_loss=0.09182, over 3913398.20 frames. ], batch size: 87, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:10:45,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3688320.0, ans=0.125 2024-08-18 04:10:46,937 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
22 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-18 04:10:47,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3688320.0, ans=0.0 2024-08-18 04:11:08,250 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-18 04:11:11,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3688420.0, ans=0.0 2024-08-18 04:11:41,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3688620.0, ans=0.125 2024-08-18 04:11:59,202 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 11250, loss[loss=0.1097, beats_loss=0.01066, ecapa_loss=0.000124, whisper_loss=0.09778, over 20792.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0104, ecapa_loss=0.0001459, whisper_loss=0.09138, over 3898572.85 frames. ], batch size: 79, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:12:09,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3688820.0, ans=0.125 2024-08-18 04:12:09,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3688820.0, ans=0.1 2024-08-18 04:12:23,976 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 04:12:27,970 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.315e+01 2.575e+01 2.987e+01 7.220e+01, threshold=5.150e+01, percent-clipped=1.0 2024-08-18 04:12:28,099 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-18 04:12:32,538 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
27 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 04:12:47,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3689120.0, ans=0.0 2024-08-18 04:12:57,096 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-18 04:13:07,398 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.16 vs. limit=15.0 2024-08-18 04:13:14,371 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 11300, loss[loss=0.09245, beats_loss=0.01191, ecapa_loss=0.0001772, whisper_loss=0.07877, over 19723.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001461, whisper_loss=0.09007, over 3899150.54 frames. ], batch size: 83, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:13:14,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3689320.0, ans=0.2 2024-08-18 04:13:15,791 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 28 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-18 04:13:37,650 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-18 04:13:39,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3689420.0, ans=0.1 2024-08-18 04:13:42,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3689420.0, ans=0.0 2024-08-18 04:13:46,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3689520.0, ans=0.125 2024-08-18 04:14:06,804 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.23 vs. 
limit=15.0 2024-08-18 04:14:07,349 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 28 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-18 04:14:28,692 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 23 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 04:14:33,651 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 11350, loss[loss=0.09416, beats_loss=0.009983, ecapa_loss=0.0001734, whisper_loss=0.08244, over 16560.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01043, ecapa_loss=0.0001469, whisper_loss=0.09008, over 3925919.19 frames. ], batch size: 70, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:14:34,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3689820.0, ans=0.125 2024-08-18 04:14:40,329 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 30 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-18 04:14:51,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3689920.0, ans=0.0 2024-08-18 04:15:03,418 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.87 vs. limit=6.0 2024-08-18 04:15:03,615 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.324e+01 2.496e+01 2.827e+01 4.121e+01, threshold=4.992e+01, percent-clipped=0.0 2024-08-18 04:15:14,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3690020.0, ans=0.125 2024-08-18 04:15:15,666 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
30 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-18 04:15:21,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3690120.0, ans=0.025 2024-08-18 04:15:23,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3690120.0, ans=0.0 2024-08-18 04:15:49,493 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 11400, loss[loss=0.1007, beats_loss=0.01112, ecapa_loss=0.0001257, whisper_loss=0.08832, over 23396.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01044, ecapa_loss=0.0001469, whisper_loss=0.09087, over 3939917.86 frames. ], batch size: 92, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:16:09,590 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-18 04:16:12,689 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2024-08-18 04:16:15,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3690420.0, ans=0.1 2024-08-18 04:16:30,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3690520.0, ans=0.125 2024-08-18 04:16:30,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3690520.0, ans=0.1 2024-08-18 04:16:49,062 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.41 vs. limit=22.5 2024-08-18 04:16:53,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3690720.0, ans=0.0 2024-08-18 04:17:02,100 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
20 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-18 04:17:02,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3690720.0, ans=0.2 2024-08-18 04:17:07,987 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 11450, loss[loss=0.08963, beats_loss=0.01089, ecapa_loss=0.0001583, whisper_loss=0.07716, over 14944.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0105, ecapa_loss=0.0001467, whisper_loss=0.0904, over 3931287.84 frames. ], batch size: 63, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:17:15,306 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.13 vs. limit=10.0 2024-08-18 04:17:20,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3690820.0, ans=15.0 2024-08-18 04:17:37,488 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.265e+01 2.441e+01 2.752e+01 3.778e+01, threshold=4.882e+01, percent-clipped=0.0 2024-08-18 04:17:39,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3691020.0, ans=0.0 2024-08-18 04:17:43,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3691020.0, ans=0.0 2024-08-18 04:17:53,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3691120.0, ans=0.125 2024-08-18 04:17:54,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3691120.0, ans=0.125 2024-08-18 04:18:22,640 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 04:18:26,249 INFO [train_multi_KD3.py:1116] (2/4) 
Epoch 25, batch 11500, loss[loss=0.1157, beats_loss=0.008445, ecapa_loss=0.0001313, whisper_loss=0.1059, over 17130.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001457, whisper_loss=0.09008, over 3949829.33 frames. ], batch size: 64, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:18:26,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3691320.0, ans=0.0 2024-08-18 04:18:30,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3691320.0, ans=0.125 2024-08-18 04:18:54,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3691420.0, ans=0.1 2024-08-18 04:18:55,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3691420.0, ans=0.125 2024-08-18 04:19:03,496 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 39 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-18 04:19:03,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3691520.0, ans=0.025 2024-08-18 04:19:21,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3691620.0, ans=0.1 2024-08-18 04:19:24,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3691620.0, ans=0.09899494936611666 2024-08-18 04:19:42,747 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 11550, loss[loss=0.1046, beats_loss=0.01182, ecapa_loss=0.0001278, whisper_loss=0.09148, over 22907.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01048, ecapa_loss=0.000146, whisper_loss=0.09057, over 3931807.88 frames. 
], batch size: 89, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:19:55,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3691820.0, ans=0.0 2024-08-18 04:20:00,067 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-18 04:20:10,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3691920.0, ans=0.2 2024-08-18 04:20:12,647 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.438e+01 2.681e+01 2.984e+01 2.148e+02, threshold=5.363e+01, percent-clipped=1.0 2024-08-18 04:20:13,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3692020.0, ans=0.125 2024-08-18 04:20:19,136 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 24 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-18 04:20:24,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3692020.0, ans=0.125 2024-08-18 04:20:35,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3692120.0, ans=0.0 2024-08-18 04:20:38,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3692120.0, ans=0.125 2024-08-18 04:20:38,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3692120.0, ans=0.0 2024-08-18 04:20:39,383 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
25 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 04:20:42,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3692220.0, ans=0.0 2024-08-18 04:20:56,048 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 11600, loss[loss=0.1058, beats_loss=0.01048, ecapa_loss=0.000135, whisper_loss=0.09397, over 17352.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0104, ecapa_loss=0.0001469, whisper_loss=0.091, over 3955962.74 frames. ], batch size: 67, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:20:56,901 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.96 vs. limit=15.0 2024-08-18 04:21:03,095 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-18 04:21:06,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3692320.0, ans=0.1 2024-08-18 04:21:11,177 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-18 04:21:24,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3692520.0, ans=0.0 2024-08-18 04:21:28,630 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 04:21:51,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3692620.0, ans=0.2 2024-08-18 04:21:54,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3692720.0, ans=0.125 2024-08-18 04:22:09,680 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 11650, loss[loss=0.09787, beats_loss=0.01262, ecapa_loss=0.000134, whisper_loss=0.0839, over 21796.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01036, ecapa_loss=0.0001474, whisper_loss=0.09118, over 3957147.84 frames. ], batch size: 88, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:22:17,934 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 04:22:18,502 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.00 vs. limit=22.5 2024-08-18 04:22:35,996 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-18 04:22:36,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3692920.0, ans=0.0 2024-08-18 04:22:37,300 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.291e+01 2.477e+01 2.731e+01 1.047e+02, threshold=4.954e+01, percent-clipped=2.0 2024-08-18 04:22:38,791 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-18 04:22:44,615 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. 
limit=15.0 2024-08-18 04:22:49,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3693020.0, ans=0.125 2024-08-18 04:23:08,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3693220.0, ans=0.1 2024-08-18 04:23:17,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3693220.0, ans=0.2 2024-08-18 04:23:20,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3693320.0, ans=0.125 2024-08-18 04:23:20,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3693320.0, ans=0.125 2024-08-18 04:23:21,296 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 11700, loss[loss=0.08662, beats_loss=0.01075, ecapa_loss=0.0001469, whisper_loss=0.0744, over 15042.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01045, ecapa_loss=0.0001461, whisper_loss=0.09091, over 3931060.39 frames. ], batch size: 62, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:23:21,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3693320.0, ans=0.0 2024-08-18 04:23:29,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3693320.0, ans=0.0 2024-08-18 04:23:30,756 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 04:23:49,685 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 37 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-18 04:23:52,300 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
28 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-18 04:24:14,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3693620.0, ans=0.0 2024-08-18 04:24:33,953 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 11750, loss[loss=0.1269, beats_loss=0.01033, ecapa_loss=0.0001439, whisper_loss=0.1151, over 22545.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01053, ecapa_loss=0.0001465, whisper_loss=0.09057, over 3925779.16 frames. ], batch size: 88, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:24:35,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3693820.0, ans=0.0 2024-08-18 04:24:35,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3693820.0, ans=0.0 2024-08-18 04:24:52,692 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 29 from Vox, 25 fro AS 2024-08-18 04:24:57,572 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-18 04:25:00,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3693920.0, ans=0.1 2024-08-18 04:25:01,109 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.257e+01 2.580e+01 2.883e+01 7.198e+01, threshold=5.159e+01, percent-clipped=2.0 2024-08-18 04:25:05,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3694020.0, ans=0.125 2024-08-18 04:25:13,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3694020.0, ans=0.2 2024-08-18 04:25:15,375 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
23 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-18 04:25:19,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3694120.0, ans=0.0 2024-08-18 04:25:40,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3694220.0, ans=0.1 2024-08-18 04:25:46,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3694320.0, ans=0.125 2024-08-18 04:25:47,026 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 11800, loss[loss=0.1264, beats_loss=0.007641, ecapa_loss=0.0001075, whisper_loss=0.1177, over 17662.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01052, ecapa_loss=0.0001464, whisper_loss=0.09086, over 3917526.70 frames. ], batch size: 63, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:26:00,331 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-18 04:26:08,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3694420.0, ans=0.125 2024-08-18 04:26:24,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3694520.0, ans=0.125 2024-08-18 04:26:38,687 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-18 04:26:42,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3694720.0, ans=0.2 2024-08-18 04:26:45,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3694720.0, ans=0.0 2024-08-18 04:26:54,445 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 11850, loss[loss=0.1154, beats_loss=0.00954, ecapa_loss=0.0001592, whisper_loss=0.1042, over 21805.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01052, ecapa_loss=0.0001469, whisper_loss=0.09077, over 3890737.12 frames. ], batch size: 88, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:27:01,672 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-18 04:27:18,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=3694920.0, ans=0.1 2024-08-18 04:27:19,375 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.719e+01 2.338e+01 2.535e+01 2.911e+01 4.657e+01, threshold=5.070e+01, percent-clipped=0.0 2024-08-18 04:27:24,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3695020.0, ans=0.125 2024-08-18 04:27:30,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3695020.0, ans=0.2 2024-08-18 04:27:43,117 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 23 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-18 04:27:45,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3695120.0, ans=0.025 2024-08-18 04:27:47,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3695220.0, ans=0.125 2024-08-18 04:27:51,246 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 04:28:01,789 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 11900, loss[loss=0.09127, beats_loss=0.01121, ecapa_loss=0.0001124, whisper_loss=0.07894, over 16473.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.0001469, whisper_loss=0.09082, over 3879950.53 frames. ], batch size: 63, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:28:06,645 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
30 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-18 04:28:10,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3695320.0, ans=0.025 2024-08-18 04:28:17,156 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-18 04:28:21,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3695420.0, ans=0.125 2024-08-18 04:28:27,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3695520.0, ans=0.125 2024-08-18 04:28:29,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3695520.0, ans=0.125 2024-08-18 04:28:30,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3695520.0, ans=0.125 2024-08-18 04:28:32,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3695520.0, ans=0.1 2024-08-18 04:28:48,138 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 23 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-18 04:28:48,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3695620.0, ans=0.125 2024-08-18 04:28:53,347 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-18 04:29:07,446 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 11950, loss[loss=0.09123, beats_loss=0.01134, ecapa_loss=0.0001297, whisper_loss=0.07859, over 16958.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01052, ecapa_loss=0.0001471, whisper_loss=0.09126, over 3882193.94 frames. 
], batch size: 66, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:29:10,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3695820.0, ans=0.1 2024-08-18 04:29:17,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3695820.0, ans=0.125 2024-08-18 04:29:18,839 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 04:29:31,521 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.580e+01 2.286e+01 2.533e+01 2.895e+01 4.370e+02, threshold=5.067e+01, percent-clipped=3.0 2024-08-18 04:29:31,732 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-18 04:29:38,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3696020.0, ans=0.1 2024-08-18 04:29:39,206 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 04:30:10,302 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-18 04:30:12,567 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 12000, loss[loss=0.09516, beats_loss=0.009846, ecapa_loss=0.000169, whisper_loss=0.08362, over 21775.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01054, ecapa_loss=0.0001456, whisper_loss=0.09074, over 3876462.86 frames. ], batch size: 93, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:30:12,568 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-18 04:30:49,986 INFO [train_multi_KD3.py:1149] (2/4) Epoch 25, validation on ASR_libri: loss=0.2542, beats_loss=0, ecapa_loss=0.0005296, whisper_loss=0.2489, over 922467.00 frames. 
2024-08-18 04:31:05,476 INFO [train_multi_KD3.py:1149] (2/4) Epoch 25, validation on SV_voxceleb1: loss=0.004038, beats_loss=0, ecapa_loss=0.0004038, whisper_loss=0, over 939242.00 frames. 2024-08-18 04:32:43,948 INFO [train_multi_KD3.py:1149] (2/4) Epoch 25, validation on AT_audioset: loss=0.02323, beats_loss=0.02323, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 04:32:43,952 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-18 04:33:05,256 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.09 vs. limit=10.0 2024-08-18 04:33:17,162 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.23 vs. limit=22.5 2024-08-18 04:33:23,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3696520.0, ans=0.0 2024-08-18 04:33:32,954 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09565281867980957, model_norm_threshold=50.66883087158203 2024-08-18 04:33:33,127 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.699e+04, grad_sumsq=4.144e+03, orig_rms_sq=8.927e+00 2024-08-18 04:33:35,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3696620.0, ans=0.07 2024-08-18 04:33:36,942 WARNING [optim.py:496] (2/4) Scaling gradients by 0.07986252754926682, model_norm_threshold=50.66883087158203 2024-08-18 04:33:37,113 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.743e+04, grad_sumsq=4.635e+06, orig_rms_sq=1.023e-02 2024-08-18 04:33:41,795 INFO 
[scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3696720.0, ans=0.0 2024-08-18 04:33:43,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3696720.0, ans=0.025 2024-08-18 04:33:47,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3696720.0, ans=0.125 2024-08-18 04:33:52,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3696820.0, ans=0.2 2024-08-18 04:33:53,759 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 12050, loss[loss=0.094, beats_loss=0.01027, ecapa_loss=0.0001245, whisper_loss=0.08248, over 22847.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01056, ecapa_loss=0.0001447, whisper_loss=0.0896, over 3849135.21 frames. ], batch size: 91, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:33:57,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3696820.0, ans=0.125 2024-08-18 04:34:02,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3696820.0, ans=0.2 2024-08-18 04:34:06,429 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 37 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-18 04:34:20,203 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.681e+01 2.233e+01 2.436e+01 2.807e+01 6.345e+02, threshold=4.872e+01, percent-clipped=3.0 2024-08-18 04:34:21,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3697020.0, ans=0.015 2024-08-18 04:34:23,626 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.96 vs. 
limit=10.0 2024-08-18 04:34:41,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3697120.0, ans=0.0 2024-08-18 04:34:50,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3697220.0, ans=0.125 2024-08-18 04:34:53,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3697220.0, ans=0.125 2024-08-18 04:34:54,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3697220.0, ans=0.125 2024-08-18 04:35:02,937 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 12100, loss[loss=0.1146, beats_loss=0.008946, ecapa_loss=0.0001526, whisper_loss=0.1042, over 17527.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01055, ecapa_loss=0.0001449, whisper_loss=0.08986, over 3840756.41 frames. ], batch size: 68, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:35:14,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3697420.0, ans=0.125 2024-08-18 04:35:38,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3697520.0, ans=0.125 2024-08-18 04:35:44,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3697620.0, ans=0.125 2024-08-18 04:36:03,774 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 13 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 04:36:08,676 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 12150, loss[loss=0.09665, beats_loss=0.01044, ecapa_loss=0.0001508, whisper_loss=0.0847, over 19442.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01059, ecapa_loss=0.0001458, whisper_loss=0.08906, over 3836204.64 frames. 
], batch size: 79, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:36:12,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3697820.0, ans=0.2 2024-08-18 04:36:15,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3697820.0, ans=0.04949747468305833 2024-08-18 04:36:24,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3697920.0, ans=0.2 2024-08-18 04:36:25,165 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-18 04:36:32,327 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.238e+01 2.438e+01 2.753e+01 4.276e+01, threshold=4.877e+01, percent-clipped=0.0 2024-08-18 04:36:33,744 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 30 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-18 04:36:42,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3698020.0, ans=0.1 2024-08-18 04:37:07,865 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 25 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 04:37:12,836 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 12200, loss[loss=0.1026, beats_loss=0.009399, ecapa_loss=0.0001483, whisper_loss=0.09176, over 15476.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01061, ecapa_loss=0.0001452, whisper_loss=0.08929, over 3867676.93 frames. 
], batch size: 60, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:37:29,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3698420.0, ans=0.125 2024-08-18 04:37:37,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3698520.0, ans=0.95 2024-08-18 04:37:43,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3698520.0, ans=0.0 2024-08-18 04:37:48,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3698520.0, ans=0.05 2024-08-18 04:37:53,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3698620.0, ans=0.1 2024-08-18 04:38:00,185 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.63 vs. limit=15.0 2024-08-18 04:38:15,846 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 12250, loss[loss=0.1061, beats_loss=0.01193, ecapa_loss=0.0001126, whisper_loss=0.09306, over 16950.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01056, ecapa_loss=0.0001454, whisper_loss=0.08974, over 3853999.57 frames. ], batch size: 66, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:38:17,498 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-18 04:38:28,159 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.96 vs. 
limit=22.5 2024-08-18 04:38:40,010 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.471e+01 2.671e+01 3.119e+01 8.520e+01, threshold=5.341e+01, percent-clipped=2.0 2024-08-18 04:39:13,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3699220.0, ans=0.125 2024-08-18 04:39:17,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3699220.0, ans=0.0 2024-08-18 04:39:19,482 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 12300, loss[loss=0.09554, beats_loss=0.01012, ecapa_loss=0.0001654, whisper_loss=0.08376, over 21048.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01057, ecapa_loss=0.000146, whisper_loss=0.08936, over 3870177.75 frames. ], batch size: 87, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:39:23,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3699320.0, ans=0.125 2024-08-18 04:39:27,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3699320.0, ans=0.0 2024-08-18 04:39:28,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3699320.0, ans=0.2 2024-08-18 04:39:36,821 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 30 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-18 04:39:41,760 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 19 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 04:39:48,842 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
16 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 04:39:54,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3699520.0, ans=0.0 2024-08-18 04:39:57,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3699620.0, ans=0.0 2024-08-18 04:39:59,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3699620.0, ans=0.1 2024-08-18 04:40:21,656 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 12350, loss[loss=0.107, beats_loss=0.01009, ecapa_loss=0.0001288, whisper_loss=0.09562, over 21379.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01049, ecapa_loss=0.0001466, whisper_loss=0.08975, over 3909159.98 frames. ], batch size: 83, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:40:39,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3699920.0, ans=0.125 2024-08-18 04:40:45,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3699920.0, ans=0.0 2024-08-18 04:40:45,934 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.495e+01 2.683e+01 3.048e+01 4.110e+01, threshold=5.366e+01, percent-clipped=0.0 2024-08-18 04:40:46,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3700020.0, ans=0.125 2024-08-18 04:40:54,077 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 04:40:57,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3700020.0, ans=0.04949747468305833 2024-08-18 04:40:59,091 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
25 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-18 04:41:23,818 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-18 04:41:24,878 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 12400, loss[loss=0.1089, beats_loss=0.008837, ecapa_loss=0.0001853, whisper_loss=0.09822, over 21108.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01052, ecapa_loss=0.000147, whisper_loss=0.08985, over 3901916.81 frames. ], batch size: 87, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:41:40,062 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-18 04:41:41,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3700420.0, ans=0.1 2024-08-18 04:41:45,521 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.42 vs. limit=15.0 2024-08-18 04:41:47,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3700420.0, ans=0.5 2024-08-18 04:41:49,782 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-18 04:41:55,917 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-18 04:42:11,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3700620.0, ans=0.1 2024-08-18 04:42:26,438 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 12450, loss[loss=0.08622, beats_loss=0.01013, ecapa_loss=0.0001728, whisper_loss=0.07436, over 21562.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001462, whisper_loss=0.08958, over 3875451.52 frames. 
], batch size: 89, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:42:35,711 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 04:42:45,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3700920.0, ans=0.125 2024-08-18 04:42:46,504 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 30 from Vox, 26 fro AS 2024-08-18 04:42:49,744 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.51 vs. limit=22.5 2024-08-18 04:42:50,282 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.277e+01 2.499e+01 2.886e+01 6.809e+01, threshold=4.997e+01, percent-clipped=1.0 2024-08-18 04:42:54,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3701020.0, ans=0.0 2024-08-18 04:43:10,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3701120.0, ans=0.1 2024-08-18 04:43:22,135 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-18 04:43:27,181 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-18 04:43:28,287 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 12500, loss[loss=0.1001, beats_loss=0.0105, ecapa_loss=0.0001369, whisper_loss=0.08822, over 14332.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01051, ecapa_loss=0.0001461, whisper_loss=0.08925, over 3891506.77 frames. ], batch size: 58, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:43:30,192 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.80 vs. 
limit=12.0 2024-08-18 04:43:31,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3701320.0, ans=0.125 2024-08-18 04:43:50,697 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 16 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 04:43:50,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3701420.0, ans=0.125 2024-08-18 04:44:05,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3701620.0, ans=0.0 2024-08-18 04:44:14,537 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 04:44:15,678 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-18 04:44:27,685 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.63 vs. limit=22.5 2024-08-18 04:44:30,805 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 12550, loss[loss=0.108, beats_loss=0.01137, ecapa_loss=0.0001276, whisper_loss=0.09534, over 23736.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01054, ecapa_loss=0.000145, whisper_loss=0.08955, over 3891413.77 frames. ], batch size: 92, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:44:43,458 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 04:44:52,256 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
25 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-18 04:44:55,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3702020.0, ans=0.125 2024-08-18 04:44:55,886 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.343e+01 2.607e+01 2.934e+01 3.730e+01, threshold=5.215e+01, percent-clipped=0.0 2024-08-18 04:45:04,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3702020.0, ans=0.125 2024-08-18 04:45:06,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3702020.0, ans=0.0 2024-08-18 04:45:11,223 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-18 04:45:11,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3702120.0, ans=0.1 2024-08-18 04:45:12,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3702120.0, ans=0.125 2024-08-18 04:45:19,155 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.60 vs. limit=10.0 2024-08-18 04:45:28,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3702220.0, ans=0.125 2024-08-18 04:45:33,249 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 12600, loss[loss=0.08994, beats_loss=0.01194, ecapa_loss=0.0001052, whisper_loss=0.07696, over 17296.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01061, ecapa_loss=0.0001448, whisper_loss=0.08988, over 3900382.56 frames. 
], batch size: 66, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:45:46,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3702420.0, ans=0.0 2024-08-18 04:45:47,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3702420.0, ans=0.09899494936611666 2024-08-18 04:45:50,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3702420.0, ans=0.125 2024-08-18 04:46:00,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3702520.0, ans=0.125 2024-08-18 04:46:09,894 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 04:46:10,545 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.45 vs. limit=12.0 2024-08-18 04:46:33,685 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-18 04:46:36,123 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 12650, loss[loss=0.1146, beats_loss=0.01114, ecapa_loss=0.0001197, whisper_loss=0.1022, over 19252.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01062, ecapa_loss=0.0001447, whisper_loss=0.09037, over 3893859.23 frames. ], batch size: 74, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:47:01,282 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.719e+01 2.352e+01 2.595e+01 2.962e+01 3.883e+01, threshold=5.190e+01, percent-clipped=0.0 2024-08-18 04:47:05,114 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 04:47:18,641 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
21 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-18 04:47:23,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3703120.0, ans=0.0 2024-08-18 04:47:30,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3703220.0, ans=0.1 2024-08-18 04:47:38,495 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 12700, loss[loss=0.09266, beats_loss=0.01323, ecapa_loss=0.0001322, whisper_loss=0.07811, over 22053.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.0001456, whisper_loss=0.09035, over 3856121.77 frames. ], batch size: 93, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:47:38,605 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-18 04:47:41,879 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.62 vs. limit=10.0 2024-08-18 04:47:45,302 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.01 vs. limit=12.0 2024-08-18 04:48:01,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3703420.0, ans=0.0 2024-08-18 04:48:10,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3703520.0, ans=0.0 2024-08-18 04:48:18,911 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.15 vs. 
limit=12.0 2024-08-18 04:48:25,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3703620.0, ans=0.0 2024-08-18 04:48:40,215 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 12750, loss[loss=0.1158, beats_loss=0.008553, ecapa_loss=0.0001646, whisper_loss=0.1056, over 22421.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01063, ecapa_loss=0.0001462, whisper_loss=0.0906, over 3858893.71 frames. ], batch size: 93, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:48:54,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3703920.0, ans=0.0 2024-08-18 04:49:01,891 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.55 vs. limit=22.5 2024-08-18 04:49:05,082 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.358e+01 2.568e+01 2.897e+01 5.259e+01, threshold=5.137e+01, percent-clipped=1.0 2024-08-18 04:49:10,547 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-18 04:49:14,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3704020.0, ans=0.09899494936611666 2024-08-18 04:49:24,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3704120.0, ans=0.1 2024-08-18 04:49:26,683 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
22 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-18 04:49:37,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3704220.0, ans=6.0 2024-08-18 04:49:37,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3704220.0, ans=0.0 2024-08-18 04:49:42,422 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 12800, loss[loss=0.1007, beats_loss=0.01165, ecapa_loss=0.000124, whisper_loss=0.08784, over 23415.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01073, ecapa_loss=0.0001463, whisper_loss=0.08987, over 3862667.33 frames. ], batch size: 93, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:49:42,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3704320.0, ans=0.125 2024-08-18 04:49:50,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3704320.0, ans=0.1 2024-08-18 04:49:52,149 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.48 vs. limit=22.5 2024-08-18 04:49:57,464 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.86 vs. limit=5.0 2024-08-18 04:50:07,504 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2024-08-18 04:50:15,310 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-18 04:50:16,856 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.190e-01 2024-08-18 04:50:22,846 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
29 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-18 04:50:35,554 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-18 04:50:44,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3704820.0, ans=0.125 2024-08-18 04:50:45,706 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 12850, loss[loss=0.1179, beats_loss=0.009661, ecapa_loss=0.0001363, whisper_loss=0.1069, over 21469.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01065, ecapa_loss=0.0001463, whisper_loss=0.09121, over 3892139.30 frames. ], batch size: 79, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:50:52,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3704820.0, ans=0.0 2024-08-18 04:51:10,385 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.342e+01 2.570e+01 2.903e+01 1.113e+02, threshold=5.140e+01, percent-clipped=1.0 2024-08-18 04:51:11,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3705020.0, ans=0.0 2024-08-18 04:51:12,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3705020.0, ans=0.125 2024-08-18 04:51:18,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3705020.0, ans=0.125 2024-08-18 04:51:31,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3705120.0, ans=0.1 2024-08-18 04:51:38,027 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
18 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 04:51:43,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3705220.0, ans=0.0 2024-08-18 04:51:45,899 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 04:51:48,020 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 12900, loss[loss=0.07892, beats_loss=0.01307, ecapa_loss=0.0001235, whisper_loss=0.06461, over 21864.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01066, ecapa_loss=0.0001464, whisper_loss=0.09014, over 3849254.40 frames. ], batch size: 92, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:51:57,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3705320.0, ans=0.125 2024-08-18 04:52:05,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3705420.0, ans=0.2 2024-08-18 04:52:26,337 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-18 04:52:36,360 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-18 04:52:45,147 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-18 04:52:49,892 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 12950, loss[loss=0.09946, beats_loss=0.01109, ecapa_loss=0.000142, whisper_loss=0.08695, over 23117.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01064, ecapa_loss=0.0001464, whisper_loss=0.08941, over 3846833.76 frames. ], batch size: 93, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:52:50,099 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
27 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-18 04:52:57,135 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.14 vs. limit=15.0 2024-08-18 04:52:59,069 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-18 04:53:05,575 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.37 vs. limit=15.0 2024-08-18 04:53:10,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3705920.0, ans=0.125 2024-08-18 04:53:15,035 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.225e+01 2.385e+01 2.707e+01 3.126e+02, threshold=4.771e+01, percent-clipped=1.0 2024-08-18 04:53:24,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3706020.0, ans=0.0 2024-08-18 04:53:27,780 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 15 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-18 04:53:30,213 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 19 from LS+wenet, 30 from Vox, 44 fro AS 2024-08-18 04:53:34,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3706120.0, ans=0.125 2024-08-18 04:53:35,258 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 04:53:52,849 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 13000, loss[loss=0.118, beats_loss=0.00871, ecapa_loss=0.0001578, whisper_loss=0.1077, over 18121.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01062, ecapa_loss=0.0001462, whisper_loss=0.08971, over 3866515.47 frames. 
], batch size: 73, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:53:56,322 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.48 vs. limit=22.5 2024-08-18 04:54:11,246 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.30 vs. limit=10.0 2024-08-18 04:54:34,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3706620.0, ans=0.95 2024-08-18 04:54:39,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3706620.0, ans=0.0 2024-08-18 04:54:42,545 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.20 vs. limit=22.5 2024-08-18 04:54:51,095 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.00 vs. limit=15.0 2024-08-18 04:54:53,031 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-18 04:54:54,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3706820.0, ans=0.0 2024-08-18 04:54:55,456 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 13050, loss[loss=0.1201, beats_loss=0.009404, ecapa_loss=0.0001391, whisper_loss=0.1093, over 23798.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01058, ecapa_loss=0.0001459, whisper_loss=0.08972, over 3861990.69 frames. 
], batch size: 92, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:55:15,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3706920.0, ans=0.125 2024-08-18 04:55:17,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3706920.0, ans=0.2 2024-08-18 04:55:20,642 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.374e+01 2.590e+01 2.851e+01 4.425e+02, threshold=5.179e+01, percent-clipped=1.0 2024-08-18 04:55:24,496 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 35 from Vox, 34 fro AS 2024-08-18 04:55:29,340 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-18 04:55:29,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3707020.0, ans=15.0 2024-08-18 04:55:34,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3707120.0, ans=0.125 2024-08-18 04:55:36,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3707120.0, ans=0.0 2024-08-18 04:55:38,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3707120.0, ans=0.125 2024-08-18 04:55:41,573 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 29 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-18 04:55:41,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3707120.0, ans=0.125 2024-08-18 04:55:45,723 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.10 vs. 
limit=15.0 2024-08-18 04:55:47,156 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.85 vs. limit=15.0 2024-08-18 04:55:54,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3707220.0, ans=0.0 2024-08-18 04:55:57,611 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 13100, loss[loss=0.07035, beats_loss=0.01275, ecapa_loss=0.0001223, whisper_loss=0.05637, over 17597.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01059, ecapa_loss=0.0001458, whisper_loss=0.08885, over 3843992.78 frames. ], batch size: 73, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:56:01,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3707320.0, ans=0.0 2024-08-18 04:56:08,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3707320.0, ans=0.125 2024-08-18 04:56:29,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3707520.0, ans=0.125 2024-08-18 04:56:31,802 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 04:56:31,942 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=9.064e+00 2024-08-18 04:56:34,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3707620.0, ans=0.1 2024-08-18 04:56:36,378 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
24 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-18 04:56:41,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3707620.0, ans=0.1 2024-08-18 04:56:59,808 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 13150, loss[loss=0.106, beats_loss=0.01248, ecapa_loss=0.0001582, whisper_loss=0.09198, over 18749.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01062, ecapa_loss=0.0001455, whisper_loss=0.089, over 3832231.57 frames. ], batch size: 75, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:56:59,967 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 24 from Vox, 18 fro AS 2024-08-18 04:57:03,780 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-18 04:57:25,550 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.230e+01 2.439e+01 2.719e+01 6.493e+01, threshold=4.878e+01, percent-clipped=1.0 2024-08-18 04:57:31,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3708020.0, ans=0.1 2024-08-18 04:57:33,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=3708020.0, ans=10.0 2024-08-18 04:57:48,296 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
21 from LS+wenet, 11 from Vox, 44 fro AS 2024-08-18 04:57:50,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3708220.0, ans=0.125 2024-08-18 04:57:53,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3708220.0, ans=0.2 2024-08-18 04:57:55,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3708220.0, ans=0.125 2024-08-18 04:58:02,838 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 13200, loss[loss=0.1032, beats_loss=0.0113, ecapa_loss=0.0001434, whisper_loss=0.0905, over 22427.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01062, ecapa_loss=0.000145, whisper_loss=0.08911, over 3851230.55 frames. ], batch size: 91, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:58:06,650 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 04:58:09,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3708320.0, ans=0.125 2024-08-18 04:58:17,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3708420.0, ans=0.2 2024-08-18 04:58:24,339 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-18 04:58:28,136 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 04:58:30,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3708520.0, ans=0.125 2024-08-18 04:58:40,872 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. 
limit=6.0 2024-08-18 04:58:55,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3708720.0, ans=0.125 2024-08-18 04:58:57,778 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-18 04:58:58,856 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 04:59:01,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3708720.0, ans=0.125 2024-08-18 04:59:05,273 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 13250, loss[loss=0.1023, beats_loss=0.008813, ecapa_loss=0.0001455, whisper_loss=0.09207, over 18138.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01063, ecapa_loss=0.0001454, whisper_loss=0.08878, over 3842083.51 frames. ], batch size: 72, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:59:05,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3708820.0, ans=0.125 2024-08-18 04:59:30,036 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.594e+01 2.263e+01 2.511e+01 2.835e+01 4.406e+01, threshold=5.022e+01, percent-clipped=0.0 2024-08-18 04:59:31,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3709020.0, ans=0.09899494936611666 2024-08-18 04:59:37,405 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-18 04:59:40,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3709020.0, ans=0.0 2024-08-18 04:59:41,416 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-18 04:59:53,784 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
38 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-18 04:59:55,189 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 04:59:57,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3709220.0, ans=0.0 2024-08-18 04:59:57,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3709220.0, ans=0.125 2024-08-18 05:00:05,443 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 25 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-18 05:00:07,489 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 13300, loss[loss=0.1199, beats_loss=0.0113, ecapa_loss=0.0001298, whisper_loss=0.1074, over 22536.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01059, ecapa_loss=0.0001451, whisper_loss=0.0893, over 3851262.23 frames. ], batch size: 89, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:00:12,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3709320.0, ans=0.0 2024-08-18 05:00:24,853 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 29 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-18 05:00:25,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3709420.0, ans=0.125 2024-08-18 05:00:37,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3709520.0, ans=0.125 2024-08-18 05:00:39,892 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 36 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-18 05:00:42,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3709520.0, ans=0.125 2024-08-18 05:01:06,244 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
29 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-18 05:01:09,813 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 13350, loss[loss=0.1114, beats_loss=0.01228, ecapa_loss=0.0001181, whisper_loss=0.09789, over 22620.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01056, ecapa_loss=0.0001446, whisper_loss=0.08987, over 3834361.26 frames. ], batch size: 89, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:01:12,446 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-18 05:01:29,060 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 05:01:32,682 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 27 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 05:01:34,952 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.554e+01 2.429e+01 2.621e+01 2.901e+01 4.949e+01, threshold=5.243e+01, percent-clipped=0.0 2024-08-18 05:01:36,732 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 26 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 05:01:45,439 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 30 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-18 05:01:49,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3710120.0, ans=0.125 2024-08-18 05:01:52,442 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.73 vs. limit=15.0 2024-08-18 05:02:09,159 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.16 vs. limit=22.5 2024-08-18 05:02:13,049 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 13400, loss[loss=0.08871, beats_loss=0.009662, ecapa_loss=0.0001421, whisper_loss=0.07762, over 15896.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01044, ecapa_loss=0.0001456, whisper_loss=0.09077, over 3827361.56 frames. ], batch size: 64, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:02:21,076 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-18 05:02:29,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3710420.0, ans=0.0 2024-08-18 05:02:31,950 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-18 05:02:34,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3710420.0, ans=0.1 2024-08-18 05:02:45,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3710520.0, ans=15.0 2024-08-18 05:02:50,313 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.19 vs. limit=15.0 2024-08-18 05:03:02,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3710720.0, ans=0.125 2024-08-18 05:03:07,020 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.56 vs. limit=15.0 2024-08-18 05:03:07,711 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 05:03:07,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3710720.0, ans=0.2 2024-08-18 05:03:16,471 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 13450, loss[loss=0.0838, beats_loss=0.01143, ecapa_loss=0.0001411, whisper_loss=0.07097, over 20172.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01048, ecapa_loss=0.000145, whisper_loss=0.09109, over 3860995.86 frames. ], batch size: 81, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:03:20,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3710820.0, ans=0.1 2024-08-18 05:03:21,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3710820.0, ans=0.125 2024-08-18 05:03:24,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3710820.0, ans=0.0 2024-08-18 05:03:25,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3710820.0, ans=0.125 2024-08-18 05:03:41,691 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.380e+01 2.555e+01 2.922e+01 3.832e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-18 05:03:58,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3711120.0, ans=0.125 2024-08-18 05:04:07,027 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=15.0 2024-08-18 05:04:07,287 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.43 vs. limit=15.0 2024-08-18 05:04:09,377 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 05:04:12,462 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.47 vs. limit=15.0 2024-08-18 05:04:12,887 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
25 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 05:04:15,436 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 05:04:18,785 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 13500, loss[loss=0.1221, beats_loss=0.009274, ecapa_loss=0.0001352, whisper_loss=0.1115, over 21136.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01053, ecapa_loss=0.0001457, whisper_loss=0.09094, over 3894189.11 frames. ], batch size: 81, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:04:36,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3711420.0, ans=0.125 2024-08-18 05:04:42,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3711520.0, ans=0.125 2024-08-18 05:05:16,077 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-18 05:05:20,841 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 13550, loss[loss=0.1058, beats_loss=0.01091, ecapa_loss=0.0001574, whisper_loss=0.09335, over 22084.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01056, ecapa_loss=0.0001457, whisper_loss=0.09038, over 3878681.70 frames. ], batch size: 88, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:05:23,328 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-18 05:05:23,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3711820.0, ans=0.125 2024-08-18 05:05:23,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3711820.0, ans=0.2 2024-08-18 05:05:25,947 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
19 from LS+wenet, 31 from Vox, 28 fro AS 2024-08-18 05:05:32,429 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 14 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-18 05:05:33,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3711920.0, ans=0.0 2024-08-18 05:05:34,303 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.03 vs. limit=22.5 2024-08-18 05:05:38,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3711920.0, ans=0.09899494936611666 2024-08-18 05:05:42,143 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 24 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-18 05:05:45,915 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.297e+01 2.515e+01 2.826e+01 4.852e+01, threshold=5.029e+01, percent-clipped=0.0 2024-08-18 05:06:05,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3712120.0, ans=0.95 2024-08-18 05:06:11,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3712220.0, ans=0.125 2024-08-18 05:06:12,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3712220.0, ans=0.0 2024-08-18 05:06:16,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3712220.0, ans=0.125 2024-08-18 05:06:19,121 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0 2024-08-18 05:06:19,637 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
32 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-18 05:06:23,028 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 13600, loss[loss=0.09126, beats_loss=0.009216, ecapa_loss=0.000116, whisper_loss=0.08089, over 17064.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01054, ecapa_loss=0.0001467, whisper_loss=0.09037, over 3892586.95 frames. ], batch size: 64, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:06:24,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3712320.0, ans=0.035 2024-08-18 05:06:33,552 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 12 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 05:06:47,271 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 15 from LS+wenet, 24 from Vox, 14 fro AS 2024-08-18 05:06:49,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3712520.0, ans=0.125 2024-08-18 05:06:52,376 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-18 05:07:13,707 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-18 05:07:22,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3712720.0, ans=0.0 2024-08-18 05:07:23,980 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.52 vs. limit=15.0 2024-08-18 05:07:25,908 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 13650, loss[loss=0.07916, beats_loss=0.01152, ecapa_loss=0.0001252, whisper_loss=0.06639, over 15282.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01058, ecapa_loss=0.0001474, whisper_loss=0.09032, over 3892857.31 frames. 
], batch size: 57, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:07:37,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3712920.0, ans=0.125 2024-08-18 05:07:37,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3712920.0, ans=0.125 2024-08-18 05:07:50,783 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.696e+01 2.250e+01 2.521e+01 2.924e+01 4.330e+02, threshold=5.042e+01, percent-clipped=2.0 2024-08-18 05:07:55,698 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-18 05:08:11,499 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.31 vs. limit=10.0 2024-08-18 05:08:23,596 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 05:08:23,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3713220.0, ans=0.09899494936611666 2024-08-18 05:08:28,464 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 13700, loss[loss=0.0901, beats_loss=0.01257, ecapa_loss=0.0001184, whisper_loss=0.07635, over 21162.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01067, ecapa_loss=0.0001469, whisper_loss=0.08982, over 3876990.70 frames. ], batch size: 84, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:08:28,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3713320.0, ans=0.125 2024-08-18 05:08:33,776 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 05:08:41,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3713420.0, ans=0.1 2024-08-18 05:08:41,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3713420.0, ans=0.125 2024-08-18 05:09:24,012 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.35 vs. limit=10.0 2024-08-18 05:09:28,952 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.36 vs. limit=6.0 2024-08-18 05:09:30,605 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 13750, loss[loss=0.09409, beats_loss=0.01175, ecapa_loss=0.0001619, whisper_loss=0.08072, over 14221.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01066, ecapa_loss=0.0001468, whisper_loss=0.09052, over 3871951.09 frames. ], batch size: 55, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:09:42,858 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
16 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-18 05:09:55,366 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.369e+01 2.573e+01 3.048e+01 2.205e+02, threshold=5.146e+01, percent-clipped=4.0 2024-08-18 05:10:02,013 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0230120737105608, model_norm_threshold=51.46477508544922 2024-08-18 05:10:02,185 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.952e+05, grad_sumsq=7.749e+07, orig_rms_sq=1.026e-02 2024-08-18 05:10:06,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3714020.0, ans=0.125 2024-08-18 05:10:08,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3714120.0, ans=0.125 2024-08-18 05:10:12,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3714120.0, ans=0.125 2024-08-18 05:10:12,324 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.94 vs. limit=22.5 2024-08-18 05:10:26,528 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-18 05:10:27,714 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 24 from LS+wenet, 11 from Vox, 21 fro AS 2024-08-18 05:10:32,792 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 13800, loss[loss=0.08591, beats_loss=0.01191, ecapa_loss=0.000106, whisper_loss=0.07294, over 14057.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01065, ecapa_loss=0.0001467, whisper_loss=0.09025, over 3843445.07 frames. 
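The WARNING above ("Scaling gradients by 0.0230..., model_norm_threshold=51.46...") is consistent with a clip rule of the form scale = threshold / grad_norm; this is an assumption about what `optim.py` does, but the implied gradient norm can be recovered from the two logged numbers:

```python
# Back out the gradient norm implied by the clipping warning, assuming
# scale = model_norm_threshold / grad_norm (inferred behaviour, not quoted code).
scale = 0.0230120737105608        # "Scaling gradients by ..."
threshold = 51.46477508544922     # "model_norm_threshold=..."
grad_norm = threshold / scale
print(round(grad_norm, 1))        # -> 2236.4
```

Note the implied norm of roughly 2.236e+03 matches the maximum grad-norm quartile (2.236e+03) reported a few entries later at batch 13850.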
], batch size: 55, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:10:39,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3714320.0, ans=0.125 2024-08-18 05:10:40,675 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 34 from LS+wenet, 11 from Vox, 36 fro AS 2024-08-18 05:10:49,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3714420.0, ans=0.0 2024-08-18 05:10:56,763 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 05:11:12,943 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 29 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-18 05:11:18,602 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.88 vs. limit=15.0 2024-08-18 05:11:22,061 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.03 vs. limit=10.0 2024-08-18 05:11:26,282 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-18 05:11:29,430 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2024-08-18 05:11:35,167 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 13850, loss[loss=0.08166, beats_loss=0.01513, ecapa_loss=0.0001471, whisper_loss=0.06506, over 20785.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01056, ecapa_loss=0.0001484, whisper_loss=0.0908, over 3850230.64 frames. 
], batch size: 88, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:11:37,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3714820.0, ans=0.125 2024-08-18 05:11:51,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3714920.0, ans=0.125 2024-08-18 05:11:59,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3715020.0, ans=0.1 2024-08-18 05:11:59,754 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.280e+01 2.571e+01 2.873e+01 2.236e+03, threshold=5.141e+01, percent-clipped=2.0 2024-08-18 05:12:09,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3715020.0, ans=0.0 2024-08-18 05:12:23,652 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-18 05:12:27,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3715220.0, ans=0.0 2024-08-18 05:12:36,533 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-18 05:12:37,495 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 13900, loss[loss=0.09492, beats_loss=0.01212, ecapa_loss=0.0001842, whisper_loss=0.08096, over 20734.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01053, ecapa_loss=0.0001486, whisper_loss=0.09136, over 3861569.03 frames. ], batch size: 92, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:12:39,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3715320.0, ans=0.125 2024-08-18 05:12:44,483 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
25 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-18 05:12:52,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3715420.0, ans=0.1 2024-08-18 05:13:00,352 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.66 vs. limit=15.0 2024-08-18 05:13:01,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3715520.0, ans=0.125 2024-08-18 05:13:23,546 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-18 05:13:29,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3715720.0, ans=0.0 2024-08-18 05:13:39,547 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 13950, loss[loss=0.108, beats_loss=0.009449, ecapa_loss=0.0001456, whisper_loss=0.09706, over 22307.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0105, ecapa_loss=0.0001486, whisper_loss=0.09147, over 3870331.55 frames. ], batch size: 87, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:14:04,515 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.378e+01 2.637e+01 2.974e+01 4.505e+01, threshold=5.274e+01, percent-clipped=0.0 2024-08-18 05:14:13,594 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 30 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-18 05:14:18,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3716120.0, ans=0.0 2024-08-18 05:14:26,960 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-18 05:14:27,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3716120.0, ans=0.04949747468305833 2024-08-18 05:14:41,774 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 14000, loss[loss=0.1127, beats_loss=0.01151, ecapa_loss=0.0001572, whisper_loss=0.09964, over 21602.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01055, ecapa_loss=0.0001463, whisper_loss=0.09128, over 3890979.26 frames. ], batch size: 87, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:14:58,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3716420.0, ans=0.1 2024-08-18 05:15:16,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3716520.0, ans=0.0 2024-08-18 05:15:23,213 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 05:15:23,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3716620.0, ans=0.2 2024-08-18 05:15:27,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3716620.0, ans=0.125 2024-08-18 05:15:32,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3716720.0, ans=0.0 2024-08-18 05:15:44,546 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 14050, loss[loss=0.1012, beats_loss=0.01144, ecapa_loss=0.0001455, whisper_loss=0.08831, over 21382.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01049, ecapa_loss=0.0001466, whisper_loss=0.0913, over 3889523.94 frames. ], batch size: 89, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:15:54,674 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
14 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-18 05:15:55,964 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 05:16:02,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3716920.0, ans=0.0 2024-08-18 05:16:09,488 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.272e+01 2.582e+01 2.804e+01 4.484e+01, threshold=5.163e+01, percent-clipped=0.0 2024-08-18 05:16:35,124 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.12 vs. limit=12.0 2024-08-18 05:16:41,062 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 35 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 05:16:41,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3717220.0, ans=0.04949747468305833 2024-08-18 05:16:45,960 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 05:16:47,025 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 14100, loss[loss=0.102, beats_loss=0.01085, ecapa_loss=0.0001734, whisper_loss=0.08937, over 22403.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01052, ecapa_loss=0.0001455, whisper_loss=0.09143, over 3870840.77 frames. ], batch size: 92, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:17:00,047 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 05:17:01,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3717420.0, ans=0.125 2024-08-18 05:17:07,045 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
24 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-18 05:17:33,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3717620.0, ans=0.125 2024-08-18 05:17:46,455 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.341e+05 2024-08-18 05:17:47,972 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.49 vs. limit=15.0 2024-08-18 05:17:48,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3717820.0, ans=0.125 2024-08-18 05:17:49,167 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.39 vs. limit=15.0 2024-08-18 05:17:49,610 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 14150, loss[loss=0.08624, beats_loss=0.01217, ecapa_loss=0.0001421, whisper_loss=0.07266, over 21471.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01057, ecapa_loss=0.0001452, whisper_loss=0.09112, over 3867390.81 frames. ], batch size: 88, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:17:52,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3717820.0, ans=0.0 2024-08-18 05:18:01,237 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 05:18:01,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3717920.0, ans=0.0 2024-08-18 05:18:08,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3717920.0, ans=0.0 2024-08-18 05:18:13,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3718020.0, ans=0.0 2024-08-18 05:18:14,478 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.317e+01 2.583e+01 2.863e+01 6.029e+01, threshold=5.165e+01, percent-clipped=1.0 2024-08-18 05:18:51,396 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 14200, loss[loss=0.1123, beats_loss=0.009206, ecapa_loss=0.0001201, whisper_loss=0.1019, over 18403.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01053, ecapa_loss=0.0001452, whisper_loss=0.09165, over 3866934.58 frames. ], batch size: 69, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:18:57,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3718320.0, ans=0.0 2024-08-18 05:19:00,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3718320.0, ans=0.0 2024-08-18 05:19:16,080 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. 
limit=6.0 2024-08-18 05:19:21,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3718520.0, ans=0.1 2024-08-18 05:19:24,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3718520.0, ans=0.2 2024-08-18 05:19:39,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3718620.0, ans=0.07 2024-08-18 05:19:50,526 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-18 05:19:51,724 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 05:19:54,071 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 14250, loss[loss=0.09337, beats_loss=0.01412, ecapa_loss=0.0001582, whisper_loss=0.07767, over 18803.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001446, whisper_loss=0.09087, over 3880988.53 frames. ], batch size: 77, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:19:55,442 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 28 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 05:19:56,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3718820.0, ans=0.125 2024-08-18 05:19:57,869 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 05:20:10,058 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.09 vs. 
limit=15.0 2024-08-18 05:20:13,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3718920.0, ans=0.0 2024-08-18 05:20:20,724 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.251e+01 2.558e+01 2.864e+01 4.072e+01, threshold=5.115e+01, percent-clipped=0.0 2024-08-18 05:20:38,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3719120.0, ans=0.125 2024-08-18 05:20:40,248 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 32 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-18 05:20:46,352 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 14 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-18 05:20:59,406 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 18 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-18 05:21:00,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3719220.0, ans=0.1 2024-08-18 05:21:07,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3719220.0, ans=0.0 2024-08-18 05:21:09,911 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.11 vs. limit=15.0 2024-08-18 05:21:10,240 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 14300, loss[loss=0.1121, beats_loss=0.00833, ecapa_loss=0.0001592, whisper_loss=0.1022, over 15822.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01065, ecapa_loss=0.0001442, whisper_loss=0.09059, over 3876632.74 frames. ], batch size: 61, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:21:19,959 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
21 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-18 05:21:35,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3719420.0, ans=0.0 2024-08-18 05:21:41,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3719520.0, ans=0.04949747468305833 2024-08-18 05:21:46,568 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.89 vs. limit=22.5 2024-08-18 05:21:47,223 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-18 05:22:40,166 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 14350, loss[loss=0.09039, beats_loss=0.01249, ecapa_loss=0.0001465, whisper_loss=0.07643, over 18030.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01071, ecapa_loss=0.0001434, whisper_loss=0.09005, over 3886937.38 frames. ], batch size: 74, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:22:52,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3719820.0, ans=0.1 2024-08-18 05:23:22,503 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.359e+01 2.540e+01 2.872e+01 4.791e+01, threshold=5.079e+01, percent-clipped=0.0 2024-08-18 05:23:44,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3720120.0, ans=0.125 2024-08-18 05:24:02,268 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-18 05:24:22,739 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 14400, loss[loss=0.09534, beats_loss=0.01152, ecapa_loss=0.0001457, whisper_loss=0.08237, over 18696.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01065, ecapa_loss=0.0001452, whisper_loss=0.09052, over 3915697.47 frames. 
], batch size: 75, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:25:08,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3720520.0, ans=0.125 2024-08-18 05:25:43,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3720620.0, ans=0.0 2024-08-18 05:25:43,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3720620.0, ans=0.125 2024-08-18 05:25:56,737 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 05:26:08,515 INFO [train_multi_KD3.py:1116] (2/4) Epoch 25, batch 14450, loss[loss=0.1007, beats_loss=0.01198, ecapa_loss=0.0001074, whisper_loss=0.08767, over 21874.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01069, ecapa_loss=0.0001442, whisper_loss=0.09046, over 3935395.51 frames. ], batch size: 83, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:26:21,853 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-18 05:26:37,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3720920.0, ans=0.0 2024-08-18 05:26:48,677 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.314e+01 2.548e+01 2.908e+01 2.011e+02, threshold=5.097e+01, percent-clipped=3.0 2024-08-18 05:26:48,830 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-18 05:26:59,629 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=15.0 2024-08-18 05:27:04,106 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
34 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 05:27:06,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3721120.0, ans=0.04949747468305833 2024-08-18 05:27:51,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3721230.0, ans=0.125 2024-08-18 05:27:51,745 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 0, loss[loss=0.09772, beats_loss=0.01168, ecapa_loss=0.0001285, whisper_loss=0.08476, over 21864.00 frames. ], tot_loss[loss=0.09772, beats_loss=0.01168, ecapa_loss=0.0001285, whisper_loss=0.08476, over 21864.00 frames. ], batch size: 91, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:27:51,746 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-18 05:28:25,224 INFO [train_multi_KD3.py:1149] (2/4) Epoch 26, validation on ASR_libri: loss=0.251, beats_loss=0, ecapa_loss=0.0005273, whisper_loss=0.2457, over 922467.00 frames. 2024-08-18 05:28:39,636 INFO [train_multi_KD3.py:1149] (2/4) Epoch 26, validation on SV_voxceleb1: loss=0.004107, beats_loss=0, ecapa_loss=0.0004107, whisper_loss=0, over 939242.00 frames. 2024-08-18 05:30:15,542 INFO [train_multi_KD3.py:1149] (2/4) Epoch 26, validation on AT_audioset: loss=0.02319, beats_loss=0.02319, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 05:30:15,545 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-18 05:30:22,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3721230.0, ans=0.125 2024-08-18 05:30:40,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3721330.0, ans=0.0 2024-08-18 05:31:08,964 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
36 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-18 05:31:22,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3721430.0, ans=0.125 2024-08-18 05:31:51,590 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3721630.0, ans=0.0 2024-08-18 05:31:59,636 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-18 05:32:09,463 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 50, loss[loss=0.1119, beats_loss=0.007799, ecapa_loss=0.0001594, whisper_loss=0.1025, over 18494.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.009305, ecapa_loss=0.0001497, whisper_loss=0.0908, over 878538.17 frames. ], batch size: 72, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:32:17,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3721730.0, ans=0.125 2024-08-18 05:32:31,352 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-18 05:32:42,769 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
22 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-18 05:33:02,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3721930.0, ans=0.125 2024-08-18 05:33:11,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3721930.0, ans=0.125 2024-08-18 05:33:12,199 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.655e+01 2.458e+01 2.767e+01 3.037e+01 4.050e+01, threshold=5.534e+01, percent-clipped=0.0 2024-08-18 05:33:13,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3722030.0, ans=0.125 2024-08-18 05:33:18,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3722030.0, ans=0.0 2024-08-18 05:33:20,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3722030.0, ans=0.07 2024-08-18 05:33:58,132 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 100, loss[loss=0.08848, beats_loss=0.009727, ecapa_loss=0.0001431, whisper_loss=0.07733, over 22312.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.009473, ecapa_loss=0.0001482, whisper_loss=0.08978, over 1524456.49 frames. 
], batch size: 89, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:33:59,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3722230.0, ans=0.1 2024-08-18 05:34:05,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3722230.0, ans=0.125 2024-08-18 05:34:07,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3722230.0, ans=0.04949747468305833 2024-08-18 05:34:17,125 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-18 05:34:17,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3722330.0, ans=0.0 2024-08-18 05:34:46,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3722430.0, ans=0.1 2024-08-18 05:35:35,854 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 150, loss[loss=0.1017, beats_loss=0.01015, ecapa_loss=0.0001318, whisper_loss=0.0902, over 14345.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.009509, ecapa_loss=0.0001474, whisper_loss=0.0893, over 2031025.74 frames. ], batch size: 57, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:36:10,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3722930.0, ans=0.015 2024-08-18 05:36:23,216 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.548e+01 2.773e+01 3.033e+01 4.359e+01, threshold=5.546e+01, percent-clipped=0.0 2024-08-18 05:36:27,885 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
28 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-18 05:36:28,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3723030.0, ans=0.125 2024-08-18 05:36:32,990 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.30 vs. limit=15.0 2024-08-18 05:36:40,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3723130.0, ans=0.0 2024-08-18 05:36:49,831 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 05:36:55,852 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 200, loss[loss=0.1039, beats_loss=0.01086, ecapa_loss=0.0001481, whisper_loss=0.09152, over 22224.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.009584, ecapa_loss=0.0001466, whisper_loss=0.09054, over 2442227.32 frames. ], batch size: 89, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:37:10,894 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-18 05:37:14,719 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-18 05:37:21,217 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 15 from Vox, 52 fro AS 2024-08-18 05:37:44,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3723530.0, ans=0.125 2024-08-18 05:37:45,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3723530.0, ans=0.125 2024-08-18 05:37:57,960 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.64 vs. 
limit=15.0 2024-08-18 05:38:09,687 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 250, loss[loss=0.09758, beats_loss=0.00847, ecapa_loss=0.000211, whisper_loss=0.087, over 12788.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.009768, ecapa_loss=0.000147, whisper_loss=0.09094, over 2762495.33 frames. ], batch size: 54, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:38:11,221 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-18 05:38:27,094 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.88 vs. limit=15.0 2024-08-18 05:38:27,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3723830.0, ans=0.1 2024-08-18 05:38:29,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3723830.0, ans=0.125 2024-08-18 05:38:29,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3723830.0, ans=0.2 2024-08-18 05:38:35,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3723930.0, ans=0.1 2024-08-18 05:38:38,743 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 19 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 05:38:44,124 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
15 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-18 05:38:49,109 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.303e+01 2.551e+01 2.928e+01 5.127e+01, threshold=5.101e+01, percent-clipped=0.0 2024-08-18 05:39:05,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3724130.0, ans=0.125 2024-08-18 05:39:19,649 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 300, loss[loss=0.08372, beats_loss=0.009997, ecapa_loss=0.0001708, whisper_loss=0.07202, over 15331.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.009962, ecapa_loss=0.0001468, whisper_loss=0.09022, over 2992370.85 frames. ], batch size: 62, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:39:26,575 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-18 05:39:38,019 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-18 05:39:41,867 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 26 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-18 05:39:43,074 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-18 05:39:48,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3724430.0, ans=0.1 2024-08-18 05:39:48,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3724430.0, ans=0.125 2024-08-18 05:39:51,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3724430.0, ans=0.125 2024-08-18 05:39:51,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3724430.0, ans=0.125 2024-08-18 05:39:52,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3724430.0, ans=0.125 2024-08-18 05:39:58,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3724430.0, ans=0.125 2024-08-18 05:40:01,201 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.279e+01 2024-08-18 05:40:24,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3724630.0, ans=0.1 2024-08-18 05:40:28,426 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=15.0 2024-08-18 05:40:29,031 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 350, loss[loss=0.08683, beats_loss=0.01167, ecapa_loss=0.0001233, whisper_loss=0.07393, over 15521.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0101, ecapa_loss=0.0001454, whisper_loss=0.08969, over 3175213.14 frames. 
], batch size: 60, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:40:32,296 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.52 vs. limit=22.5 2024-08-18 05:40:34,967 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.54 vs. limit=15.0 2024-08-18 05:40:38,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3724730.0, ans=0.0 2024-08-18 05:40:45,754 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.84 vs. limit=10.0 2024-08-18 05:40:47,047 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.89 vs. limit=10.0 2024-08-18 05:41:06,694 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.645e+01 2.139e+01 2.480e+01 2.873e+01 3.431e+01, threshold=4.959e+01, percent-clipped=0.0 2024-08-18 05:41:09,684 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-18 05:41:13,750 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-18 05:41:33,968 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 400, loss[loss=0.08975, beats_loss=0.01001, ecapa_loss=0.0001755, whisper_loss=0.07799, over 14733.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01022, ecapa_loss=0.0001446, whisper_loss=0.08955, over 3317234.92 frames. ], batch size: 57, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:41:36,468 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-18 05:41:48,108 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
22 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-18 05:41:53,457 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-18 05:42:13,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3725530.0, ans=15.0 2024-08-18 05:42:26,051 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 33 from Vox, 24 fro AS 2024-08-18 05:42:27,775 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.35 vs. limit=15.0 2024-08-18 05:42:39,651 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 450, loss[loss=0.1002, beats_loss=0.0122, ecapa_loss=0.0001124, whisper_loss=0.08685, over 15620.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01015, ecapa_loss=0.0001446, whisper_loss=0.08989, over 3420223.40 frames. ], batch size: 58, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:42:49,046 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 05:43:03,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3725830.0, ans=0.2 2024-08-18 05:43:17,595 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.230e+01 2.487e+01 2.863e+01 4.267e+01, threshold=4.973e+01, percent-clipped=0.0 2024-08-18 05:43:18,158 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-18 05:43:25,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3726030.0, ans=0.125 2024-08-18 05:43:31,001 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
14 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-18 05:43:41,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3726130.0, ans=0.2 2024-08-18 05:43:45,019 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 500, loss[loss=0.1216, beats_loss=0.008028, ecapa_loss=0.0001537, whisper_loss=0.112, over 17375.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01024, ecapa_loss=0.0001443, whisper_loss=0.08984, over 3527901.77 frames. ], batch size: 66, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:44:04,175 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-18 05:44:04,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3726330.0, ans=0.05 2024-08-18 05:44:08,107 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 12 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 05:44:47,842 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 05:44:50,360 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 550, loss[loss=0.1142, beats_loss=0.008782, ecapa_loss=0.0001416, whisper_loss=0.104, over 24014.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0103, ecapa_loss=0.0001438, whisper_loss=0.08979, over 3606386.24 frames. ], batch size: 90, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:45:28,249 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.311e+01 2.532e+01 2.757e+01 3.672e+01, threshold=5.065e+01, percent-clipped=0.0 2024-08-18 05:45:29,599 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 05:45:34,758 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
23 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-18 05:45:35,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3727030.0, ans=0.1 2024-08-18 05:45:39,838 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-18 05:45:52,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3727130.0, ans=0.125 2024-08-18 05:45:55,383 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 600, loss[loss=0.1238, beats_loss=0.008259, ecapa_loss=0.0001227, whisper_loss=0.1143, over 21087.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01033, ecapa_loss=0.0001439, whisper_loss=0.08997, over 3668505.18 frames. ], batch size: 78, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:45:59,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3727230.0, ans=0.0 2024-08-18 05:46:04,824 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-18 05:46:05,419 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.83 vs. 
limit=15.0 2024-08-18 05:46:15,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3727330.0, ans=0.2 2024-08-18 05:46:47,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3727630.0, ans=0.125 2024-08-18 05:46:49,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3727630.0, ans=0.125 2024-08-18 05:46:50,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3727630.0, ans=0.125 2024-08-18 05:46:53,785 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 05:47:00,515 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 650, loss[loss=0.09344, beats_loss=0.01227, ecapa_loss=0.0001553, whisper_loss=0.07962, over 17019.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01039, ecapa_loss=0.000144, whisper_loss=0.08932, over 3701672.94 frames. ], batch size: 71, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:47:00,679 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-18 05:47:02,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3727730.0, ans=0.2 2024-08-18 05:47:20,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3727830.0, ans=10.0 2024-08-18 05:47:34,158 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.49 vs. 
limit=10.0 2024-08-18 05:47:38,258 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.709e+01 2.282e+01 2.594e+01 2.924e+01 5.519e+01, threshold=5.188e+01, percent-clipped=2.0 2024-08-18 05:47:41,978 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.29 vs. limit=15.0 2024-08-18 05:47:46,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3728030.0, ans=0.125 2024-08-18 05:47:47,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3728030.0, ans=0.1 2024-08-18 05:47:55,671 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 37 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-18 05:47:57,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3728130.0, ans=0.0 2024-08-18 05:48:00,813 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-18 05:48:02,156 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-18 05:48:05,904 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 700, loss[loss=0.08784, beats_loss=0.01003, ecapa_loss=0.0001918, whisper_loss=0.0759, over 19483.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01043, ecapa_loss=0.0001444, whisper_loss=0.08936, over 3705912.51 frames. ], batch size: 83, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:48:11,360 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
23 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-18 05:48:14,827 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0922105461359024, model_norm_threshold=51.87888717651367 2024-08-18 05:48:14,996 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.030e+04, grad_sumsq=4.030e+04, orig_rms_sq=1.000e+00 2024-08-18 05:48:16,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3728230.0, ans=0.2 2024-08-18 05:48:28,855 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-08-18 05:48:46,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3728530.0, ans=0.5 2024-08-18 05:48:48,783 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 17 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 05:48:54,615 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.64 vs. limit=15.0 2024-08-18 05:49:07,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3728630.0, ans=0.1 2024-08-18 05:49:10,644 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 750, loss[loss=0.09648, beats_loss=0.01156, ecapa_loss=0.0001567, whisper_loss=0.08334, over 16656.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01044, ecapa_loss=0.0001448, whisper_loss=0.08966, over 3731897.10 frames. ], batch size: 65, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:49:32,480 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
12 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-18 05:49:32,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3728830.0, ans=0.125 2024-08-18 05:49:47,399 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.292e+01 2.475e+01 2.827e+01 5.626e+02, threshold=4.950e+01, percent-clipped=2.0 2024-08-18 05:50:13,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3729130.0, ans=0.125 2024-08-18 05:50:16,228 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 800, loss[loss=0.1064, beats_loss=0.01124, ecapa_loss=0.0001446, whisper_loss=0.09371, over 16647.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01045, ecapa_loss=0.0001453, whisper_loss=0.08924, over 3744727.64 frames. ], batch size: 68, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:50:19,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3729230.0, ans=0.0 2024-08-18 05:50:29,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3729330.0, ans=0.2 2024-08-18 05:50:39,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3729330.0, ans=0.2 2024-08-18 05:50:42,377 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-18 05:50:49,927 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
36 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 05:50:50,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3729430.0, ans=0.0 2024-08-18 05:50:59,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3729530.0, ans=0.125 2024-08-18 05:51:20,459 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 05:51:23,692 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 850, loss[loss=0.09811, beats_loss=0.006802, ecapa_loss=0.0002011, whisper_loss=0.0893, over 15405.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.0105, ecapa_loss=0.0001448, whisper_loss=0.0882, over 3760174.09 frames. ], batch size: 64, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:51:27,765 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 41 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-18 05:51:36,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3729830.0, ans=0.125 2024-08-18 05:51:57,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3729930.0, ans=10.0 2024-08-18 05:52:02,226 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.231e+01 2.471e+01 2.825e+01 3.854e+01, threshold=4.942e+01, percent-clipped=0.0 2024-08-18 05:52:03,218 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.91 vs. 
limit=15.0 2024-08-18 05:52:18,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3730130.0, ans=0.125 2024-08-18 05:52:30,534 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 900, loss[loss=0.09805, beats_loss=0.009763, ecapa_loss=0.0001624, whisper_loss=0.08666, over 17164.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01045, ecapa_loss=0.0001443, whisper_loss=0.08885, over 3810300.94 frames. ], batch size: 71, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:52:49,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3730330.0, ans=0.0 2024-08-18 05:52:49,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3730330.0, ans=0.125 2024-08-18 05:52:52,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3730330.0, ans=0.125 2024-08-18 05:52:56,256 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 05:52:56,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3730430.0, ans=0.1 2024-08-18 05:53:04,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3730430.0, ans=0.125 2024-08-18 05:53:05,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3730430.0, ans=0.125 2024-08-18 05:53:15,365 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
26 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 05:53:18,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3730530.0, ans=0.125 2024-08-18 05:53:38,977 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 950, loss[loss=0.09364, beats_loss=0.01021, ecapa_loss=0.0001719, whisper_loss=0.0817, over 21224.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01046, ecapa_loss=0.0001439, whisper_loss=0.0885, over 3813408.33 frames. ], batch size: 89, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:54:00,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3730830.0, ans=0.125 2024-08-18 05:54:08,645 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-18 05:54:17,811 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.336e+01 2.576e+01 2.851e+01 4.260e+01, threshold=5.152e+01, percent-clipped=0.0 2024-08-18 05:54:23,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3731030.0, ans=0.1 2024-08-18 05:54:42,884 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 17 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-18 05:54:46,853 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 1000, loss[loss=0.1053, beats_loss=0.0103, ecapa_loss=0.0001737, whisper_loss=0.09323, over 22280.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01047, ecapa_loss=0.0001435, whisper_loss=0.08825, over 3783637.53 frames. ], batch size: 92, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:54:52,048 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
38 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-18 05:54:54,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3731230.0, ans=0.125 2024-08-18 05:55:15,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3731430.0, ans=0.1 2024-08-18 05:55:19,583 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-18 05:55:33,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3731530.0, ans=10.0 2024-08-18 05:55:36,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=3731530.0, ans=15.0 2024-08-18 05:55:43,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3731630.0, ans=0.05 2024-08-18 05:55:47,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3731630.0, ans=0.125 2024-08-18 05:55:53,920 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 1050, loss[loss=0.1127, beats_loss=0.01105, ecapa_loss=0.0001246, whisper_loss=0.1004, over 20256.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01046, ecapa_loss=0.0001423, whisper_loss=0.08847, over 3794439.04 frames. ], batch size: 78, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:55:55,843 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-18 05:56:00,031 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.88 vs. 
limit=15.0 2024-08-18 05:56:01,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3731730.0, ans=0.125 2024-08-18 05:56:01,991 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-18 05:56:17,093 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 05:56:35,554 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.334e+01 2.539e+01 2.786e+01 5.351e+01, threshold=5.078e+01, percent-clipped=0.0 2024-08-18 05:56:42,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3732030.0, ans=0.125 2024-08-18 05:56:46,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3732030.0, ans=0.125 2024-08-18 05:56:57,610 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 36 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-18 05:56:59,680 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.78 vs. limit=15.0 2024-08-18 05:57:05,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3732230.0, ans=0.2 2024-08-18 05:57:06,313 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 1100, loss[loss=0.09686, beats_loss=0.01076, ecapa_loss=0.0001626, whisper_loss=0.08447, over 21858.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01044, ecapa_loss=0.0001432, whisper_loss=0.08866, over 3815692.63 frames. 
], batch size: 89, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:57:08,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3732230.0, ans=0.0 2024-08-18 05:57:17,819 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-18 05:57:19,854 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.46 vs. limit=6.0 2024-08-18 05:57:37,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3732430.0, ans=0.1 2024-08-18 05:58:16,616 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 1150, loss[loss=0.1036, beats_loss=0.01007, ecapa_loss=0.0001384, whisper_loss=0.09219, over 23642.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01048, ecapa_loss=0.0001427, whisper_loss=0.08818, over 3809831.53 frames. ], batch size: 92, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:58:26,426 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.62 vs. 
limit=10.0 2024-08-18 05:58:32,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3732830.0, ans=0.125 2024-08-18 05:58:38,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3732830.0, ans=0.125 2024-08-18 05:58:40,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3732830.0, ans=0.0 2024-08-18 05:58:42,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3732830.0, ans=0.125 2024-08-18 05:58:46,623 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.44 vs. limit=22.5 2024-08-18 05:58:53,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3732930.0, ans=0.0 2024-08-18 05:58:57,164 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.301e+01 2.610e+01 2.950e+01 4.151e+01, threshold=5.220e+01, percent-clipped=1.0 2024-08-18 05:59:06,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3733030.0, ans=0.2 2024-08-18 05:59:15,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3733130.0, ans=0.0 2024-08-18 05:59:24,353 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-18 05:59:27,912 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 1200, loss[loss=0.106, beats_loss=0.009978, ecapa_loss=0.0001761, whisper_loss=0.09422, over 21664.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01038, ecapa_loss=0.000143, whisper_loss=0.08938, over 3791615.04 frames. 
], batch size: 91, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:59:29,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3733230.0, ans=0.2 2024-08-18 05:59:34,403 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 15 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 05:59:36,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3733230.0, ans=0.0 2024-08-18 05:59:44,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3733330.0, ans=0.0 2024-08-18 05:59:53,595 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-18 06:00:12,803 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 16 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-18 06:00:26,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3733630.0, ans=0.0 2024-08-18 06:00:32,548 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.04 vs. limit=22.5 2024-08-18 06:00:40,956 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 1250, loss[loss=0.1013, beats_loss=0.008757, ecapa_loss=0.0001425, whisper_loss=0.09113, over 15308.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01052, ecapa_loss=0.0001417, whisper_loss=0.08871, over 3798197.43 frames. ], batch size: 57, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:00:43,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3733730.0, ans=0.125 2024-08-18 06:00:49,565 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
21 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-18 06:00:57,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3733830.0, ans=0.125 2024-08-18 06:01:08,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3733830.0, ans=0.0 2024-08-18 06:01:20,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3733930.0, ans=0.125 2024-08-18 06:01:20,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3733930.0, ans=15.0 2024-08-18 06:01:20,956 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 25 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-18 06:01:24,455 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.310e+01 2.549e+01 2.839e+01 4.783e+01, threshold=5.098e+01, percent-clipped=0.0 2024-08-18 06:01:28,629 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-18 06:01:31,504 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-18 06:01:51,462 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 29 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-18 06:01:55,077 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 1300, loss[loss=0.1008, beats_loss=0.008923, ecapa_loss=0.0001521, whisper_loss=0.09032, over 18055.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01042, ecapa_loss=0.0001422, whisper_loss=0.08932, over 3771706.95 frames. ], batch size: 69, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:02:04,202 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-18 06:02:15,481 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
20 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 06:02:17,996 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-18 06:02:20,809 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 06:02:47,924 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=15.0 2024-08-18 06:03:07,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3734630.0, ans=0.125 2024-08-18 06:03:07,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3734630.0, ans=0.0 2024-08-18 06:03:07,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3734630.0, ans=0.0 2024-08-18 06:03:12,412 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 1350, loss[loss=0.1014, beats_loss=0.01015, ecapa_loss=0.0001142, whisper_loss=0.09006, over 15419.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01044, ecapa_loss=0.0001414, whisper_loss=0.08906, over 3786485.96 frames. ], batch size: 56, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:03:12,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3734730.0, ans=0.125 2024-08-18 06:03:13,561 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.14 vs. 
limit=22.5 2024-08-18 06:03:26,560 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.841e+05 2024-08-18 06:03:50,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3734930.0, ans=0.2 2024-08-18 06:04:00,191 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.254e+01 2.510e+01 2.787e+01 4.431e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-18 06:04:12,343 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 06:04:25,248 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-18 06:04:25,835 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.81 vs. limit=15.0 2024-08-18 06:04:34,978 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 1400, loss[loss=0.09532, beats_loss=0.01136, ecapa_loss=0.0001602, whisper_loss=0.08236, over 15945.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01037, ecapa_loss=0.0001422, whisper_loss=0.08973, over 3773061.66 frames. ], batch size: 64, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:05:02,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3735330.0, ans=0.1 2024-08-18 06:05:09,708 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.88 vs. 
limit=10.0 2024-08-18 06:05:10,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3735430.0, ans=0.0 2024-08-18 06:05:17,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3735430.0, ans=0.125 2024-08-18 06:05:40,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3735630.0, ans=0.125 2024-08-18 06:06:25,086 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 1450, loss[loss=0.08354, beats_loss=0.01108, ecapa_loss=0.0001125, whisper_loss=0.07133, over 19240.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01035, ecapa_loss=0.0001423, whisper_loss=0.08986, over 3755058.27 frames. ], batch size: 75, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:06:31,534 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.52 vs. limit=10.0 2024-08-18 06:06:48,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3735830.0, ans=0.125 2024-08-18 06:06:55,712 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-18 06:06:58,414 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.70 vs. 
limit=15.0 2024-08-18 06:07:03,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3735930.0, ans=0.125 2024-08-18 06:07:09,460 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.184e+01 2.399e+01 2.651e+01 6.055e+01, threshold=4.798e+01, percent-clipped=1.0 2024-08-18 06:07:28,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3736130.0, ans=0.125 2024-08-18 06:07:31,269 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-18 06:07:36,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3736130.0, ans=0.125 2024-08-18 06:07:40,719 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 1500, loss[loss=0.09245, beats_loss=0.01189, ecapa_loss=0.0001274, whisper_loss=0.07928, over 17643.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01036, ecapa_loss=0.0001424, whisper_loss=0.08989, over 3801815.01 frames. ], batch size: 71, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:07:43,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3736230.0, ans=0.1 2024-08-18 06:07:43,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3736230.0, ans=0.125 2024-08-18 06:07:49,807 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.34 vs. limit=22.5 2024-08-18 06:07:50,018 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.56 vs. 
limit=15.0 2024-08-18 06:07:54,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3736330.0, ans=0.2 2024-08-18 06:08:07,698 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.54 vs. limit=10.0 2024-08-18 06:08:55,156 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 1550, loss[loss=0.0815, beats_loss=0.009151, ecapa_loss=0.0001352, whisper_loss=0.07099, over 14904.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01036, ecapa_loss=0.0001434, whisper_loss=0.08912, over 3798194.17 frames. ], batch size: 57, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:09:06,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3736730.0, ans=0.1 2024-08-18 06:09:19,242 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.18 vs. limit=22.5 2024-08-18 06:09:38,794 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.223e+01 2.492e+01 2.734e+01 3.919e+01, threshold=4.985e+01, percent-clipped=0.0 2024-08-18 06:09:51,976 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-18 06:09:53,155 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-18 06:09:53,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3737130.0, ans=0.0 2024-08-18 06:09:59,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3737130.0, ans=0.125 2024-08-18 06:10:07,621 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 06:10:07,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3737230.0, ans=0.125 2024-08-18 06:10:08,715 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 1600, loss[loss=0.1002, beats_loss=0.0113, ecapa_loss=0.0001583, whisper_loss=0.08735, over 21463.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01031, ecapa_loss=0.0001433, whisper_loss=0.08972, over 3826380.21 frames. ], batch size: 89, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:10:08,900 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 15 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-18 06:10:11,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3737230.0, ans=0.0 2024-08-18 06:10:14,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3737230.0, ans=0.0 2024-08-18 06:10:14,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3737230.0, ans=0.0 2024-08-18 06:10:21,095 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 25 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-18 06:10:25,730 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=12.0 2024-08-18 06:10:27,995 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-18 06:10:37,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3737430.0, ans=0.0 2024-08-18 06:10:40,419 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-18 06:10:41,948 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
21 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-18 06:11:00,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3737530.0, ans=0.2 2024-08-18 06:11:09,107 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 06:11:20,018 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 1650, loss[loss=0.08677, beats_loss=0.009959, ecapa_loss=0.0001392, whisper_loss=0.07542, over 15569.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01034, ecapa_loss=0.0001427, whisper_loss=0.08996, over 3836239.46 frames. ], batch size: 63, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:11:23,865 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 18 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 06:11:33,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3737830.0, ans=0.0 2024-08-18 06:11:43,955 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-18 06:11:58,788 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.262e+01 2.498e+01 2.828e+01 4.112e+01, threshold=4.996e+01, percent-clipped=0.0 2024-08-18 06:11:59,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3738030.0, ans=0.0 2024-08-18 06:12:05,966 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 06:12:11,532 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
24 from LS+wenet, 33 from Vox, 35 fro AS 2024-08-18 06:12:11,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3738030.0, ans=0.0 2024-08-18 06:12:15,032 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09963374584913254, model_norm_threshold=49.962425231933594 2024-08-18 06:12:15,200 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.352e+04, grad_sumsq=4.352e+04, orig_rms_sq=1.000e+00 2024-08-18 06:12:19,796 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-18 06:12:27,730 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 1700, loss[loss=0.08227, beats_loss=0.01206, ecapa_loss=0.0001465, whisper_loss=0.06875, over 20131.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01041, ecapa_loss=0.0001432, whisper_loss=0.08921, over 3830623.99 frames. ], batch size: 83, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:12:33,303 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-18 06:12:37,902 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.52 vs. limit=22.5 2024-08-18 06:12:49,584 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-18 06:12:50,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3738330.0, ans=0.125 2024-08-18 06:12:58,883 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 06:13:08,420 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 06:13:10,283 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 32 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 06:13:12,672 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 06:13:18,218 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 06:13:20,001 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.15 vs. limit=10.0 2024-08-18 06:13:26,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3738630.0, ans=0.125 2024-08-18 06:13:35,015 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 1750, loss[loss=0.09581, beats_loss=0.01068, ecapa_loss=0.0001325, whisper_loss=0.08381, over 21138.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0104, ecapa_loss=0.000143, whisper_loss=0.08931, over 3819978.07 frames. ], batch size: 81, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:13:35,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3738730.0, ans=0.125 2024-08-18 06:13:37,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3738730.0, ans=0.125 2024-08-18 06:13:43,658 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.160e-01 2024-08-18 06:13:44,997 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 36 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-18 06:13:47,298 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-18 06:14:06,251 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 06:14:09,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3738930.0, ans=0.125 2024-08-18 06:14:15,695 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.240e+01 2.515e+01 2.865e+01 5.015e+02, threshold=5.030e+01, percent-clipped=2.0 2024-08-18 06:14:19,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3739030.0, ans=0.125 2024-08-18 06:14:21,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3739030.0, ans=0.025 2024-08-18 06:14:21,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3739030.0, ans=0.1 2024-08-18 06:14:22,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3739030.0, ans=0.2 2024-08-18 06:14:27,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3739030.0, ans=0.125 2024-08-18 06:14:37,853 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 06:14:42,637 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 1800, loss[loss=0.08487, beats_loss=0.01081, ecapa_loss=0.0001452, whisper_loss=0.0726, over 15733.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01036, ecapa_loss=0.0001435, whisper_loss=0.08928, over 3839640.57 frames. 
], batch size: 64, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:14:58,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3739330.0, ans=0.0 2024-08-18 06:15:04,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3739330.0, ans=0.1 2024-08-18 06:15:08,833 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 26 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-18 06:15:09,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3739430.0, ans=0.125 2024-08-18 06:15:28,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3739530.0, ans=0.0 2024-08-18 06:15:29,742 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 20 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 06:15:34,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3739530.0, ans=0.125 2024-08-18 06:15:34,890 WARNING [optim.py:496] (2/4) Scaling gradients by 0.04103388637304306, model_norm_threshold=50.29759979248047 2024-08-18 06:15:35,057 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.438e+05, grad_sumsq=3.364e+07, orig_rms_sq=1.022e-02 2024-08-18 06:15:43,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=3739630.0, ans=0.02 2024-08-18 06:15:49,580 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 1850, loss[loss=0.09243, beats_loss=0.01039, ecapa_loss=0.0001865, whisper_loss=0.08017, over 13184.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01049, ecapa_loss=0.0001424, whisper_loss=0.08856, over 3839482.14 frames. 
], batch size: 57, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:16:16,664 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 34 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-18 06:16:16,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3739930.0, ans=0.125 2024-08-18 06:16:29,343 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.281e+01 2.584e+01 3.021e+01 1.226e+03, threshold=5.167e+01, percent-clipped=3.0 2024-08-18 06:16:39,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3740030.0, ans=0.0 2024-08-18 06:16:49,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3740130.0, ans=0.125 2024-08-18 06:16:58,389 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 1900, loss[loss=0.1046, beats_loss=0.01138, ecapa_loss=0.0001618, whisper_loss=0.09159, over 23606.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.0105, ecapa_loss=0.0001429, whisper_loss=0.08874, over 3844017.55 frames. ], batch size: 94, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:17:13,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3740330.0, ans=0.125 2024-08-18 06:17:54,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3740630.0, ans=0.0 2024-08-18 06:17:57,205 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.20 vs. limit=15.0 2024-08-18 06:18:05,318 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 1950, loss[loss=0.09013, beats_loss=0.01079, ecapa_loss=0.0001684, whisper_loss=0.07766, over 18908.00 frames. 
], tot_loss[loss=0.1009, beats_loss=0.0105, ecapa_loss=0.0001422, whisper_loss=0.08899, over 3828255.13 frames. ], batch size: 82, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:18:13,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3740730.0, ans=0.2 2024-08-18 06:18:43,507 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.240e+01 2.452e+01 2.849e+01 7.205e+01, threshold=4.903e+01, percent-clipped=1.0 2024-08-18 06:19:11,486 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 2000, loss[loss=0.08792, beats_loss=0.01087, ecapa_loss=0.0001448, whisper_loss=0.0756, over 16738.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01048, ecapa_loss=0.0001421, whisper_loss=0.08905, over 3812494.24 frames. ], batch size: 66, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:19:11,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3741230.0, ans=0.125 2024-08-18 06:19:17,334 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-18 06:19:21,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3741230.0, ans=0.0 2024-08-18 06:19:23,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3741330.0, ans=0.2 2024-08-18 06:19:29,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3741330.0, ans=0.0 2024-08-18 06:19:51,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3741530.0, ans=0.0 2024-08-18 06:20:07,473 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. 
limit=15.0 2024-08-18 06:20:16,810 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 2050, loss[loss=0.101, beats_loss=0.009386, ecapa_loss=0.0001566, whisper_loss=0.09008, over 22494.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01055, ecapa_loss=0.0001417, whisper_loss=0.08897, over 3830217.75 frames. ], batch size: 92, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:20:25,794 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 26 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-18 06:20:30,333 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.42 vs. limit=22.5 2024-08-18 06:20:33,775 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 33 from Vox, 38 fro AS 2024-08-18 06:20:40,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3741830.0, ans=0.125 2024-08-18 06:20:54,434 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.321e+01 2.575e+01 2.805e+01 3.958e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-18 06:21:08,377 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-18 06:21:11,461 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.99 vs. limit=22.5 2024-08-18 06:21:15,802 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-18 06:21:20,266 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. 
limit=6.0 2024-08-18 06:21:23,082 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 2100, loss[loss=0.1108, beats_loss=0.009438, ecapa_loss=0.0001538, whisper_loss=0.09984, over 22882.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.0106, ecapa_loss=0.0001412, whisper_loss=0.08883, over 3834325.97 frames. ], batch size: 89, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:21:32,634 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-18 06:22:26,804 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 2150, loss[loss=0.1062, beats_loss=0.01122, ecapa_loss=0.0001194, whisper_loss=0.0938, over 18334.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.0106, ecapa_loss=0.0001405, whisper_loss=0.08854, over 3793270.13 frames. ], batch size: 72, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:22:36,125 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.07 vs. limit=15.0 2024-08-18 06:22:41,256 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.92 vs. limit=22.5 2024-08-18 06:22:45,882 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
17 from LS+wenet, 9 from Vox, 28 fro AS 2024-08-18 06:22:46,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=3742830.0, ans=12.0 2024-08-18 06:22:55,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3742930.0, ans=0.1 2024-08-18 06:23:07,304 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.693e+01 2.305e+01 2.600e+01 2.971e+01 3.562e+02, threshold=5.201e+01, percent-clipped=4.0 2024-08-18 06:23:09,387 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.39 vs. limit=10.0 2024-08-18 06:23:26,044 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 24 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-18 06:23:28,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3743130.0, ans=0.5 2024-08-18 06:23:30,758 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 06:23:39,389 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 2200, loss[loss=0.1151, beats_loss=0.009486, ecapa_loss=0.0001457, whisper_loss=0.1041, over 16244.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01054, ecapa_loss=0.0001408, whisper_loss=0.08997, over 3817990.40 frames. ], batch size: 63, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:24:30,236 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
21 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-18 06:24:30,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3743530.0, ans=0.125 2024-08-18 06:24:53,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3743630.0, ans=0.125 2024-08-18 06:24:54,640 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 20 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-18 06:24:59,763 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 2250, loss[loss=0.1054, beats_loss=0.009561, ecapa_loss=0.0001491, whisper_loss=0.09438, over 19771.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01053, ecapa_loss=0.0001414, whisper_loss=0.09, over 3863445.66 frames. ], batch size: 78, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:25:03,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3743730.0, ans=0.0 2024-08-18 06:25:12,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3743730.0, ans=0.1 2024-08-18 06:25:21,366 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
36 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-18 06:25:34,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3743930.0, ans=0.1 2024-08-18 06:25:47,168 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.275e+01 2.589e+01 2.957e+01 4.064e+01, threshold=5.179e+01, percent-clipped=0.0 2024-08-18 06:26:11,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3744130.0, ans=0.1 2024-08-18 06:26:16,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3744130.0, ans=0.0 2024-08-18 06:26:20,401 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 2300, loss[loss=0.1149, beats_loss=0.009878, ecapa_loss=0.0001689, whisper_loss=0.1033, over 20640.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01053, ecapa_loss=0.0001415, whisper_loss=0.0907, over 3890040.26 frames. ], batch size: 87, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:26:31,644 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0 2024-08-18 06:26:50,132 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2024-08-18 06:26:51,138 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-18 06:27:18,408 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.57 vs. limit=12.0 2024-08-18 06:27:19,264 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
16 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-18 06:27:24,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3744630.0, ans=0.125 2024-08-18 06:27:29,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3744630.0, ans=0.09899494936611666 2024-08-18 06:27:42,119 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 2350, loss[loss=0.08416, beats_loss=0.01094, ecapa_loss=0.0001304, whisper_loss=0.07192, over 14986.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01056, ecapa_loss=0.0001423, whisper_loss=0.08985, over 3871323.00 frames. ], batch size: 56, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:27:47,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3744730.0, ans=0.1 2024-08-18 06:27:56,676 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 24 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-18 06:27:58,576 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-18 06:28:00,707 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.26 vs. limit=15.0 2024-08-18 06:28:16,285 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.389e+00 2024-08-18 06:28:24,039 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.27 vs. 
limit=22.5 2024-08-18 06:28:29,615 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.717e+01 2.243e+01 2.499e+01 2.727e+01 3.618e+01, threshold=4.998e+01, percent-clipped=0.0 2024-08-18 06:28:46,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3745130.0, ans=0.1 2024-08-18 06:28:54,052 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-18 06:28:59,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3745130.0, ans=0.2 2024-08-18 06:29:03,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3745230.0, ans=0.0 2024-08-18 06:29:04,131 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.07 vs. limit=6.0 2024-08-18 06:29:04,268 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 2400, loss[loss=0.09749, beats_loss=0.01201, ecapa_loss=0.0001187, whisper_loss=0.08429, over 20203.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01055, ecapa_loss=0.0001424, whisper_loss=0.09002, over 3860695.95 frames. ], batch size: 75, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:29:12,148 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
20 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 06:29:15,879 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 06:29:38,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3745430.0, ans=0.1 2024-08-18 06:30:00,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3745530.0, ans=0.1 2024-08-18 06:30:04,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3745630.0, ans=0.07 2024-08-18 06:30:19,381 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 2450, loss[loss=0.0964, beats_loss=0.01018, ecapa_loss=0.0001828, whisper_loss=0.08439, over 18469.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001426, whisper_loss=0.09009, over 3839528.26 frames. ], batch size: 79, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:30:23,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3745730.0, ans=0.0 2024-08-18 06:30:35,681 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.28 vs. 
limit=15.0 2024-08-18 06:30:41,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3745830.0, ans=0.125 2024-08-18 06:31:08,963 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.283e+01 2.536e+01 2.779e+01 5.169e+01, threshold=5.071e+01, percent-clipped=0.0 2024-08-18 06:31:09,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3746030.0, ans=0.125 2024-08-18 06:31:15,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3746030.0, ans=0.5 2024-08-18 06:31:18,581 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 20 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-18 06:31:19,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3746030.0, ans=0.2 2024-08-18 06:31:20,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3746030.0, ans=0.2 2024-08-18 06:31:26,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3746130.0, ans=0.0 2024-08-18 06:31:32,156 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-18 06:31:37,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3746130.0, ans=0.2 2024-08-18 06:31:40,873 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 2500, loss[loss=0.09164, beats_loss=0.009015, ecapa_loss=0.0001622, whisper_loss=0.081, over 16686.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01053, ecapa_loss=0.0001426, whisper_loss=0.09017, over 3874099.81 frames. 
], batch size: 64, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:31:41,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3746230.0, ans=0.0 2024-08-18 06:31:47,850 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 06:32:22,211 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-18 06:32:28,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3746530.0, ans=0.035 2024-08-18 06:32:48,975 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-08-18 06:32:49,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3746630.0, ans=0.125 2024-08-18 06:32:54,048 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 06:32:55,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3746630.0, ans=0.1 2024-08-18 06:32:57,805 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 2550, loss[loss=0.1162, beats_loss=0.0077, ecapa_loss=0.0001645, whisper_loss=0.1069, over 20952.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01048, ecapa_loss=0.000144, whisper_loss=0.09064, over 3903103.65 frames. ], batch size: 83, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:33:00,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3746730.0, ans=0.1 2024-08-18 06:33:26,737 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.03 vs. 
limit=15.0 2024-08-18 06:33:32,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3746930.0, ans=0.0 2024-08-18 06:33:32,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3746930.0, ans=0.0 2024-08-18 06:33:41,717 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.402e+01 2.684e+01 2.879e+01 3.926e+01, threshold=5.368e+01, percent-clipped=1.0 2024-08-18 06:33:44,859 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 17 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-18 06:33:49,839 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0 2024-08-18 06:33:50,513 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 06:33:56,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3747130.0, ans=0.0 2024-08-18 06:34:13,265 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 2600, loss[loss=0.09034, beats_loss=0.01119, ecapa_loss=0.000163, whisper_loss=0.07752, over 13412.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001445, whisper_loss=0.09034, over 3905773.77 frames. 
], batch size: 55, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:34:22,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3747230.0, ans=0.0 2024-08-18 06:34:29,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3747330.0, ans=0.0 2024-08-18 06:34:40,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3747330.0, ans=0.125 2024-08-18 06:34:43,448 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.18 vs. limit=15.0 2024-08-18 06:34:46,022 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-18 06:34:57,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=3747530.0, ans=0.02 2024-08-18 06:35:04,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3747530.0, ans=0.0 2024-08-18 06:35:08,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3747530.0, ans=0.1 2024-08-18 06:35:24,691 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 06:35:28,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3747730.0, ans=0.2 2024-08-18 06:35:29,328 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 2650, loss[loss=0.1137, beats_loss=0.01031, ecapa_loss=0.0001733, whisper_loss=0.1017, over 21922.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01051, ecapa_loss=0.0001452, whisper_loss=0.08986, over 3919405.87 frames. 
], batch size: 91, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:35:32,267 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-18 06:35:38,095 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-18 06:35:45,138 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.53 vs. limit=15.0 2024-08-18 06:35:57,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3747930.0, ans=0.5 2024-08-18 06:35:59,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3747930.0, ans=0.125 2024-08-18 06:36:10,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3747930.0, ans=0.0 2024-08-18 06:36:12,416 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.431e+01 2.791e+01 3.155e+01 3.699e+02, threshold=5.582e+01, percent-clipped=1.0 2024-08-18 06:36:33,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3748130.0, ans=0.125 2024-08-18 06:36:37,001 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 34 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-18 06:36:46,374 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 2700, loss[loss=0.112, beats_loss=0.01018, ecapa_loss=0.0001441, whisper_loss=0.1003, over 14985.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01054, ecapa_loss=0.0001443, whisper_loss=0.08982, over 3921562.04 frames. ], batch size: 59, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:36:47,468 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.95 vs. 
limit=12.0 2024-08-18 06:36:54,187 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 23 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-18 06:36:57,355 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.99 vs. limit=6.0 2024-08-18 06:36:59,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3748230.0, ans=0.1 2024-08-18 06:37:13,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3748330.0, ans=0.1 2024-08-18 06:37:18,258 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=15.0 2024-08-18 06:37:24,120 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.39 vs. limit=22.5 2024-08-18 06:37:24,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3748430.0, ans=15.0 2024-08-18 06:37:42,197 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-18 06:38:05,080 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 2750, loss[loss=0.08111, beats_loss=0.01272, ecapa_loss=0.0001051, whisper_loss=0.06734, over 14453.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01054, ecapa_loss=0.0001429, whisper_loss=0.09019, over 3898031.73 frames. 
], batch size: 56, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:38:13,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3748730.0, ans=0.125 2024-08-18 06:38:37,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3748930.0, ans=0.2 2024-08-18 06:38:45,779 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.90 vs. limit=15.0 2024-08-18 06:38:51,048 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.293e+01 2.523e+01 2.815e+01 3.785e+01, threshold=5.047e+01, percent-clipped=0.0 2024-08-18 06:39:03,943 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 21 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-18 06:39:07,245 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 34 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-18 06:39:28,581 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 2800, loss[loss=0.09544, beats_loss=0.009173, ecapa_loss=0.000142, whisper_loss=0.08485, over 16821.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001436, whisper_loss=0.09009, over 3874373.73 frames. ], batch size: 64, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:39:53,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3749330.0, ans=0.0 2024-08-18 06:39:53,964 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 06:39:55,837 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
27 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-18 06:40:14,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3749430.0, ans=15.0 2024-08-18 06:40:17,371 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 25 from LS+wenet, 19 from Vox, 14 fro AS 2024-08-18 06:40:32,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3749630.0, ans=0.125 2024-08-18 06:40:40,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3749630.0, ans=0.125 2024-08-18 06:40:51,341 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 2850, loss[loss=0.08782, beats_loss=0.01169, ecapa_loss=0.0001793, whisper_loss=0.07434, over 16406.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0105, ecapa_loss=0.0001434, whisper_loss=0.09083, over 3878119.22 frames. ], batch size: 68, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:41:03,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3749730.0, ans=0.0 2024-08-18 06:41:16,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3749830.0, ans=0.0 2024-08-18 06:41:36,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3749930.0, ans=0.125 2024-08-18 06:41:42,705 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.356e+01 2.615e+01 2.993e+01 1.081e+02, threshold=5.230e+01, percent-clipped=3.0 2024-08-18 06:41:53,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3750030.0, ans=0.2 2024-08-18 06:41:59,130 INFO [scaling.py:214] (2/4) ScheduledFloat: 
name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3750030.0, ans=0.1 2024-08-18 06:42:06,680 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 06:42:11,251 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 06:42:16,853 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 2900, loss[loss=0.11, beats_loss=0.01022, ecapa_loss=0.0001434, whisper_loss=0.09831, over 22596.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001451, whisper_loss=0.09079, over 3914260.44 frames. ], batch size: 94, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:42:33,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3750330.0, ans=0.0 2024-08-18 06:42:52,444 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-18 06:43:02,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3750530.0, ans=0.125 2024-08-18 06:43:04,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3750530.0, ans=0.2 2024-08-18 06:43:12,249 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.85 vs. limit=15.0 2024-08-18 06:43:17,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3750630.0, ans=0.125 2024-08-18 06:43:23,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3750630.0, ans=0.1 2024-08-18 06:43:26,978 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
25 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-18 06:43:29,694 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 2950, loss[loss=0.0871, beats_loss=0.01223, ecapa_loss=0.0001344, whisper_loss=0.07352, over 14485.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01059, ecapa_loss=0.0001445, whisper_loss=0.09026, over 3920686.09 frames. ], batch size: 58, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:43:55,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3750930.0, ans=0.0 2024-08-18 06:44:05,873 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-18 06:44:09,589 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.710e+01 2.348e+01 2.612e+01 2.875e+01 5.578e+01, threshold=5.225e+01, percent-clipped=1.0 2024-08-18 06:44:29,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3751130.0, ans=0.1 2024-08-18 06:44:35,898 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 3000, loss[loss=0.1171, beats_loss=0.01011, ecapa_loss=0.0001373, whisper_loss=0.1056, over 22779.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01059, ecapa_loss=0.0001447, whisper_loss=0.08958, over 3908779.69 frames. ], batch size: 90, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:44:35,899 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-18 06:45:05,381 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4900, 4.0284, 3.8948, 3.6751], device='cuda:2') 2024-08-18 06:45:15,251 INFO [train_multi_KD3.py:1149] (2/4) Epoch 26, validation on ASR_libri: loss=0.2538, beats_loss=0, ecapa_loss=0.0005294, whisper_loss=0.2485, over 922467.00 frames. 
2024-08-18 06:45:31,832 INFO [train_multi_KD3.py:1149] (2/4) Epoch 26, validation on SV_voxceleb1: loss=0.004081, beats_loss=0, ecapa_loss=0.0004081, whisper_loss=0, over 939242.00 frames. 2024-08-18 06:47:15,203 INFO [train_multi_KD3.py:1149] (2/4) Epoch 26, validation on AT_audioset: loss=0.02323, beats_loss=0.02323, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 06:47:15,208 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-18 06:47:15,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3751230.0, ans=0.125 2024-08-18 06:47:27,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3751330.0, ans=0.0 2024-08-18 06:47:35,101 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 06:47:40,155 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 06:47:44,158 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-18 06:47:44,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3751430.0, ans=0.125 2024-08-18 06:47:58,934 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2024-08-18 06:47:59,600 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
32 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-18 06:48:01,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3751530.0, ans=0.125 2024-08-18 06:48:04,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3751530.0, ans=0.0 2024-08-18 06:48:08,856 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 06:48:09,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3751630.0, ans=0.1 2024-08-18 06:48:17,232 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.67 vs. limit=10.0 2024-08-18 06:48:21,852 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 3050, loss[loss=0.1056, beats_loss=0.01208, ecapa_loss=0.0001457, whisper_loss=0.09202, over 22585.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01054, ecapa_loss=0.0001444, whisper_loss=0.09083, over 3929627.20 frames. ], batch size: 91, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:48:39,604 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-18 06:48:43,347 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-18 06:48:43,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3751830.0, ans=0.125 2024-08-18 06:48:46,141 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 06:48:55,878 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. 
limit=6.0 2024-08-18 06:48:58,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3751930.0, ans=0.125 2024-08-18 06:49:01,926 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.301e+01 2.557e+01 2.884e+01 4.342e+01, threshold=5.114e+01, percent-clipped=0.0 2024-08-18 06:49:10,015 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 06:49:21,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3752130.0, ans=0.125 2024-08-18 06:49:22,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3752130.0, ans=0.1 2024-08-18 06:49:28,779 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 3100, loss[loss=0.1029, beats_loss=0.01372, ecapa_loss=0.0001142, whisper_loss=0.08805, over 15973.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01065, ecapa_loss=0.0001447, whisper_loss=0.09012, over 3898917.51 frames. ], batch size: 61, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:49:29,669 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.08 vs. limit=15.0 2024-08-18 06:49:29,816 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.74 vs. limit=15.0 2024-08-18 06:49:31,138 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.21 vs. 
limit=15.0 2024-08-18 06:49:33,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3752230.0, ans=0.125 2024-08-18 06:49:47,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=3752330.0, ans=0.2 2024-08-18 06:50:00,814 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.72 vs. limit=10.0 2024-08-18 06:50:05,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3752430.0, ans=0.1 2024-08-18 06:50:10,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3752530.0, ans=0.125 2024-08-18 06:50:22,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3752630.0, ans=0.0 2024-08-18 06:50:24,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3752630.0, ans=0.125 2024-08-18 06:50:36,419 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 3150, loss[loss=0.09759, beats_loss=0.01207, ecapa_loss=0.0001423, whisper_loss=0.0841, over 18975.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01066, ecapa_loss=0.0001451, whisper_loss=0.09036, over 3889675.57 frames. ], batch size: 80, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:51:05,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3752930.0, ans=0.2 2024-08-18 06:51:06,699 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 16 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-18 06:51:13,091 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
21 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-18 06:51:17,311 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.299e+01 2.506e+01 2.759e+01 4.272e+01, threshold=5.011e+01, percent-clipped=0.0 2024-08-18 06:51:17,641 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 29 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-18 06:51:35,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3753130.0, ans=0.2 2024-08-18 06:51:35,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3753130.0, ans=0.0 2024-08-18 06:51:42,356 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.97 vs. limit=15.0 2024-08-18 06:51:44,062 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 3200, loss[loss=0.0974, beats_loss=0.0101, ecapa_loss=0.0001403, whisper_loss=0.0859, over 20850.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0106, ecapa_loss=0.0001453, whisper_loss=0.09058, over 3874372.01 frames. ], batch size: 81, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:52:04,500 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.48 vs. 
limit=15.0 2024-08-18 06:52:05,381 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-18 06:52:18,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3753430.0, ans=0.035 2024-08-18 06:52:45,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3753630.0, ans=0.125 2024-08-18 06:52:48,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3753630.0, ans=0.0 2024-08-18 06:52:50,681 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 3250, loss[loss=0.09597, beats_loss=0.01172, ecapa_loss=0.0001167, whisper_loss=0.08308, over 16767.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01055, ecapa_loss=0.0001449, whisper_loss=0.09073, over 3842066.63 frames. ], batch size: 63, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:52:56,226 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-18 06:53:03,309 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.18 vs. limit=15.0 2024-08-18 06:53:28,212 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-18 06:53:30,578 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.285e+01 2.505e+01 2.811e+01 5.288e+01, threshold=5.010e+01, percent-clipped=1.0 2024-08-18 06:53:30,960 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-18 06:53:32,107 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 13 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-18 06:53:43,956 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
23 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-18 06:53:50,808 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 16 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-18 06:53:57,391 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 3300, loss[loss=0.1172, beats_loss=0.01049, ecapa_loss=0.0001172, whisper_loss=0.1055, over 20568.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.0001439, whisper_loss=0.09056, over 3830681.17 frames. ], batch size: 78, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:53:59,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3754230.0, ans=0.125 2024-08-18 06:54:11,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3754330.0, ans=0.125 2024-08-18 06:54:23,604 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 16 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-18 06:54:31,800 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.13 vs. limit=10.0 2024-08-18 06:54:36,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3754430.0, ans=0.1 2024-08-18 06:54:41,773 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
17 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-18 06:54:43,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3754530.0, ans=0.125 2024-08-18 06:54:52,322 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08354800194501877, model_norm_threshold=50.10049057006836 2024-08-18 06:54:52,490 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.148e+04, grad_sumsq=7.148e+04, orig_rms_sq=1.000e+00 2024-08-18 06:55:05,907 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 3350, loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.0001504, whisper_loss=0.09024, over 21841.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01051, ecapa_loss=0.0001446, whisper_loss=0.09061, over 3806330.52 frames. ], batch size: 91, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:55:06,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3754730.0, ans=0.0 2024-08-18 06:55:25,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3754830.0, ans=0.1 2024-08-18 06:55:33,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3754930.0, ans=0.0 2024-08-18 06:55:38,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3754930.0, ans=0.125 2024-08-18 06:55:39,543 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 06:55:42,456 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
19 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-18 06:55:45,983 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.324e+01 2.611e+01 2.907e+01 5.997e+02, threshold=5.222e+01, percent-clipped=2.0 2024-08-18 06:55:48,968 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-18 06:56:12,456 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 3400, loss[loss=0.084, beats_loss=0.01421, ecapa_loss=0.0001276, whisper_loss=0.06851, over 13988.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01055, ecapa_loss=0.000145, whisper_loss=0.09068, over 3850476.55 frames. ], batch size: 55, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:56:16,296 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-08-18 06:56:21,015 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-18 06:56:27,564 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-18 06:56:30,488 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.03 vs. limit=12.0 2024-08-18 06:56:43,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3755430.0, ans=0.1 2024-08-18 06:56:56,884 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0 2024-08-18 06:57:26,972 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 3450, loss[loss=0.1165, beats_loss=0.01081, ecapa_loss=0.0001291, whisper_loss=0.1044, over 22606.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01049, ecapa_loss=0.0001452, whisper_loss=0.09099, over 3865585.40 frames. 
], batch size: 88, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 06:57:53,404 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-18 06:58:21,271 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.293e+01 2.573e+01 2.898e+01 3.051e+02, threshold=5.147e+01, percent-clipped=2.0 2024-08-18 06:58:35,353 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-18 06:58:45,010 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-18 06:58:54,589 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 3500, loss[loss=0.09656, beats_loss=0.008246, ecapa_loss=0.0001763, whisper_loss=0.08655, over 17956.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01054, ecapa_loss=0.0001454, whisper_loss=0.09045, over 3881228.80 frames. ], batch size: 75, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 06:59:06,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3756230.0, ans=0.0 2024-08-18 06:59:24,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3756330.0, ans=0.125 2024-08-18 06:59:28,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3756330.0, ans=0.125 2024-08-18 06:59:31,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3756330.0, ans=0.1 2024-08-18 07:00:07,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3756530.0, ans=0.1 2024-08-18 07:00:13,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3756630.0, 
ans=0.0 2024-08-18 07:00:35,046 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 3550, loss[loss=0.08882, beats_loss=0.01128, ecapa_loss=0.0001347, whisper_loss=0.0762, over 22151.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001451, whisper_loss=0.09047, over 3906135.63 frames. ], batch size: 90, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:00:59,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3756830.0, ans=0.1 2024-08-18 07:01:01,910 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.70 vs. limit=15.0 2024-08-18 07:01:07,960 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 07:01:35,617 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.288e+01 2.493e+01 2.847e+01 8.952e+01, threshold=4.987e+01, percent-clipped=1.0 2024-08-18 07:01:46,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3757030.0, ans=0.0 2024-08-18 07:01:48,586 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-18 07:02:06,031 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.54 vs. limit=15.0 2024-08-18 07:02:09,416 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 3600, loss[loss=0.09321, beats_loss=0.01249, ecapa_loss=0.0001294, whisper_loss=0.07943, over 21558.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.0001459, whisper_loss=0.09032, over 3911361.22 frames. ], batch size: 89, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:02:26,788 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
19 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-18 07:02:39,258 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-18 07:02:43,765 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.60 vs. limit=22.5 2024-08-18 07:02:45,144 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.284e-01 2024-08-18 07:03:01,063 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-18 07:03:03,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3757630.0, ans=0.125 2024-08-18 07:03:12,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3757630.0, ans=0.125 2024-08-18 07:03:18,847 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 3650, loss[loss=0.1144, beats_loss=0.01173, ecapa_loss=0.0001334, whisper_loss=0.1014, over 22429.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01056, ecapa_loss=0.000145, whisper_loss=0.09003, over 3887658.71 frames. ], batch size: 91, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:03:38,956 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.58 vs. limit=8.0 2024-08-18 07:03:50,869 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-18 07:04:01,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.66 vs. 
limit=15.0 2024-08-18 07:04:02,325 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.285e+01 2.488e+01 2.673e+01 1.240e+02, threshold=4.975e+01, percent-clipped=2.0 2024-08-18 07:04:27,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3758130.0, ans=0.0 2024-08-18 07:04:29,709 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 3700, loss[loss=0.09619, beats_loss=0.009625, ecapa_loss=0.0001217, whisper_loss=0.08535, over 18463.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.0001467, whisper_loss=0.09046, over 3863411.64 frames. ], batch size: 71, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:04:32,873 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 07:04:42,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3758330.0, ans=0.0 2024-08-18 07:04:43,024 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.99 vs. limit=15.0 2024-08-18 07:04:47,811 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 33 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-18 07:04:48,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3758330.0, ans=0.1 2024-08-18 07:05:23,000 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-18 07:05:24,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3758630.0, ans=0.125 2024-08-18 07:05:39,457 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 3750, loss[loss=0.1083, beats_loss=0.01161, ecapa_loss=0.0001249, whisper_loss=0.09546, over 20146.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01052, ecapa_loss=0.0001467, whisper_loss=0.09041, over 3861111.26 frames. ], batch size: 81, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:05:54,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3758830.0, ans=0.0 2024-08-18 07:06:13,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3758930.0, ans=0.125 2024-08-18 07:06:16,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3758930.0, ans=0.95 2024-08-18 07:06:19,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3758930.0, ans=0.0 2024-08-18 07:06:22,855 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.345e+01 2.587e+01 2.858e+01 2.449e+02, threshold=5.174e+01, percent-clipped=2.0 2024-08-18 07:06:36,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3759130.0, ans=0.0 2024-08-18 07:06:41,773 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-18 07:06:47,870 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 3800, loss[loss=0.07277, beats_loss=0.011, ecapa_loss=0.0001644, whisper_loss=0.06013, over 17414.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01061, ecapa_loss=0.0001471, whisper_loss=0.08956, over 3874731.36 frames. ], batch size: 69, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:06:53,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3759230.0, ans=0.0 2024-08-18 07:07:08,886 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
20 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 07:07:13,006 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-18 07:07:20,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3759430.0, ans=0.1 2024-08-18 07:07:32,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3759530.0, ans=0.1 2024-08-18 07:07:38,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3759630.0, ans=0.0 2024-08-18 07:07:42,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3759630.0, ans=0.125 2024-08-18 07:07:47,533 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.99 vs. limit=15.0 2024-08-18 07:07:47,561 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2024-08-18 07:07:51,204 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-18 07:07:52,045 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-08-18 07:07:52,251 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 3850, loss[loss=0.08708, beats_loss=0.01002, ecapa_loss=0.0001592, whisper_loss=0.07546, over 15799.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01065, ecapa_loss=0.0001468, whisper_loss=0.08908, over 3857417.64 frames. ], batch size: 66, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:07:53,898 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 07:08:10,477 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-18 07:08:21,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3759930.0, ans=0.125 2024-08-18 07:08:30,118 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 29 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 07:08:35,276 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.367e+01 2.592e+01 2.770e+01 3.521e+02, threshold=5.184e+01, percent-clipped=1.0 2024-08-18 07:08:38,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3760030.0, ans=0.0 2024-08-18 07:08:59,102 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.56 vs. limit=22.5 2024-08-18 07:08:59,611 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 3900, loss[loss=0.08806, beats_loss=0.0119, ecapa_loss=0.0001653, whisper_loss=0.0745, over 20389.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01053, ecapa_loss=0.0001457, whisper_loss=0.09052, over 3873953.37 frames. ], batch size: 86, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:09:13,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3760330.0, ans=0.125 2024-08-18 07:09:14,227 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 14 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-18 07:09:16,643 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
29 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 07:09:25,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3760430.0, ans=0.125 2024-08-18 07:09:27,495 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-18 07:09:32,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3760430.0, ans=0.1 2024-08-18 07:09:33,108 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0 2024-08-18 07:09:34,416 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.28 vs. limit=12.0 2024-08-18 07:09:39,463 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-18 07:10:04,607 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 3950, loss[loss=0.09647, beats_loss=0.009586, ecapa_loss=0.0001444, whisper_loss=0.08544, over 13424.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001454, whisper_loss=0.09076, over 3866375.38 frames. 
], batch size: 53, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:10:05,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3760730.0, ans=0.1 2024-08-18 07:10:22,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3760830.0, ans=0.125 2024-08-18 07:10:29,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3760930.0, ans=0.1 2024-08-18 07:10:33,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3760930.0, ans=0.0 2024-08-18 07:10:44,188 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.379e+01 2.624e+01 2.905e+01 3.854e+01, threshold=5.247e+01, percent-clipped=0.0 2024-08-18 07:10:44,367 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 15 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 07:10:45,797 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-18 07:10:47,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3761030.0, ans=0.1 2024-08-18 07:10:58,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3761130.0, ans=0.125 2024-08-18 07:11:08,693 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 4000, loss[loss=0.1132, beats_loss=0.00899, ecapa_loss=0.0001646, whisper_loss=0.1025, over 21668.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01049, ecapa_loss=0.0001458, whisper_loss=0.09106, over 3889660.50 frames. 
], batch size: 89, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:11:15,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3761230.0, ans=0.125 2024-08-18 07:11:23,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3761330.0, ans=0.0 2024-08-18 07:11:37,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3761430.0, ans=0.125 2024-08-18 07:11:44,214 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.29 vs. limit=15.0 2024-08-18 07:12:04,056 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08813267201185226, model_norm_threshold=52.47489929199219 2024-08-18 07:12:04,223 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.229e+04, grad_sumsq=4.229e+04, orig_rms_sq=1.000e+00 2024-08-18 07:12:04,426 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 07:12:07,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3761630.0, ans=0.1 2024-08-18 07:12:13,326 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 4050, loss[loss=0.09784, beats_loss=0.008836, ecapa_loss=0.0001664, whisper_loss=0.08734, over 14675.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01042, ecapa_loss=0.0001467, whisper_loss=0.09157, over 3915134.02 frames. ], batch size: 57, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:12:39,955 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
32 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 07:12:45,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3761930.0, ans=0.125 2024-08-18 07:12:53,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3762030.0, ans=0.1 2024-08-18 07:12:54,497 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.303e+01 2.573e+01 2.954e+01 5.954e+02, threshold=5.146e+01, percent-clipped=3.0 2024-08-18 07:12:55,261 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.70 vs. limit=15.0 2024-08-18 07:13:18,388 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 4100, loss[loss=0.1139, beats_loss=0.009596, ecapa_loss=0.000174, whisper_loss=0.1025, over 23132.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01045, ecapa_loss=0.0001466, whisper_loss=0.09173, over 3923542.35 frames. ], batch size: 92, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:13:21,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3762230.0, ans=0.125 2024-08-18 07:13:30,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3762330.0, ans=0.2 2024-08-18 07:13:52,622 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.72 vs. limit=22.5 2024-08-18 07:13:53,442 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 07:14:03,271 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
20 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-18 07:14:06,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3762530.0, ans=0.125 2024-08-18 07:14:22,251 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 4150, loss[loss=0.1219, beats_loss=0.008808, ecapa_loss=0.0001348, whisper_loss=0.1118, over 18935.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.0001463, whisper_loss=0.0909, over 3908172.11 frames. ], batch size: 73, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:14:24,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3762730.0, ans=0.1 2024-08-18 07:15:01,862 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.335e+01 2.524e+01 2.797e+01 3.665e+01, threshold=5.048e+01, percent-clipped=0.0 2024-08-18 07:15:10,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3763030.0, ans=0.125 2024-08-18 07:15:11,188 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-18 07:15:12,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3763130.0, ans=0.0 2024-08-18 07:15:26,651 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 4200, loss[loss=0.1019, beats_loss=0.01094, ecapa_loss=0.0001368, whisper_loss=0.08958, over 18848.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001462, whisper_loss=0.09047, over 3901266.23 frames. ], batch size: 75, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:15:41,215 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 17 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-18 07:15:47,645 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
21 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-18 07:16:01,798 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 38 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 07:16:03,464 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.43 vs. limit=15.0 2024-08-18 07:16:09,290 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 07:16:11,076 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.24 vs. limit=10.0 2024-08-18 07:16:28,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3763630.0, ans=0.2 2024-08-18 07:16:30,495 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 4250, loss[loss=0.1412, beats_loss=0.006038, ecapa_loss=0.000166, whisper_loss=0.1335, over 24572.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.0001461, whisper_loss=0.09051, over 3919035.22 frames. ], batch size: 91, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:16:36,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3763730.0, ans=0.125 2024-08-18 07:16:55,153 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-18 07:17:09,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3764030.0, ans=0.0 2024-08-18 07:17:10,781 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.296e+01 2.590e+01 3.069e+01 5.204e+01, threshold=5.180e+01, percent-clipped=2.0 2024-08-18 07:17:21,414 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
11 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 07:17:23,384 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.23 vs. limit=15.0 2024-08-18 07:17:32,845 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 07:17:34,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3764230.0, ans=0.125 2024-08-18 07:17:35,006 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 4300, loss[loss=0.1109, beats_loss=0.01048, ecapa_loss=0.0001286, whisper_loss=0.09918, over 20128.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01047, ecapa_loss=0.0001466, whisper_loss=0.09088, over 3887956.68 frames. ], batch size: 78, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:17:35,713 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.34 vs. limit=15.0 2024-08-18 07:17:45,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3764230.0, ans=0.125 2024-08-18 07:18:04,523 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.77 vs. limit=15.0 2024-08-18 07:18:14,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3764530.0, ans=0.125 2024-08-18 07:18:15,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3764530.0, ans=0.1 2024-08-18 07:18:28,492 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
29 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 07:18:34,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3764630.0, ans=0.2 2024-08-18 07:18:39,527 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 4350, loss[loss=0.1012, beats_loss=0.01055, ecapa_loss=0.0001348, whisper_loss=0.08934, over 17907.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01044, ecapa_loss=0.0001473, whisper_loss=0.09016, over 3847550.52 frames. ], batch size: 67, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:18:40,466 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=15.0 2024-08-18 07:18:41,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3764730.0, ans=0.125 2024-08-18 07:18:57,433 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.49 vs. limit=22.5 2024-08-18 07:18:58,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3764830.0, ans=0.1 2024-08-18 07:19:00,469 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
16 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 07:19:00,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3764830.0, ans=0.125 2024-08-18 07:19:02,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3764830.0, ans=0.1 2024-08-18 07:19:05,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3764930.0, ans=0.125 2024-08-18 07:19:18,993 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.231e+01 2.499e+01 2.816e+01 6.235e+01, threshold=4.999e+01, percent-clipped=1.0 2024-08-18 07:19:29,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3765130.0, ans=0.125 2024-08-18 07:19:40,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3765130.0, ans=0.2 2024-08-18 07:19:43,380 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 4400, loss[loss=0.1127, beats_loss=0.006319, ecapa_loss=0.000165, whisper_loss=0.1048, over 19711.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01041, ecapa_loss=0.0001466, whisper_loss=0.08968, over 3826649.70 frames. ], batch size: 79, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:19:44,722 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 07:19:50,819 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.28 vs. 
limit=15.0 2024-08-18 07:19:57,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3765330.0, ans=0.05 2024-08-18 07:20:09,665 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 07:20:26,021 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-18 07:20:26,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3765530.0, ans=0.125 2024-08-18 07:20:27,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3765530.0, ans=0.125 2024-08-18 07:20:34,614 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 18 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 07:20:47,397 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 4450, loss[loss=0.09635, beats_loss=0.01027, ecapa_loss=0.0001611, whisper_loss=0.08447, over 19548.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01049, ecapa_loss=0.0001461, whisper_loss=0.08904, over 3858969.90 frames. ], batch size: 84, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:20:50,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3765730.0, ans=0.0 2024-08-18 07:21:12,501 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-18 07:21:16,166 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 07:21:19,075 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
18 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-18 07:21:26,930 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.341e+01 2.651e+01 2.929e+01 6.825e+01, threshold=5.301e+01, percent-clipped=1.0 2024-08-18 07:21:32,862 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-08-18 07:21:33,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3766030.0, ans=0.0 2024-08-18 07:21:45,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3766130.0, ans=0.125 2024-08-18 07:21:51,496 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 4500, loss[loss=0.1108, beats_loss=0.009559, ecapa_loss=0.0001409, whisper_loss=0.09988, over 23526.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01046, ecapa_loss=0.0001458, whisper_loss=0.08954, over 3902353.75 frames. ], batch size: 94, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:21:51,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3766230.0, ans=0.125 2024-08-18 07:22:02,953 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 20 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-18 07:22:05,870 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.139e+05 2024-08-18 07:22:10,695 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-18 07:22:18,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3766430.0, ans=0.0 2024-08-18 07:22:23,479 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
30 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-18 07:22:27,333 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-18 07:22:27,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3766430.0, ans=0.0 2024-08-18 07:22:31,277 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 22 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 07:22:40,184 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-18 07:22:44,033 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-18 07:22:55,340 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 4550, loss[loss=0.1058, beats_loss=0.00957, ecapa_loss=0.0001454, whisper_loss=0.09482, over 21294.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.0001454, whisper_loss=0.08966, over 3889307.16 frames. ], batch size: 87, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:22:55,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3766730.0, ans=0.0 2024-08-18 07:22:58,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3766730.0, ans=0.1 2024-08-18 07:23:13,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3766830.0, ans=0.0 2024-08-18 07:23:20,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3766930.0, ans=0.05 2024-08-18 07:23:21,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3766930.0, ans=0.0 2024-08-18 07:23:31,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, 
batch_count=3766930.0, ans=0.125 2024-08-18 07:23:35,409 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.270e+01 2.466e+01 2.642e+01 3.409e+01, threshold=4.932e+01, percent-clipped=0.0 2024-08-18 07:23:38,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3767030.0, ans=0.125 2024-08-18 07:23:53,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3767130.0, ans=0.125 2024-08-18 07:23:59,818 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 4600, loss[loss=0.1072, beats_loss=0.009311, ecapa_loss=0.0001735, whisper_loss=0.0962, over 21468.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01048, ecapa_loss=0.0001448, whisper_loss=0.09, over 3863546.41 frames. ], batch size: 86, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:24:12,093 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 07:24:34,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3767430.0, ans=0.125 2024-08-18 07:24:37,184 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-18 07:25:04,723 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 4650, loss[loss=0.0777, beats_loss=0.01332, ecapa_loss=0.0001163, whisper_loss=0.06322, over 14713.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01043, ecapa_loss=0.0001452, whisper_loss=0.0905, over 3847845.57 frames. ], batch size: 60, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:25:06,163 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
20 from LS+wenet, 30 from Vox, 41 fro AS 2024-08-18 07:25:23,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3767830.0, ans=0.125 2024-08-18 07:25:38,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3767930.0, ans=0.07 2024-08-18 07:25:44,556 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.243e+01 2.444e+01 2.788e+01 4.840e+01, threshold=4.887e+01, percent-clipped=0.0 2024-08-18 07:25:51,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3768030.0, ans=0.125 2024-08-18 07:25:55,130 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-18 07:25:55,447 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.917e+05 2024-08-18 07:25:59,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3768130.0, ans=0.0 2024-08-18 07:26:08,882 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 4700, loss[loss=0.09134, beats_loss=0.01012, ecapa_loss=0.000178, whisper_loss=0.07944, over 20341.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01044, ecapa_loss=0.0001457, whisper_loss=0.09019, over 3851431.23 frames. 
], batch size: 88, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:26:14,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3768230.0, ans=0.125 2024-08-18 07:26:20,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3768230.0, ans=0.125 2024-08-18 07:26:23,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3768330.0, ans=0.125 2024-08-18 07:26:31,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3768330.0, ans=0.125 2024-08-18 07:26:46,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3768530.0, ans=0.125 2024-08-18 07:26:54,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3768530.0, ans=0.1 2024-08-18 07:27:03,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3768630.0, ans=0.1 2024-08-18 07:27:06,415 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 07:27:07,653 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-18 07:27:12,722 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 4750, loss[loss=0.116, beats_loss=0.009076, ecapa_loss=0.0001449, whisper_loss=0.1054, over 18594.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01047, ecapa_loss=0.0001455, whisper_loss=0.09024, over 3865113.12 frames. ], batch size: 71, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:27:23,211 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
26 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 07:27:24,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3768830.0, ans=0.1 2024-08-18 07:27:25,514 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-18 07:27:26,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3768830.0, ans=0.125 2024-08-18 07:27:30,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3768830.0, ans=0.125 2024-08-18 07:27:36,814 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 21 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-18 07:27:52,003 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.283e+01 2.543e+01 2.859e+01 8.026e+01, threshold=5.085e+01, percent-clipped=1.0 2024-08-18 07:27:56,276 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.70 vs. limit=10.0 2024-08-18 07:27:58,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3769030.0, ans=0.125 2024-08-18 07:28:08,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3769130.0, ans=0.0 2024-08-18 07:28:15,462 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 4800, loss[loss=0.09515, beats_loss=0.01158, ecapa_loss=0.0001592, whisper_loss=0.08198, over 16995.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01057, ecapa_loss=0.0001452, whisper_loss=0.08981, over 3864811.47 frames. 
], batch size: 69, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:28:22,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3769230.0, ans=0.025 2024-08-18 07:28:29,724 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 18 from Vox, 49 fro AS 2024-08-18 07:28:47,778 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-18 07:29:07,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3769630.0, ans=0.0 2024-08-18 07:29:16,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3769630.0, ans=0.125 2024-08-18 07:29:19,651 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 4850, loss[loss=0.09925, beats_loss=0.0102, ecapa_loss=0.0001271, whisper_loss=0.08778, over 17168.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01055, ecapa_loss=0.0001457, whisper_loss=0.08985, over 3872685.56 frames. ], batch size: 65, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:29:23,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3769730.0, ans=0.125 2024-08-18 07:29:35,133 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 07:29:44,005 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 07:29:46,869 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-18 07:29:48,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3769930.0, ans=0.2 2024-08-18 07:29:54,465 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 07:29:59,168 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.248e+01 2.515e+01 2.808e+01 3.681e+01, threshold=5.031e+01, percent-clipped=0.0 2024-08-18 07:30:10,036 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-18 07:30:23,890 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 4900, loss[loss=0.09442, beats_loss=0.01023, ecapa_loss=0.0001936, whisper_loss=0.08225, over 13501.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01058, ecapa_loss=0.0001446, whisper_loss=0.08952, over 3844190.46 frames. ], batch size: 58, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:30:41,920 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-18 07:30:58,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3770430.0, ans=0.2 2024-08-18 07:30:59,767 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 07:31:00,890 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
33 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-18 07:31:06,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3770530.0, ans=0.1 2024-08-18 07:31:07,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3770530.0, ans=0.125 2024-08-18 07:31:09,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3770530.0, ans=0.2 2024-08-18 07:31:10,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3770530.0, ans=0.2 2024-08-18 07:31:16,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3770630.0, ans=0.125 2024-08-18 07:31:16,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3770630.0, ans=0.2 2024-08-18 07:31:21,532 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-18 07:31:22,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3770630.0, ans=0.0 2024-08-18 07:31:25,437 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 35 from LS+wenet, 10 from Vox, 36 fro AS 2024-08-18 07:31:27,759 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 4950, loss[loss=0.1049, beats_loss=0.01059, ecapa_loss=0.0001712, whisper_loss=0.09265, over 21746.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01052, ecapa_loss=0.0001451, whisper_loss=0.08984, over 3835891.04 frames. 
], batch size: 92, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:31:29,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3770730.0, ans=0.125 2024-08-18 07:31:33,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3770730.0, ans=0.0 2024-08-18 07:31:44,124 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-18 07:31:54,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3770930.0, ans=0.125 2024-08-18 07:31:59,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3770930.0, ans=0.1 2024-08-18 07:32:03,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3770930.0, ans=0.125 2024-08-18 07:32:06,688 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.529e+01 2.313e+01 2.600e+01 2.856e+01 8.113e+01, threshold=5.200e+01, percent-clipped=1.0 2024-08-18 07:32:26,970 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=15.0 2024-08-18 07:32:31,250 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 5000, loss[loss=0.1271, beats_loss=0.009555, ecapa_loss=0.0001475, whisper_loss=0.1161, over 22971.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01051, ecapa_loss=0.0001442, whisper_loss=0.08987, over 3830472.44 frames. ], batch size: 88, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:32:41,356 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-18 07:32:43,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3771330.0, ans=0.0 2024-08-18 07:33:20,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3771530.0, ans=0.0 2024-08-18 07:33:29,699 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.71 vs. limit=15.0 2024-08-18 07:33:30,325 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-18 07:33:35,101 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 5050, loss[loss=0.09572, beats_loss=0.01271, ecapa_loss=0.0001277, whisper_loss=0.08173, over 20695.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01059, ecapa_loss=0.0001443, whisper_loss=0.0896, over 3837604.07 frames. ], batch size: 82, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:33:42,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3771730.0, ans=0.125 2024-08-18 07:33:47,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3771830.0, ans=0.04949747468305833 2024-08-18 07:33:49,851 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-18 07:34:01,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3771930.0, ans=0.1 2024-08-18 07:34:14,151 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
23 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-18 07:34:14,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3772030.0, ans=15.0 2024-08-18 07:34:15,378 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.763e+01 2.257e+01 2.473e+01 2.747e+01 4.109e+01, threshold=4.946e+01, percent-clipped=0.0 2024-08-18 07:34:20,129 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.14 vs. limit=15.0 2024-08-18 07:34:26,420 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=15.0 2024-08-18 07:34:31,621 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.59 vs. limit=10.0 2024-08-18 07:34:39,437 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 5100, loss[loss=0.09745, beats_loss=0.009791, ecapa_loss=0.0001487, whisper_loss=0.08617, over 16570.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01062, ecapa_loss=0.000144, whisper_loss=0.08998, over 3834100.00 frames. ], batch size: 65, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:34:41,089 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-18 07:34:47,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3772230.0, ans=0.0 2024-08-18 07:34:48,898 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
33 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-18 07:34:56,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3772330.0, ans=0.04949747468305833 2024-08-18 07:35:05,684 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-18 07:35:08,082 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 18 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 07:35:18,372 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 20 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 07:35:32,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3772630.0, ans=0.125 2024-08-18 07:35:41,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3772630.0, ans=0.0 2024-08-18 07:35:43,893 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 5150, loss[loss=0.08772, beats_loss=0.01216, ecapa_loss=0.0001735, whisper_loss=0.07382, over 21776.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01056, ecapa_loss=0.0001436, whisper_loss=0.09015, over 3849097.03 frames. ], batch size: 91, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:36:04,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3772830.0, ans=0.125 2024-08-18 07:36:08,721 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
21 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-18 07:36:12,516 WARNING [optim.py:496] (2/4) Scaling gradients by 0.06933695822954178, model_norm_threshold=49.455631256103516 2024-08-18 07:36:12,683 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.0.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.293e+04, grad_sumsq=7.293e+04, orig_rms_sq=1.000e+00 2024-08-18 07:36:24,070 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.306e+01 2.541e+01 2.832e+01 7.133e+02, threshold=5.082e+01, percent-clipped=3.0 2024-08-18 07:36:29,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3773030.0, ans=0.125 2024-08-18 07:36:37,789 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 13 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-18 07:36:39,661 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0 2024-08-18 07:36:48,915 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 5200, loss[loss=0.1079, beats_loss=0.00953, ecapa_loss=0.0001601, whisper_loss=0.09673, over 22960.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01054, ecapa_loss=0.0001433, whisper_loss=0.09026, over 3870942.46 frames. ], batch size: 92, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:36:52,781 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 07:36:54,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3773230.0, ans=0.0 2024-08-18 07:36:57,225 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
39 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-18 07:36:57,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3773230.0, ans=0.2 2024-08-18 07:37:00,024 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 29 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 07:37:10,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3773330.0, ans=0.0 2024-08-18 07:37:13,075 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.53 vs. limit=15.0 2024-08-18 07:37:16,777 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-18 07:37:34,694 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.17 vs. limit=22.5 2024-08-18 07:37:37,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3773530.0, ans=0.125 2024-08-18 07:37:46,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3773630.0, ans=0.125 2024-08-18 07:37:46,986 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-18 07:37:54,551 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 5250, loss[loss=0.09116, beats_loss=0.01169, ecapa_loss=0.0001476, whisper_loss=0.078, over 21251.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01055, ecapa_loss=0.0001436, whisper_loss=0.09034, over 3882444.06 frames. 
], batch size: 89, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:37:56,538 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 07:38:09,759 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 31 from LS+wenet, 7 from Vox, 25 fro AS 2024-08-18 07:38:09,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3773830.0, ans=0.125 2024-08-18 07:38:21,060 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-18 07:38:38,764 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 40 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 07:38:39,602 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.417e+01 2.630e+01 2.894e+01 3.805e+02, threshold=5.260e+01, percent-clipped=2.0 2024-08-18 07:39:02,572 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 07:39:05,625 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 5300, loss[loss=0.1188, beats_loss=0.009672, ecapa_loss=0.0001523, whisper_loss=0.1076, over 20142.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.0001436, whisper_loss=0.09097, over 3896587.99 frames. ], batch size: 79, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:39:09,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3774230.0, ans=0.0 2024-08-18 07:39:17,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3774230.0, ans=0.125 2024-08-18 07:39:28,401 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
16 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-18 07:39:34,585 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.17 vs. limit=15.0 2024-08-18 07:39:35,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3774430.0, ans=0.125 2024-08-18 07:39:44,900 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-18 07:39:45,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3774430.0, ans=0.1 2024-08-18 07:39:57,897 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.41 vs. limit=22.5 2024-08-18 07:40:13,023 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 35 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-18 07:40:15,250 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 5350, loss[loss=0.09632, beats_loss=0.01313, ecapa_loss=0.0001011, whisper_loss=0.08218, over 16860.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01046, ecapa_loss=0.000144, whisper_loss=0.09075, over 3846758.91 frames. ], batch size: 64, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:40:17,185 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.11 vs. limit=12.0 2024-08-18 07:40:22,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3774730.0, ans=0.05 2024-08-18 07:40:55,421 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.637e+01 2.297e+01 2.541e+01 2.871e+01 2.091e+02, threshold=5.082e+01, percent-clipped=1.0 2024-08-18 07:40:56,721 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
24 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 07:40:57,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3775030.0, ans=0.1 2024-08-18 07:41:00,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3775030.0, ans=0.125 2024-08-18 07:41:18,601 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.85 vs. limit=15.0 2024-08-18 07:41:19,106 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 5400, loss[loss=0.09932, beats_loss=0.00997, ecapa_loss=0.0001546, whisper_loss=0.0878, over 22780.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001442, whisper_loss=0.09066, over 3876104.44 frames. ], batch size: 94, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:41:26,977 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 26 from LS+wenet, 32 from Vox, 37 fro AS 2024-08-18 07:41:32,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3775330.0, ans=0.125 2024-08-18 07:41:46,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3775430.0, ans=0.0 2024-08-18 07:41:52,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3775430.0, ans=0.2 2024-08-18 07:42:08,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3775530.0, ans=0.125 2024-08-18 07:42:12,480 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-18 07:42:23,627 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 5450, loss[loss=0.09369, beats_loss=0.008259, ecapa_loss=0.0001635, whisper_loss=0.0838, over 19589.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.0104, ecapa_loss=0.0001447, whisper_loss=0.09067, over 3854894.48 frames. ], batch size: 77, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:42:31,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3775730.0, ans=0.0 2024-08-18 07:42:44,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3775830.0, ans=0.0 2024-08-18 07:42:49,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3775930.0, ans=0.0 2024-08-18 07:43:03,288 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.317e+01 2.510e+01 2.748e+01 4.518e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-18 07:43:16,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3776130.0, ans=0.125 2024-08-18 07:43:17,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3776130.0, ans=0.0 2024-08-18 07:43:18,042 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.66 vs. limit=12.0 2024-08-18 07:43:27,237 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 5500, loss[loss=0.09742, beats_loss=0.01088, ecapa_loss=0.0001361, whisper_loss=0.08518, over 21948.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01045, ecapa_loss=0.0001446, whisper_loss=0.09046, over 3864174.43 frames. 
], batch size: 90, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:43:40,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3776330.0, ans=0.04949747468305833 2024-08-18 07:43:43,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=3776330.0, ans=0.1 2024-08-18 07:43:44,677 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 16 from LS+wenet, 27 from Vox, 23 fro AS 2024-08-18 07:43:47,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3776330.0, ans=0.0 2024-08-18 07:43:48,513 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 07:44:00,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3776430.0, ans=0.125 2024-08-18 07:44:05,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3776530.0, ans=0.125 2024-08-18 07:44:21,541 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.04 vs. limit=15.0 2024-08-18 07:44:29,292 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 5550, loss[loss=0.1085, beats_loss=0.01124, ecapa_loss=0.0001303, whisper_loss=0.09599, over 22481.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01044, ecapa_loss=0.0001455, whisper_loss=0.09105, over 3861903.03 frames. ], batch size: 88, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:44:37,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3776730.0, ans=0.125 2024-08-18 07:44:42,906 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 07:44:47,754 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.76 vs. limit=15.0 2024-08-18 07:45:08,296 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.307e+01 2.560e+01 2.922e+01 7.758e+01, threshold=5.121e+01, percent-clipped=1.0 2024-08-18 07:45:31,911 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 5600, loss[loss=0.09997, beats_loss=0.0115, ecapa_loss=0.0001624, whisper_loss=0.08685, over 22029.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0104, ecapa_loss=0.0001454, whisper_loss=0.09138, over 3877857.97 frames. ], batch size: 92, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:45:34,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3777230.0, ans=0.0 2024-08-18 07:45:38,535 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2024-08-18 07:46:07,088 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-18 07:46:18,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3777530.0, ans=0.125 2024-08-18 07:46:23,100 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 07:46:30,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3777630.0, ans=0.95 2024-08-18 07:46:33,460 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 5650, loss[loss=0.09185, beats_loss=0.01074, ecapa_loss=0.0001332, whisper_loss=0.07978, over 22871.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001447, whisper_loss=0.09031, over 3850779.42 frames. ], batch size: 90, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:46:40,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3777730.0, ans=0.0 2024-08-18 07:46:48,015 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.75 vs. limit=15.0 2024-08-18 07:46:56,010 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 07:47:05,946 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 07:47:08,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3777930.0, ans=0.125 2024-08-18 07:47:12,367 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.355e+01 2.678e+01 2.961e+01 4.241e+01, threshold=5.356e+01, percent-clipped=0.0 2024-08-18 07:47:34,705 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 26 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 07:47:35,842 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 5700, loss[loss=0.1313, beats_loss=0.007565, ecapa_loss=0.0001508, whisper_loss=0.1222, over 16897.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01035, ecapa_loss=0.0001467, whisper_loss=0.09131, over 3859609.21 frames. ], batch size: 66, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:48:05,120 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
31 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-18 07:48:19,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3778430.0, ans=0.125 2024-08-18 07:48:22,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3778530.0, ans=0.2 2024-08-18 07:48:25,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3778530.0, ans=0.125 2024-08-18 07:48:29,960 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 07:48:36,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3778530.0, ans=0.0 2024-08-18 07:48:56,264 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 5750, loss[loss=0.1145, beats_loss=0.008361, ecapa_loss=0.000132, whisper_loss=0.1048, over 15014.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01043, ecapa_loss=0.0001466, whisper_loss=0.09068, over 3869853.01 frames. ], batch size: 56, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:49:40,332 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.38 vs. limit=22.5 2024-08-18 07:49:53,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3779030.0, ans=0.1 2024-08-18 07:49:54,440 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.413e+01 2.637e+01 2.878e+01 7.764e+01, threshold=5.274e+01, percent-clipped=1.0 2024-08-18 07:50:28,918 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 5800, loss[loss=0.102, beats_loss=0.009272, ecapa_loss=0.0001558, whisper_loss=0.09121, over 16983.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01036, ecapa_loss=0.0001477, whisper_loss=0.09043, over 3853626.74 frames. 
], batch size: 66, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:51:12,768 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 30 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-18 07:51:13,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3779430.0, ans=0.125 2024-08-18 07:51:14,158 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 20 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 07:51:14,489 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.523e+01 2024-08-18 07:51:19,059 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.49 vs. limit=15.0 2024-08-18 07:51:20,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3779430.0, ans=0.09899494936611666 2024-08-18 07:51:22,163 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.96 vs. limit=22.5 2024-08-18 07:51:30,413 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-18 07:52:04,369 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 5850, loss[loss=0.1084, beats_loss=0.009118, ecapa_loss=0.0001616, whisper_loss=0.09768, over 22266.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01048, ecapa_loss=0.0001478, whisper_loss=0.08942, over 3885911.60 frames. ], batch size: 89, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:52:08,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3779730.0, ans=0.0 2024-08-18 07:52:09,387 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
26 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 07:52:22,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3779830.0, ans=0.125 2024-08-18 07:52:28,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3779830.0, ans=0.0 2024-08-18 07:52:32,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3779930.0, ans=0.125 2024-08-18 07:52:50,347 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.605e+01 2.256e+01 2.447e+01 2.728e+01 3.298e+01, threshold=4.893e+01, percent-clipped=0.0 2024-08-18 07:53:03,210 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-18 07:53:05,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3780130.0, ans=0.0 2024-08-18 07:53:17,949 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 5900, loss[loss=0.1141, beats_loss=0.01071, ecapa_loss=0.0001299, whisper_loss=0.1021, over 18292.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001478, whisper_loss=0.0892, over 3885399.57 frames. ], batch size: 70, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:53:26,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3780230.0, ans=0.0 2024-08-18 07:53:27,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3780230.0, ans=0.2 2024-08-18 07:54:13,041 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
22 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-18 07:54:17,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3780630.0, ans=0.1 2024-08-18 07:54:18,124 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.06 vs. limit=15.0 2024-08-18 07:54:25,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3780630.0, ans=0.0 2024-08-18 07:54:30,162 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 5950, loss[loss=0.09684, beats_loss=0.009069, ecapa_loss=0.0001487, whisper_loss=0.08628, over 15516.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01058, ecapa_loss=0.0001477, whisper_loss=0.08858, over 3866744.53 frames. ], batch size: 59, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:54:46,192 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-08-18 07:54:47,712 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 31 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-18 07:55:08,821 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.06 vs. limit=10.0 2024-08-18 07:55:15,692 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.332e+01 2.559e+01 2.973e+01 4.806e+01, threshold=5.118e+01, percent-clipped=0.0 2024-08-18 07:55:31,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3781130.0, ans=0.1 2024-08-18 07:55:45,371 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 6000, loss[loss=0.0929, beats_loss=0.009675, ecapa_loss=0.0001583, whisper_loss=0.08164, over 15443.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01051, ecapa_loss=0.0001472, whisper_loss=0.08967, over 3869652.45 frames. ], batch size: 62, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:55:45,371 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-18 07:56:22,509 INFO [train_multi_KD3.py:1149] (2/4) Epoch 26, validation on ASR_libri: loss=0.2523, beats_loss=0, ecapa_loss=0.0005188, whisper_loss=0.2471, over 922467.00 frames. 2024-08-18 07:56:37,654 INFO [train_multi_KD3.py:1149] (2/4) Epoch 26, validation on SV_voxceleb1: loss=0.004005, beats_loss=0, ecapa_loss=0.0004005, whisper_loss=0, over 939242.00 frames. 2024-08-18 07:58:26,877 INFO [train_multi_KD3.py:1149] (2/4) Epoch 26, validation on AT_audioset: loss=0.0231, beats_loss=0.0231, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 07:58:26,880 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-18 07:58:54,136 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-18 07:58:57,593 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-18 07:58:59,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3781430.0, ans=0.125 2024-08-18 07:59:15,681 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-18 07:59:15,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3781530.0, ans=0.2 2024-08-18 07:59:39,131 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 6050, loss[loss=0.1108, beats_loss=0.007854, ecapa_loss=0.0001704, whisper_loss=0.1012, over 18157.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01061, ecapa_loss=0.000146, whisper_loss=0.08878, over 3843602.39 frames. 
], batch size: 71, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:00:09,200 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.27 vs. limit=15.0 2024-08-18 08:00:10,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3781930.0, ans=0.04949747468305833 2024-08-18 08:00:14,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3781930.0, ans=0.125 2024-08-18 08:00:15,866 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 30 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 08:00:17,782 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 20 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-18 08:00:19,720 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.42 vs. limit=22.5 2024-08-18 08:00:22,644 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.283e+01 2.534e+01 2.790e+01 4.412e+01, threshold=5.068e+01, percent-clipped=0.0 2024-08-18 08:00:33,807 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-18 08:00:36,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3782130.0, ans=0.125 2024-08-18 08:00:49,569 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 6100, loss[loss=0.08632, beats_loss=0.009677, ecapa_loss=0.0001402, whisper_loss=0.07524, over 16733.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01062, ecapa_loss=0.0001468, whisper_loss=0.08878, over 3849414.40 frames. 
], batch size: 63, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:00:52,083 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.026e-02 2024-08-18 08:00:54,683 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 08:00:57,522 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.49 vs. limit=10.0 2024-08-18 08:01:00,064 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-18 08:01:09,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3782330.0, ans=0.0 2024-08-18 08:01:26,673 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.30 vs. limit=22.5 2024-08-18 08:01:37,852 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.289e+05 2024-08-18 08:01:38,751 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-18 08:01:42,938 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2024-08-18 08:01:54,436 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.147e+00 2024-08-18 08:01:59,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3782630.0, ans=0.2 2024-08-18 08:02:00,903 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
26 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 08:02:01,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3782730.0, ans=0.125 2024-08-18 08:02:02,610 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 6150, loss[loss=0.1199, beats_loss=0.008972, ecapa_loss=0.0001554, whisper_loss=0.1093, over 17354.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01063, ecapa_loss=0.000146, whisper_loss=0.08895, over 3854710.38 frames. ], batch size: 67, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:02:04,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3782730.0, ans=0.0 2024-08-18 08:02:24,174 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.232e+01 2024-08-18 08:02:26,827 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 20 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-18 08:02:45,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3783030.0, ans=0.0 2024-08-18 08:02:46,183 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.364e+01 2.615e+01 3.104e+01 2.704e+02, threshold=5.230e+01, percent-clipped=2.0 2024-08-18 08:02:50,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3783030.0, ans=0.125 2024-08-18 08:02:55,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3783030.0, ans=0.1 2024-08-18 08:02:56,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3783030.0, ans=0.1 2024-08-18 08:03:13,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3783230.0, 
ans=0.125 2024-08-18 08:03:13,875 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 6200, loss[loss=0.09285, beats_loss=0.01169, ecapa_loss=0.0001321, whisper_loss=0.07985, over 21877.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01056, ecapa_loss=0.000146, whisper_loss=0.08901, over 3846854.23 frames. ], batch size: 87, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:03:54,809 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.96 vs. limit=15.0 2024-08-18 08:03:58,862 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.91 vs. limit=22.5 2024-08-18 08:04:06,009 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.46 vs. limit=22.5 2024-08-18 08:04:07,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3783530.0, ans=0.0 2024-08-18 08:04:09,424 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-18 08:04:18,651 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 08:04:23,414 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 6250, loss[loss=0.107, beats_loss=0.01022, ecapa_loss=0.0001721, whisper_loss=0.09503, over 20169.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01051, ecapa_loss=0.0001457, whisper_loss=0.08955, over 3867408.94 frames. ], batch size: 81, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:04:29,459 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.54 vs. 
limit=15.0 2024-08-18 08:05:04,946 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.253e+01 2.525e+01 2.867e+01 3.662e+01, threshold=5.049e+01, percent-clipped=0.0 2024-08-18 08:05:11,563 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.93 vs. limit=15.0 2024-08-18 08:05:14,944 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-18 08:05:30,945 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 6300, loss[loss=0.09787, beats_loss=0.01117, ecapa_loss=0.0001036, whisper_loss=0.08566, over 18097.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01057, ecapa_loss=0.0001452, whisper_loss=0.08879, over 3883092.23 frames. ], batch size: 69, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:05:32,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3784230.0, ans=0.125 2024-08-18 08:05:38,233 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-18 08:05:48,960 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 08:05:50,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3784330.0, ans=0.2 2024-08-18 08:05:58,237 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 13 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-18 08:05:59,538 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 08:06:05,573 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 7 from Vox, 32 fro AS 2024-08-18 08:06:15,078 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.56 vs. 
limit=10.0 2024-08-18 08:06:38,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3784630.0, ans=0.0 2024-08-18 08:06:39,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3784630.0, ans=0.125 2024-08-18 08:06:41,607 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 6350, loss[loss=0.09861, beats_loss=0.01004, ecapa_loss=0.0001418, whisper_loss=0.08715, over 18822.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01055, ecapa_loss=0.0001446, whisper_loss=0.08963, over 3881317.31 frames. ], batch size: 76, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:06:50,478 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.99 vs. limit=22.5 2024-08-18 08:07:01,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3784830.0, ans=0.2 2024-08-18 08:07:08,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3784930.0, ans=0.05 2024-08-18 08:07:10,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3784930.0, ans=0.125 2024-08-18 08:07:17,888 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-18 08:07:24,438 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.986e+01 2.408e+01 2.616e+01 2.938e+01 2.370e+02, threshold=5.231e+01, percent-clipped=3.0 2024-08-18 08:07:31,595 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
19 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-18 08:07:47,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3785130.0, ans=0.1 2024-08-18 08:07:51,361 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 6400, loss[loss=0.1017, beats_loss=0.01081, ecapa_loss=0.0001507, whisper_loss=0.0894, over 21655.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01055, ecapa_loss=0.0001453, whisper_loss=0.08948, over 3869364.09 frames. ], batch size: 89, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:07:52,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3785230.0, ans=0.0 2024-08-18 08:07:52,969 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-18 08:07:54,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3785230.0, ans=0.125 2024-08-18 08:08:00,914 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.42 vs. limit=15.0 2024-08-18 08:08:12,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3785330.0, ans=0.0 2024-08-18 08:08:13,127 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-18 08:08:17,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3785330.0, ans=0.0 2024-08-18 08:08:45,723 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-18 08:09:02,655 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 6450, loss[loss=0.1052, beats_loss=0.0102, ecapa_loss=0.0001196, whisper_loss=0.09377, over 20606.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01065, ecapa_loss=0.000144, whisper_loss=0.08943, over 3892544.69 frames. ], batch size: 77, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:09:14,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3785730.0, ans=0.1 2024-08-18 08:09:25,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3785830.0, ans=0.0 2024-08-18 08:09:35,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3785930.0, ans=0.125 2024-08-18 08:09:46,280 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.303e+01 2.568e+01 2.899e+01 1.758e+02, threshold=5.135e+01, percent-clipped=1.0 2024-08-18 08:09:49,929 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 33 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-18 08:10:09,122 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.02 vs. limit=10.0 2024-08-18 08:10:10,108 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 08:10:12,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3786130.0, ans=0.0 2024-08-18 08:10:12,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3786130.0, ans=0.125 2024-08-18 08:10:13,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3786130.0, ans=0.5 2024-08-18 08:10:15,955 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 6500, loss[loss=0.1177, beats_loss=0.009106, ecapa_loss=0.0001616, whisper_loss=0.107, over 19354.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001448, whisper_loss=0.09075, over 3934193.35 frames. ], batch size: 75, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:10:42,425 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.70 vs. limit=8.0 2024-08-18 08:10:48,330 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.26 vs. limit=15.0 2024-08-18 08:11:00,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3786530.0, ans=0.125 2024-08-18 08:11:03,501 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0 2024-08-18 08:11:10,546 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.79 vs. limit=15.0 2024-08-18 08:11:26,585 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 6550, loss[loss=0.09144, beats_loss=0.01271, ecapa_loss=0.0001403, whisper_loss=0.07733, over 22385.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01062, ecapa_loss=0.0001451, whisper_loss=0.09011, over 3899749.69 frames. ], batch size: 91, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:11:50,887 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.00 vs. limit=15.0 2024-08-18 08:11:53,540 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.51 vs. 
limit=15.0 2024-08-18 08:11:58,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3786930.0, ans=0.2 2024-08-18 08:12:09,853 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.653e+01 2.440e+01 2.660e+01 3.012e+01 4.611e+01, threshold=5.320e+01, percent-clipped=0.0 2024-08-18 08:12:10,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3787030.0, ans=0.125 2024-08-18 08:12:16,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3787030.0, ans=0.0 2024-08-18 08:12:22,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3787130.0, ans=0.2 2024-08-18 08:12:31,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3787130.0, ans=0.125 2024-08-18 08:12:33,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3787130.0, ans=0.125 2024-08-18 08:12:37,274 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 6600, loss[loss=0.09823, beats_loss=0.01205, ecapa_loss=0.0001669, whisper_loss=0.08451, over 21551.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01055, ecapa_loss=0.0001468, whisper_loss=0.09018, over 3905989.24 frames. ], batch size: 94, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:12:40,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3787230.0, ans=0.125 2024-08-18 08:12:46,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3787230.0, ans=0.1 2024-08-18 08:12:57,099 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
26 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-18 08:12:57,309 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3787330.0, ans=0.125 2024-08-18 08:12:57,787 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.19 vs. limit=15.0 2024-08-18 08:13:06,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3787430.0, ans=0.1 2024-08-18 08:13:08,963 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.76 vs. limit=15.0 2024-08-18 08:13:15,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3787430.0, ans=0.0 2024-08-18 08:13:18,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3787530.0, ans=0.1 2024-08-18 08:13:24,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3787530.0, ans=0.2 2024-08-18 08:13:35,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3787630.0, ans=0.04949747468305833 2024-08-18 08:13:45,401 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 08:13:48,035 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 6650, loss[loss=0.09833, beats_loss=0.00987, ecapa_loss=0.0001635, whisper_loss=0.08682, over 20009.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01057, ecapa_loss=0.0001458, whisper_loss=0.09018, over 3927555.91 frames. 
], batch size: 79, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:13:51,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3787730.0, ans=0.1 2024-08-18 08:13:54,285 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 37 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-18 08:14:12,413 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 08:14:13,707 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 08:14:21,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3787930.0, ans=0.2 2024-08-18 08:14:31,534 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.309e+01 2.540e+01 2.799e+01 5.437e+01, threshold=5.081e+01, percent-clipped=1.0 2024-08-18 08:14:31,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3788030.0, ans=0.125 2024-08-18 08:14:39,848 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.96 vs. limit=15.0 2024-08-18 08:14:53,390 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 08:14:54,894 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 08:14:56,948 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-18 08:14:59,207 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 6700, loss[loss=0.0918, beats_loss=0.01188, ecapa_loss=0.000121, whisper_loss=0.07871, over 22833.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01056, ecapa_loss=0.0001469, whisper_loss=0.09031, over 3924048.46 frames. 
], batch size: 95, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:15:01,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3788230.0, ans=0.125 2024-08-18 08:15:02,096 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 08:15:10,277 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-18 08:15:18,747 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.72 vs. limit=15.0 2024-08-18 08:15:21,799 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-18 08:15:22,968 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 08:15:23,706 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.95 vs. limit=15.0 2024-08-18 08:15:26,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3788430.0, ans=0.1 2024-08-18 08:15:39,894 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2024-08-18 08:16:03,895 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-18 08:16:08,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3788730.0, ans=0.1 2024-08-18 08:16:08,899 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 6750, loss[loss=0.08946, beats_loss=0.01138, ecapa_loss=0.0001523, whisper_loss=0.07655, over 14590.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001466, whisper_loss=0.09064, over 3892696.95 frames. ], batch size: 60, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:16:11,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3788730.0, ans=0.125 2024-08-18 08:16:19,391 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 12 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-18 08:16:22,726 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.91 vs. limit=22.5 2024-08-18 08:16:23,103 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-18 08:16:28,829 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-18 08:16:35,804 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-18 08:16:49,468 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-18 08:16:52,020 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.278e+01 2.601e+01 3.008e+01 1.413e+02, threshold=5.202e+01, percent-clipped=1.0 2024-08-18 08:16:52,893 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.14 vs. limit=22.5 2024-08-18 08:16:58,456 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 11 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-18 08:17:08,691 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
21 from LS+wenet, 36 from Vox, 31 fro AS 2024-08-18 08:17:17,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3789230.0, ans=0.125 2024-08-18 08:17:18,530 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 6800, loss[loss=0.1071, beats_loss=0.009273, ecapa_loss=0.0001619, whisper_loss=0.09622, over 23111.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01048, ecapa_loss=0.0001469, whisper_loss=0.08984, over 3878219.35 frames. ], batch size: 90, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:17:23,778 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=15.0 2024-08-18 08:17:29,876 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-18 08:17:32,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3789330.0, ans=0.125 2024-08-18 08:17:41,694 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
14 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-18 08:17:46,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3789430.0, ans=0.0 2024-08-18 08:17:48,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3789430.0, ans=0.125 2024-08-18 08:18:02,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3789530.0, ans=0.025 2024-08-18 08:18:02,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3789530.0, ans=0.0 2024-08-18 08:18:05,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3789530.0, ans=0.0 2024-08-18 08:18:21,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3789630.0, ans=0.125 2024-08-18 08:18:27,056 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.41 vs. limit=22.5 2024-08-18 08:18:27,563 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 6850, loss[loss=0.07599, beats_loss=0.01236, ecapa_loss=0.0001407, whisper_loss=0.06223, over 15044.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01046, ecapa_loss=0.0001458, whisper_loss=0.09059, over 3894362.61 frames. ], batch size: 63, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:18:49,197 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
26 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 08:19:09,386 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.277e+01 2.530e+01 2.768e+01 4.068e+01, threshold=5.060e+01, percent-clipped=0.0 2024-08-18 08:19:17,875 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.91 vs. limit=22.5 2024-08-18 08:19:21,401 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 08:19:24,954 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 16 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 08:19:32,217 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-18 08:19:36,693 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 6900, loss[loss=0.1266, beats_loss=0.009724, ecapa_loss=0.0001349, whisper_loss=0.1155, over 23569.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01053, ecapa_loss=0.0001451, whisper_loss=0.09069, over 3918702.77 frames. ], batch size: 90, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:19:45,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3790230.0, ans=0.2 2024-08-18 08:19:53,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3790330.0, ans=0.125 2024-08-18 08:20:20,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3790530.0, ans=0.125 2024-08-18 08:20:21,321 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-18 08:20:22,790 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
19 from LS+wenet, 33 from Vox, 37 fro AS 2024-08-18 08:20:29,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3790530.0, ans=0.0 2024-08-18 08:20:39,583 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.89 vs. limit=15.0 2024-08-18 08:20:45,608 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 6950, loss[loss=0.08847, beats_loss=0.01108, ecapa_loss=0.0001515, whisper_loss=0.07587, over 16354.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0106, ecapa_loss=0.0001441, whisper_loss=0.09042, over 3923314.77 frames. ], batch size: 65, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:20:50,735 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.87 vs. limit=10.0 2024-08-18 08:21:05,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3790830.0, ans=0.2 2024-08-18 08:21:23,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3790930.0, ans=0.2 2024-08-18 08:21:28,901 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.281e+01 2.543e+01 2.749e+01 3.905e+01, threshold=5.086e+01, percent-clipped=0.0 2024-08-18 08:21:36,053 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-18 08:21:43,658 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 13 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-18 08:21:46,969 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.73 vs. 
limit=15.0 2024-08-18 08:21:48,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3791130.0, ans=22.5 2024-08-18 08:21:54,206 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 7000, loss[loss=0.105, beats_loss=0.008449, ecapa_loss=0.0001638, whisper_loss=0.09494, over 15819.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01055, ecapa_loss=0.0001447, whisper_loss=0.09092, over 3916482.63 frames. ], batch size: 64, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:21:54,403 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-18 08:21:55,031 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0 2024-08-18 08:21:59,906 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 27 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-18 08:22:00,729 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.57 vs. limit=15.0 2024-08-18 08:22:19,664 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
28 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-18 08:22:22,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3791430.0, ans=0.0 2024-08-18 08:22:29,782 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 08:22:31,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3791430.0, ans=0.2 2024-08-18 08:22:35,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3791530.0, ans=0.125 2024-08-18 08:22:39,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3791530.0, ans=0.0 2024-08-18 08:22:48,206 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 08:22:58,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3791630.0, ans=0.125 2024-08-18 08:23:04,176 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 7050, loss[loss=0.08995, beats_loss=0.01205, ecapa_loss=0.0001364, whisper_loss=0.07654, over 17590.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.000145, whisper_loss=0.09043, over 3925506.52 frames. ], batch size: 70, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:23:22,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3791830.0, ans=0.125 2024-08-18 08:23:27,935 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. 
limit=6.0 2024-08-18 08:23:28,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3791830.0, ans=0.125 2024-08-18 08:23:47,629 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.689e+01 2.340e+01 2.598e+01 2.847e+01 9.151e+01, threshold=5.195e+01, percent-clipped=1.0 2024-08-18 08:24:04,437 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 36 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-18 08:24:06,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3792130.0, ans=0.0 2024-08-18 08:24:06,560 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.95 vs. limit=22.5 2024-08-18 08:24:13,623 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 7100, loss[loss=0.0988, beats_loss=0.008874, ecapa_loss=0.0001412, whisper_loss=0.08851, over 18373.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01053, ecapa_loss=0.0001446, whisper_loss=0.0905, over 3915513.85 frames. ], batch size: 73, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:24:18,403 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 24 from LS+wenet, 12 from Vox, 19 fro AS 2024-08-18 08:24:31,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3792330.0, ans=0.0 2024-08-18 08:24:59,994 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.00 vs. 
limit=15.0 2024-08-18 08:25:04,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3792530.0, ans=0.1 2024-08-18 08:25:20,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3792630.0, ans=0.125 2024-08-18 08:25:23,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3792630.0, ans=0.2 2024-08-18 08:25:24,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3792630.0, ans=0.125 2024-08-18 08:25:29,642 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 7150, loss[loss=0.08965, beats_loss=0.01083, ecapa_loss=0.0001463, whisper_loss=0.07736, over 22533.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01058, ecapa_loss=0.0001444, whisper_loss=0.08995, over 3905457.99 frames. ], batch size: 93, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:25:57,654 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.99 vs. limit=15.0 2024-08-18 08:25:58,226 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
36 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-18 08:26:15,861 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.213e+01 2.415e+01 2.691e+01 1.069e+02, threshold=4.830e+01, percent-clipped=1.0 2024-08-18 08:26:19,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3793030.0, ans=0.0 2024-08-18 08:26:22,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3793030.0, ans=0.2 2024-08-18 08:26:28,527 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.235e+05 2024-08-18 08:26:40,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3793130.0, ans=0.0 2024-08-18 08:26:42,123 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.90 vs. limit=10.0 2024-08-18 08:26:44,266 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 7200, loss[loss=0.09746, beats_loss=0.01089, ecapa_loss=0.0001329, whisper_loss=0.08524, over 21779.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01055, ecapa_loss=0.0001448, whisper_loss=0.09004, over 3902959.45 frames. ], batch size: 87, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:27:05,515 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-18 08:27:14,602 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 08:27:25,009 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
26 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 08:27:28,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3793430.0, ans=0.025 2024-08-18 08:27:29,890 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 08:27:30,309 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3793530.0, ans=0.125 2024-08-18 08:27:32,378 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.54 vs. limit=15.0 2024-08-18 08:27:41,526 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-18 08:27:50,155 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-18 08:27:58,455 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 26 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-18 08:28:03,218 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 7250, loss[loss=0.09193, beats_loss=0.01239, ecapa_loss=0.0001455, whisper_loss=0.07809, over 22190.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001447, whisper_loss=0.09019, over 3923317.48 frames. ], batch size: 92, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:28:05,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3793730.0, ans=0.125 2024-08-18 08:28:09,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3793730.0, ans=0.0 2024-08-18 08:28:11,603 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 08:28:31,826 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
38 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-18 08:28:42,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3793930.0, ans=0.125 2024-08-18 08:28:46,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3793930.0, ans=0.125 2024-08-18 08:28:48,491 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.34 vs. limit=10.0 2024-08-18 08:28:53,593 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.41 vs. limit=15.0 2024-08-18 08:28:55,565 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.633e+01 2.291e+01 2.562e+01 2.924e+01 4.395e+01, threshold=5.124e+01, percent-clipped=0.0 2024-08-18 08:28:55,750 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 08:29:01,095 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-18 08:29:05,751 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 38 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 08:29:17,065 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.53 vs. limit=10.0 2024-08-18 08:29:24,424 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.28 vs. 
limit=15.0 2024-08-18 08:29:25,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3794230.0, ans=0.125 2024-08-18 08:29:26,019 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 7300, loss[loss=0.09688, beats_loss=0.01039, ecapa_loss=0.000143, whisper_loss=0.08506, over 19924.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0106, ecapa_loss=0.0001448, whisper_loss=0.09042, over 3916488.80 frames. ], batch size: 80, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:29:29,852 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.42 vs. limit=22.5 2024-08-18 08:29:38,774 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 08:29:41,899 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 08:29:51,414 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.91 vs. limit=22.5 2024-08-18 08:30:14,132 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-18 08:30:25,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3794630.0, ans=0.0 2024-08-18 08:30:33,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3794630.0, ans=0.2 2024-08-18 08:30:39,480 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 7350, loss[loss=0.1056, beats_loss=0.01, ecapa_loss=0.000146, whisper_loss=0.09418, over 22562.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01058, ecapa_loss=0.0001461, whisper_loss=0.09028, over 3904003.02 frames. 
], batch size: 88, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:30:39,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3794730.0, ans=0.1 2024-08-18 08:30:52,278 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-18 08:30:58,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3794830.0, ans=0.95 2024-08-18 08:31:19,362 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 08:31:21,624 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.256e+01 2.556e+01 3.026e+01 2.430e+02, threshold=5.112e+01, percent-clipped=3.0 2024-08-18 08:31:25,326 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=15.0 2024-08-18 08:31:31,570 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.666e+05 2024-08-18 08:31:33,856 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 08:31:35,296 INFO [train_multi_KD3.py:844] (2/4) A total of 97 cuts. 33 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-18 08:31:43,443 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 08:31:46,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3795130.0, ans=0.1 2024-08-18 08:31:47,621 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
14 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-18 08:31:48,683 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 7400, loss[loss=0.07119, beats_loss=0.01558, ecapa_loss=0.0001243, whisper_loss=0.05436, over 16476.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0107, ecapa_loss=0.0001453, whisper_loss=0.08904, over 3893185.62 frames. ], batch size: 68, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:31:53,029 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-18 08:33:00,638 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 7450, loss[loss=0.1046, beats_loss=0.008652, ecapa_loss=0.0001645, whisper_loss=0.09432, over 14741.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01055, ecapa_loss=0.0001458, whisper_loss=0.0909, over 3901927.81 frames. ], batch size: 58, lr: 2.36e-03, grad_scale: 1.152921504606847e+18 2024-08-18 08:33:14,468 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=12.0 2024-08-18 08:33:44,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3796030.0, ans=0.07 2024-08-18 08:33:45,458 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.298e+01 2.515e+01 2.831e+01 4.675e+01, threshold=5.030e+01, percent-clipped=0.0 2024-08-18 08:33:48,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3796030.0, ans=0.125 2024-08-18 08:34:02,109 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.14 vs. limit=22.5 2024-08-18 08:34:05,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3796130.0, ans=0.2 2024-08-18 08:34:10,381 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
16 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-18 08:34:12,729 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 7500, loss[loss=0.09621, beats_loss=0.01079, ecapa_loss=0.0001086, whisper_loss=0.08434, over 17467.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.000145, whisper_loss=0.09047, over 3875510.50 frames. ], batch size: 68, lr: 2.35e-03, grad_scale: 1.152921504606847e+18 2024-08-18 08:34:20,246 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 11 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-18 08:34:24,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3796230.0, ans=0.1 2024-08-18 08:34:27,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3796330.0, ans=0.0 2024-08-18 08:34:29,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3796330.0, ans=0.0 2024-08-18 08:34:32,304 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 08:34:55,262 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-18 08:35:01,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3796530.0, ans=0.0 2024-08-18 08:35:12,819 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.41 vs. limit=22.5 2024-08-18 08:35:20,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3796730.0, ans=0.035 2024-08-18 08:35:21,328 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 7550, loss[loss=0.09784, beats_loss=0.01259, ecapa_loss=0.0001342, whisper_loss=0.08391, over 23426.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01054, ecapa_loss=0.0001457, whisper_loss=0.08959, over 3837300.18 frames. ], batch size: 94, lr: 2.35e-03, grad_scale: 1.152921504606847e+18 2024-08-18 08:35:45,289 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.04 vs. limit=22.5 2024-08-18 08:35:50,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3796930.0, ans=0.125 2024-08-18 08:35:51,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3796930.0, ans=0.2 2024-08-18 08:36:00,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3797030.0, ans=0.0 2024-08-18 08:36:01,609 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.773e+01 2.289e+01 2.559e+01 2.836e+01 1.504e+02, threshold=5.118e+01, percent-clipped=1.0 2024-08-18 08:36:03,999 WARNING [optim.py:496] (2/4) Scaling gradients by 0.04845663160085678, model_norm_threshold=51.1837158203125 2024-08-18 08:36:04,170 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.631e+05, grad_sumsq=1.588e+07, orig_rms_sq=1.027e-02 2024-08-18 08:36:05,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3797030.0, ans=0.125 2024-08-18 08:36:22,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3797130.0, ans=0.0 2024-08-18 08:36:26,287 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 7600, loss[loss=0.08771, beats_loss=0.01269, ecapa_loss=0.0001752, whisper_loss=0.07327, over 19921.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01052, ecapa_loss=0.0001467, whisper_loss=0.08962, over 3837594.21 frames. ], batch size: 87, lr: 2.35e-03, grad_scale: 1.152921504606847e+18 2024-08-18 08:37:06,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3797530.0, ans=0.035 2024-08-18 08:37:07,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3797530.0, ans=0.1 2024-08-18 08:37:19,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3797630.0, ans=0.125 2024-08-18 08:37:20,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3797630.0, ans=0.0 2024-08-18 08:37:21,218 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-18 08:37:35,301 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 7650, loss[loss=0.09009, beats_loss=0.009625, ecapa_loss=0.0001511, whisper_loss=0.07895, over 15342.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01039, ecapa_loss=0.0001462, whisper_loss=0.09048, over 3840270.00 frames. ], batch size: 62, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:37:39,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3797730.0, ans=0.2 2024-08-18 08:37:41,343 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.34 vs. 
limit=12.0 2024-08-18 08:37:43,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3797730.0, ans=0.0 2024-08-18 08:37:47,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3797830.0, ans=0.2 2024-08-18 08:37:53,337 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-18 08:37:53,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3797830.0, ans=0.125 2024-08-18 08:37:56,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3797830.0, ans=0.0 2024-08-18 08:38:04,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3797930.0, ans=0.125 2024-08-18 08:38:18,420 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.705e+01 2.494e+01 2.701e+01 3.087e+01 1.056e+03, threshold=5.401e+01, percent-clipped=1.0 2024-08-18 08:38:28,386 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.49 vs. limit=15.0 2024-08-18 08:38:29,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3798130.0, ans=0.0 2024-08-18 08:38:40,471 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.48 vs. limit=10.0 2024-08-18 08:38:41,293 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 7700, loss[loss=0.1039, beats_loss=0.006685, ecapa_loss=0.0001684, whisper_loss=0.09551, over 16011.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01029, ecapa_loss=0.0001471, whisper_loss=0.09076, over 3841926.78 frames. 
], batch size: 61, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:38:44,833 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-18 08:38:46,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3798230.0, ans=0.0 2024-08-18 08:38:52,956 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.18 vs. limit=15.0 2024-08-18 08:39:09,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3798430.0, ans=0.125 2024-08-18 08:39:15,212 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 13 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-18 08:39:30,787 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 25 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-18 08:39:42,800 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.09 vs. limit=8.0 2024-08-18 08:39:45,425 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 7750, loss[loss=0.08863, beats_loss=0.009925, ecapa_loss=0.0001837, whisper_loss=0.07687, over 17858.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01033, ecapa_loss=0.0001464, whisper_loss=0.09039, over 3834812.86 frames. ], batch size: 71, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:39:50,694 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-18 08:39:51,959 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-18 08:39:56,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3798730.0, ans=0.1 2024-08-18 08:40:07,755 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
22 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-18 08:40:08,255 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.54 vs. limit=15.0 2024-08-18 08:40:11,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3798930.0, ans=0.125 2024-08-18 08:40:24,912 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-18 08:40:27,406 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0 2024-08-18 08:40:27,849 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.319e+01 2.612e+01 2.885e+01 4.256e+01, threshold=5.223e+01, percent-clipped=0.0 2024-08-18 08:40:51,660 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 7800, loss[loss=0.1224, beats_loss=0.009168, ecapa_loss=0.0001389, whisper_loss=0.1118, over 23357.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01038, ecapa_loss=0.0001461, whisper_loss=0.09038, over 3830590.62 frames. ], batch size: 91, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:41:01,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3799230.0, ans=0.125 2024-08-18 08:41:12,236 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-18 08:41:12,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3799330.0, ans=0.2 2024-08-18 08:41:24,773 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
28 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 08:41:37,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3799530.0, ans=0.125 2024-08-18 08:41:38,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3799530.0, ans=0.125 2024-08-18 08:41:48,991 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 08:41:51,844 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-18 08:41:53,150 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 08:41:58,030 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 7850, loss[loss=0.08144, beats_loss=0.01006, ecapa_loss=0.0001828, whisper_loss=0.06955, over 20753.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001469, whisper_loss=0.08998, over 3821019.08 frames. ], batch size: 90, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:41:58,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3799730.0, ans=0.125 2024-08-18 08:42:09,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3799730.0, ans=0.1 2024-08-18 08:42:10,765 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 30 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 08:42:11,958 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-18 08:42:23,571 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-18 08:42:31,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3799930.0, ans=0.125 2024-08-18 08:42:40,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3800030.0, ans=0.125 2024-08-18 08:42:42,482 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.336e+01 2.580e+01 3.031e+01 8.251e+01, threshold=5.160e+01, percent-clipped=2.0 2024-08-18 08:42:46,708 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2024-08-18 08:42:58,903 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 24 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 08:43:03,367 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=4.997e-01 2024-08-18 08:43:03,601 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.64 vs. limit=15.0 2024-08-18 08:43:05,402 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 7900, loss[loss=0.1077, beats_loss=0.009962, ecapa_loss=0.0001686, whisper_loss=0.09604, over 21729.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01055, ecapa_loss=0.0001451, whisper_loss=0.09045, over 3841115.67 frames. ], batch size: 89, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:43:05,510 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-18 08:43:13,275 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
20 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-18 08:43:18,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3800330.0, ans=0.125 2024-08-18 08:43:18,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3800330.0, ans=0.1 2024-08-18 08:43:26,806 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 20 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-18 08:43:45,055 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.524e+00 2024-08-18 08:43:46,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3800530.0, ans=0.0 2024-08-18 08:43:55,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3800530.0, ans=0.2 2024-08-18 08:43:59,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3800630.0, ans=0.1 2024-08-18 08:44:01,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3800630.0, ans=0.125 2024-08-18 08:44:04,296 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-18 08:44:11,685 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 7950, loss[loss=0.1229, beats_loss=0.009415, ecapa_loss=0.0001593, whisper_loss=0.1119, over 22077.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01055, ecapa_loss=0.0001455, whisper_loss=0.09056, over 3821414.98 frames. ], batch size: 88, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:44:26,546 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.01 vs. 
limit=10.0 2024-08-18 08:44:34,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3800830.0, ans=0.125 2024-08-18 08:44:39,735 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 10 from Vox, 39 fro AS 2024-08-18 08:44:53,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3801030.0, ans=0.2 2024-08-18 08:44:56,042 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.349e+01 2.585e+01 3.062e+01 4.002e+02, threshold=5.169e+01, percent-clipped=3.0 2024-08-18 08:44:57,163 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.63 vs. limit=15.0 2024-08-18 08:44:59,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3801030.0, ans=0.125 2024-08-18 08:45:21,682 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 8000, loss[loss=0.1099, beats_loss=0.009822, ecapa_loss=0.0001339, whisper_loss=0.09871, over 19692.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01048, ecapa_loss=0.0001452, whisper_loss=0.09086, over 3834142.43 frames. ], batch size: 76, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:45:28,863 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 08:46:17,448 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-18 08:46:24,719 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.77 vs. limit=15.0 2024-08-18 08:46:30,577 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 8050, loss[loss=0.07744, beats_loss=0.01073, ecapa_loss=0.0001318, whisper_loss=0.0654, over 15146.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01042, ecapa_loss=0.0001452, whisper_loss=0.09162, over 3837189.33 frames. ], batch size: 60, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:46:40,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3801730.0, ans=0.125 2024-08-18 08:46:41,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3801730.0, ans=0.125 2024-08-18 08:46:48,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3801830.0, ans=0.04949747468305833 2024-08-18 08:46:50,647 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.33 vs. limit=22.5 2024-08-18 08:46:53,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3801830.0, ans=0.2 2024-08-18 08:47:01,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3801930.0, ans=0.0 2024-08-18 08:47:14,951 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.216e+01 2.436e+01 2.750e+01 5.017e+01, threshold=4.873e+01, percent-clipped=0.0 2024-08-18 08:47:38,715 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 8100, loss[loss=0.1028, beats_loss=0.01043, ecapa_loss=0.0001125, whisper_loss=0.09124, over 23743.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0104, ecapa_loss=0.000146, whisper_loss=0.09174, over 3839059.91 frames. 
], batch size: 90, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:47:46,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3802230.0, ans=0.07 2024-08-18 08:47:47,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3802230.0, ans=15.0 2024-08-18 08:47:48,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3802230.0, ans=0.125 2024-08-18 08:47:59,008 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.18 vs. limit=22.5 2024-08-18 08:48:13,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3802430.0, ans=0.125 2024-08-18 08:48:17,864 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.72 vs. limit=22.5 2024-08-18 08:48:34,110 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-18 08:48:38,170 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 35 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 08:48:39,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3802630.0, ans=0.125 2024-08-18 08:48:47,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3802630.0, ans=0.1 2024-08-18 08:48:49,408 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 8150, loss[loss=0.07876, beats_loss=0.01268, ecapa_loss=0.0001369, whisper_loss=0.06471, over 22734.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.0103, ecapa_loss=0.0001473, whisper_loss=0.09181, over 3870540.34 frames. ], batch size: 94, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:48:56,161 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.55 vs. limit=12.0 2024-08-18 08:49:12,939 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-18 08:49:13,568 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.00 vs. limit=6.0 2024-08-18 08:49:17,726 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 24 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-18 08:49:23,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3802930.0, ans=0.05 2024-08-18 08:49:26,006 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 08:49:33,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3803030.0, ans=0.1 2024-08-18 08:49:35,793 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.360e+01 2.577e+01 2.939e+01 1.297e+02, threshold=5.154e+01, percent-clipped=2.0 2024-08-18 08:49:41,650 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
32 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 08:49:53,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3803130.0, ans=0.125 2024-08-18 08:49:55,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3803130.0, ans=0.0 2024-08-18 08:50:01,595 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 8200, loss[loss=0.09236, beats_loss=0.009427, ecapa_loss=0.000187, whisper_loss=0.08106, over 19000.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01036, ecapa_loss=0.0001479, whisper_loss=0.09187, over 3894922.47 frames. ], batch size: 81, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:50:04,559 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-18 08:50:08,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3803230.0, ans=0.0 2024-08-18 08:50:26,689 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 25 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 08:50:45,415 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-18 08:51:07,868 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-18 08:51:11,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3803630.0, ans=0.2 2024-08-18 08:51:14,006 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-18 08:51:15,038 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 8250, loss[loss=0.1167, beats_loss=0.009257, ecapa_loss=0.0001758, whisper_loss=0.1057, over 21967.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01037, ecapa_loss=0.000147, whisper_loss=0.09177, over 3853935.38 frames. 
], batch size: 89, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:51:16,472 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.15 vs. limit=15.0 2024-08-18 08:51:19,920 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 08:52:02,880 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.328e+01 2.473e+01 2.837e+01 4.238e+01, threshold=4.947e+01, percent-clipped=0.0 2024-08-18 08:52:29,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3804230.0, ans=0.125 2024-08-18 08:52:29,858 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 8300, loss[loss=0.1063, beats_loss=0.008397, ecapa_loss=0.0001728, whisper_loss=0.09616, over 21239.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01042, ecapa_loss=0.000146, whisper_loss=0.09125, over 3870638.44 frames. ], batch size: 85, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:52:38,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3804230.0, ans=0.2 2024-08-18 08:52:56,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3804330.0, ans=0.125 2024-08-18 08:53:02,077 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-18 08:53:07,749 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. 
limit=6.0 2024-08-18 08:53:18,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3804530.0, ans=0.0 2024-08-18 08:53:29,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3804530.0, ans=0.1 2024-08-18 08:53:38,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3804630.0, ans=0.2 2024-08-18 08:53:45,274 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-18 08:53:48,470 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 8350, loss[loss=0.1268, beats_loss=0.008943, ecapa_loss=0.0001503, whisper_loss=0.1163, over 22349.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.0001454, whisper_loss=0.09032, over 3894225.96 frames. ], batch size: 89, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:53:49,190 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.780e+05 2024-08-18 08:53:59,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3804730.0, ans=0.2 2024-08-18 08:53:59,488 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.79 vs. limit=22.5 2024-08-18 08:54:03,927 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 22 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-18 08:54:08,008 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
20 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-18 08:54:08,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3804830.0, ans=0.125 2024-08-18 08:54:14,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3804830.0, ans=0.1 2024-08-18 08:54:15,460 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-18 08:54:34,890 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.322e+01 2.536e+01 2.820e+01 4.636e+01, threshold=5.072e+01, percent-clipped=0.0 2024-08-18 08:54:51,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3805130.0, ans=0.125 2024-08-18 08:55:01,972 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.43 vs. limit=12.0 2024-08-18 08:55:03,571 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 8400, loss[loss=0.1062, beats_loss=0.01128, ecapa_loss=0.0001437, whisper_loss=0.0935, over 22982.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01056, ecapa_loss=0.0001459, whisper_loss=0.09077, over 3887022.45 frames. ], batch size: 93, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:55:50,531 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-18 08:55:51,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3805530.0, ans=0.125 2024-08-18 08:56:03,650 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
22 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-18 08:56:19,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3805630.0, ans=0.2 2024-08-18 08:56:23,599 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 8450, loss[loss=0.1365, beats_loss=0.006703, ecapa_loss=0.0001611, whisper_loss=0.1281, over 24072.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01049, ecapa_loss=0.0001457, whisper_loss=0.09096, over 3914081.70 frames. ], batch size: 91, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:56:46,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3805830.0, ans=0.0 2024-08-18 08:57:45,548 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.360e+01 2.600e+01 3.017e+01 9.268e+01, threshold=5.200e+01, percent-clipped=1.0 2024-08-18 08:57:45,690 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-18 08:58:07,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3806130.0, ans=0.125 2024-08-18 08:58:16,433 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 8500, loss[loss=0.09245, beats_loss=0.01241, ecapa_loss=0.0001245, whisper_loss=0.0788, over 17322.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01046, ecapa_loss=0.000146, whisper_loss=0.09056, over 3912292.75 frames. ], batch size: 69, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:58:40,137 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-18 08:58:52,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3806430.0, ans=0.125 2024-08-18 08:59:04,557 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
22 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-18 08:59:05,718 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.53 vs. limit=15.0 2024-08-18 08:59:28,513 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 34 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-18 08:59:37,896 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 8550, loss[loss=0.1012, beats_loss=0.01335, ecapa_loss=0.0001076, whisper_loss=0.08677, over 19709.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01041, ecapa_loss=0.0001454, whisper_loss=0.09098, over 3923833.99 frames. ], batch size: 77, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:59:38,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3806730.0, ans=0.125 2024-08-18 08:59:43,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3806730.0, ans=0.05 2024-08-18 08:59:47,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3806730.0, ans=0.125 2024-08-18 08:59:54,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3806830.0, ans=0.125 2024-08-18 09:00:26,444 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
24 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-18 09:00:27,968 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.776e+01 2.270e+01 2.542e+01 2.954e+01 6.029e+01, threshold=5.084e+01, percent-clipped=1.0 2024-08-18 09:00:32,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3807030.0, ans=0.125 2024-08-18 09:00:55,597 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 8600, loss[loss=0.09717, beats_loss=0.0114, ecapa_loss=0.0001593, whisper_loss=0.08418, over 20683.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01043, ecapa_loss=0.0001461, whisper_loss=0.0906, over 3908830.52 frames. ], batch size: 85, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:01:30,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=3807430.0, ans=22.5 2024-08-18 09:01:31,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3807430.0, ans=0.125 2024-08-18 09:01:32,954 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-18 09:01:58,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3807630.0, ans=0.125 2024-08-18 09:02:06,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3807630.0, ans=0.0 2024-08-18 09:02:09,288 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 8650, loss[loss=0.1428, beats_loss=0.007265, ecapa_loss=0.0001845, whisper_loss=0.1337, over 17879.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01048, ecapa_loss=0.000146, whisper_loss=0.09018, over 3879718.13 frames. 
], batch size: 68, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:02:23,741 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=17.24 vs. limit=15.0 2024-08-18 09:02:53,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3808030.0, ans=0.125 2024-08-18 09:02:56,386 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.256e+01 2.496e+01 2.847e+01 1.710e+02, threshold=4.992e+01, percent-clipped=2.0 2024-08-18 09:02:57,851 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 09:03:12,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3808130.0, ans=0.125 2024-08-18 09:03:15,413 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 18 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-18 09:03:17,414 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.74 vs. limit=15.0 2024-08-18 09:03:22,270 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 8700, loss[loss=0.1236, beats_loss=0.01019, ecapa_loss=0.0001349, whisper_loss=0.112, over 14697.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01052, ecapa_loss=0.0001461, whisper_loss=0.08997, over 3832573.72 frames. ], batch size: 56, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:03:44,145 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=15.0 2024-08-18 09:03:49,736 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.67 vs. 
limit=15.0 2024-08-18 09:04:07,121 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=4.437e-02 2024-08-18 09:04:17,100 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.22 vs. limit=22.5 2024-08-18 09:04:31,677 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.69 vs. limit=15.0 2024-08-18 09:04:32,231 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 8750, loss[loss=0.1214, beats_loss=0.009246, ecapa_loss=0.0001891, whisper_loss=0.1103, over 21351.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001463, whisper_loss=0.09058, over 3870751.80 frames. ], batch size: 87, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:04:44,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3808830.0, ans=0.125 2024-08-18 09:05:00,613 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.779e+01 2024-08-18 09:05:15,045 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.311e+01 2.590e+01 2.877e+01 4.936e+01, threshold=5.179e+01, percent-clipped=0.0 2024-08-18 09:05:19,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3809030.0, ans=0.95 2024-08-18 09:05:20,272 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
22 from LS+wenet, 10 from Vox, 34 fro AS 2024-08-18 09:05:20,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3809030.0, ans=0.2 2024-08-18 09:05:29,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3809130.0, ans=0.2 2024-08-18 09:05:36,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3809130.0, ans=0.125 2024-08-18 09:05:38,475 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 8800, loss[loss=0.08495, beats_loss=0.01296, ecapa_loss=0.0001458, whisper_loss=0.07053, over 20808.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01043, ecapa_loss=0.0001454, whisper_loss=0.09107, over 3877285.23 frames. ], batch size: 86, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:05:38,644 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 09:05:40,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3809230.0, ans=0.125 2024-08-18 09:05:47,123 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-18 09:05:47,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3809230.0, ans=0.125 2024-08-18 09:05:52,744 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-18 09:05:56,388 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
19 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-18 09:06:09,413 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.157e+00 2024-08-18 09:06:22,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3809530.0, ans=0.025 2024-08-18 09:06:39,919 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 09:06:42,311 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 8850, loss[loss=0.06731, beats_loss=0.01115, ecapa_loss=0.0001347, whisper_loss=0.05482, over 17687.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01061, ecapa_loss=0.0001442, whisper_loss=0.08983, over 3865067.06 frames. ], batch size: 69, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:06:43,710 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 09:06:55,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3809830.0, ans=0.2 2024-08-18 09:07:21,918 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.61 vs. limit=22.5 2024-08-18 09:07:23,697 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.147e+01 2.440e+01 2.882e+01 4.203e+01, threshold=4.880e+01, percent-clipped=0.0 2024-08-18 09:07:27,291 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-08-18 09:07:28,989 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
39 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-18 09:07:41,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3810130.0, ans=0.1 2024-08-18 09:07:47,352 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 8900, loss[loss=0.1141, beats_loss=0.009546, ecapa_loss=0.0001225, whisper_loss=0.1033, over 20973.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.0001445, whisper_loss=0.09031, over 3866576.44 frames. ], batch size: 81, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:07:47,526 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-18 09:07:52,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3810230.0, ans=0.125 2024-08-18 09:07:53,506 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 24 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-18 09:07:55,341 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.28 vs. limit=15.0 2024-08-18 09:08:06,875 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.60 vs. limit=15.0 2024-08-18 09:08:16,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3810430.0, ans=0.0 2024-08-18 09:08:21,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3810430.0, ans=0.125 2024-08-18 09:08:24,343 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.93 vs. 
limit=10.0 2024-08-18 09:08:25,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3810430.0, ans=0.1 2024-08-18 09:08:26,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3810530.0, ans=0.125 2024-08-18 09:08:32,612 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-18 09:08:50,999 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.577e+01 2024-08-18 09:08:53,016 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 8950, loss[loss=0.1121, beats_loss=0.01036, ecapa_loss=0.0001209, whisper_loss=0.1006, over 23958.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001436, whisper_loss=0.09054, over 3875147.57 frames. ], batch size: 94, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:09:33,329 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 09:09:35,387 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.305e+01 2.570e+01 2.937e+01 4.370e+01, threshold=5.139e+01, percent-clipped=0.0 2024-08-18 09:09:50,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3811130.0, ans=0.0 2024-08-18 09:09:56,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3811130.0, ans=0.2 2024-08-18 09:09:59,359 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 9000, loss[loss=0.09866, beats_loss=0.01169, ecapa_loss=0.0001311, whisper_loss=0.08566, over 19574.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01064, ecapa_loss=0.0001442, whisper_loss=0.0902, over 3900225.84 frames. 
], batch size: 77, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:09:59,360 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-18 09:10:40,543 INFO [train_multi_KD3.py:1149] (2/4) Epoch 26, validation on ASR_libri: loss=0.2531, beats_loss=0, ecapa_loss=0.0005276, whisper_loss=0.2478, over 922467.00 frames. 2024-08-18 09:10:56,596 INFO [train_multi_KD3.py:1149] (2/4) Epoch 26, validation on SV_voxceleb1: loss=0.004116, beats_loss=0, ecapa_loss=0.0004116, whisper_loss=0, over 939242.00 frames. 2024-08-18 09:12:49,866 INFO [train_multi_KD3.py:1149] (2/4) Epoch 26, validation on AT_audioset: loss=0.02315, beats_loss=0.02315, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 09:12:49,870 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-18 09:12:52,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3811230.0, ans=0.125 2024-08-18 09:13:00,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3811230.0, ans=0.2 2024-08-18 09:13:06,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3811330.0, ans=0.125 2024-08-18 09:13:11,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3811330.0, ans=0.2 2024-08-18 09:13:15,951 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-18 09:13:31,412 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 09:13:37,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3811530.0, ans=0.125 2024-08-18 09:13:40,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3811530.0, ans=0.2 2024-08-18 09:13:56,143 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 9050, loss[loss=0.09461, beats_loss=0.009567, ecapa_loss=0.0001588, whisper_loss=0.08345, over 16459.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.0001448, whisper_loss=0.09024, over 3881815.43 frames. ], batch size: 66, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:14:24,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3811930.0, ans=0.125 2024-08-18 09:14:37,992 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.711e+01 2.306e+01 2.516e+01 2.800e+01 4.042e+01, threshold=5.033e+01, percent-clipped=0.0 2024-08-18 09:14:38,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3812030.0, ans=0.0 2024-08-18 09:14:44,534 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.10 vs. limit=12.0 2024-08-18 09:14:47,332 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.58 vs. limit=22.5 2024-08-18 09:15:02,072 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 9100, loss[loss=0.09834, beats_loss=0.01087, ecapa_loss=0.0001338, whisper_loss=0.08612, over 19120.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01053, ecapa_loss=0.0001447, whisper_loss=0.0902, over 3867562.83 frames. 
], batch size: 76, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:15:12,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3812230.0, ans=0.95 2024-08-18 09:15:14,225 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.27 vs. limit=15.0 2024-08-18 09:15:17,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3812330.0, ans=0.1 2024-08-18 09:15:17,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3812330.0, ans=0.05 2024-08-18 09:15:21,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3812330.0, ans=0.1 2024-08-18 09:15:42,622 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-18 09:15:49,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3812530.0, ans=0.0 2024-08-18 09:15:51,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3812530.0, ans=0.0 2024-08-18 09:16:06,554 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 9150, loss[loss=0.0695, beats_loss=0.01394, ecapa_loss=0.0001115, whisper_loss=0.05445, over 20036.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.0001455, whisper_loss=0.09037, over 3896659.38 frames. ], batch size: 81, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:16:07,009 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
23 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-18 09:16:07,536 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.87 vs. limit=6.0 2024-08-18 09:16:11,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3812730.0, ans=0.0 2024-08-18 09:16:13,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3812730.0, ans=10.0 2024-08-18 09:16:23,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3812830.0, ans=10.0 2024-08-18 09:16:29,732 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 27 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-18 09:16:47,031 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.602e+01 2.322e+01 2.560e+01 2.828e+01 4.789e+01, threshold=5.121e+01, percent-clipped=0.0 2024-08-18 09:16:50,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3813030.0, ans=0.0 2024-08-18 09:16:50,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3813030.0, ans=0.1 2024-08-18 09:16:58,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3813130.0, ans=0.0 2024-08-18 09:17:00,457 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-18 09:17:10,770 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 9200, loss[loss=0.1134, beats_loss=0.0087, ecapa_loss=0.0001494, whisper_loss=0.1032, over 16171.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01053, ecapa_loss=0.0001467, whisper_loss=0.08993, over 3885337.21 frames. 
], batch size: 64, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:17:21,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3813230.0, ans=0.125 2024-08-18 09:17:21,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3813230.0, ans=0.0 2024-08-18 09:17:24,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3813330.0, ans=10.0 2024-08-18 09:17:43,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3813430.0, ans=0.1 2024-08-18 09:17:47,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3813430.0, ans=0.2 2024-08-18 09:17:48,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3813530.0, ans=0.0 2024-08-18 09:18:08,763 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-18 09:18:11,196 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-18 09:18:13,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3813730.0, ans=0.0 2024-08-18 09:18:14,899 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 9250, loss[loss=0.1114, beats_loss=0.01259, ecapa_loss=0.0001094, whisper_loss=0.09769, over 18462.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01057, ecapa_loss=0.0001472, whisper_loss=0.09031, over 3896146.65 frames. ], batch size: 70, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:18:31,478 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
21 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 09:18:55,774 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.301e+01 2.520e+01 2.839e+01 9.703e+01, threshold=5.041e+01, percent-clipped=1.0 2024-08-18 09:19:12,134 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-18 09:19:18,373 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 9300, loss[loss=0.12, beats_loss=0.01047, ecapa_loss=0.0001395, whisper_loss=0.1081, over 24330.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01049, ecapa_loss=0.0001472, whisper_loss=0.09072, over 3887861.43 frames. ], batch size: 95, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:19:30,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3814330.0, ans=0.125 2024-08-18 09:19:42,097 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-18 09:19:43,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3814430.0, ans=0.125 2024-08-18 09:19:52,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3814430.0, ans=0.125 2024-08-18 09:19:54,750 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-18 09:20:06,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3814530.0, ans=0.0 2024-08-18 09:20:07,110 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 11 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-18 09:20:20,881 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 9350, loss[loss=0.1362, beats_loss=0.008697, ecapa_loss=0.0001515, whisper_loss=0.126, over 16074.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01047, ecapa_loss=0.0001459, whisper_loss=0.0912, over 3894785.66 frames. ], batch size: 63, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:20:22,241 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-18 09:20:22,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3814730.0, ans=0.125 2024-08-18 09:20:46,286 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.70 vs. limit=15.0 2024-08-18 09:21:00,570 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.210e+01 2.465e+01 2.743e+01 3.638e+02, threshold=4.930e+01, percent-clipped=1.0 2024-08-18 09:21:07,961 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-18 09:21:15,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3815130.0, ans=0.125 2024-08-18 09:21:18,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3815130.0, ans=0.2 2024-08-18 09:21:20,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3815130.0, ans=0.025 2024-08-18 09:21:23,436 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 9400, loss[loss=0.1044, beats_loss=0.01006, ecapa_loss=0.0001666, whisper_loss=0.09263, over 12901.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01047, ecapa_loss=0.0001469, whisper_loss=0.09096, over 3873684.29 frames. 
], batch size: 55, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:21:23,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3815230.0, ans=0.025 2024-08-18 09:21:23,825 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.171e-02 2024-08-18 09:21:27,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3815230.0, ans=0.125 2024-08-18 09:21:37,282 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 21 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-18 09:21:38,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3815330.0, ans=0.125 2024-08-18 09:21:48,343 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 09:21:48,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3815430.0, ans=0.0 2024-08-18 09:21:48,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3815430.0, ans=0.0 2024-08-18 09:21:59,523 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-18 09:22:12,636 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 31 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-18 09:22:26,094 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 9450, loss[loss=0.09533, beats_loss=0.00995, ecapa_loss=0.0001593, whisper_loss=0.08379, over 22404.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.0001466, whisper_loss=0.09093, over 3889598.72 frames. ], batch size: 87, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:22:30,075 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 09:22:43,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3815830.0, ans=0.0 2024-08-18 09:22:53,473 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-18 09:23:00,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3815930.0, ans=0.0 2024-08-18 09:23:02,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3816030.0, ans=0.125 2024-08-18 09:23:05,761 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.291e+01 2.521e+01 2.767e+01 4.094e+02, threshold=5.042e+01, percent-clipped=1.0 2024-08-18 09:23:10,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3816030.0, ans=0.125 2024-08-18 09:23:11,977 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-18 09:23:14,359 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 11 from Vox, 50 fro AS 2024-08-18 09:23:16,144 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.92 vs. limit=15.0 2024-08-18 09:23:20,125 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 09:23:27,404 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 9500, loss[loss=0.08021, beats_loss=0.01099, ecapa_loss=0.0001412, whisper_loss=0.0678, over 15336.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0105, ecapa_loss=0.0001472, whisper_loss=0.0907, over 3915123.22 frames. ], batch size: 61, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:23:55,405 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 09:24:19,075 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-18 09:24:28,542 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 9550, loss[loss=0.1218, beats_loss=0.007986, ecapa_loss=0.0001774, whisper_loss=0.1121, over 18081.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01047, ecapa_loss=0.0001468, whisper_loss=0.09085, over 3897825.01 frames. ], batch size: 68, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:24:33,638 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 09:24:51,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3816830.0, ans=0.1 2024-08-18 09:25:00,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3816930.0, ans=10.0 2024-08-18 09:25:03,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3816930.0, ans=0.125 2024-08-18 09:25:04,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3817030.0, ans=0.2 2024-08-18 09:25:07,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3817030.0, ans=0.0 2024-08-18 09:25:08,131 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.361e+01 2.629e+01 2.923e+01 5.081e+01, threshold=5.257e+01, percent-clipped=1.0 2024-08-18 09:25:14,310 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
24 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-18 09:25:29,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3817230.0, ans=0.0 2024-08-18 09:25:30,068 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 9600, loss[loss=0.1077, beats_loss=0.008956, ecapa_loss=0.0001697, whisper_loss=0.09706, over 16110.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01038, ecapa_loss=0.0001473, whisper_loss=0.09117, over 3881336.89 frames. ], batch size: 66, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:25:37,777 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.01 vs. limit=10.0 2024-08-18 09:25:41,940 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 15 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-18 09:25:55,311 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 13 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-18 09:25:56,997 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-18 09:26:04,802 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.65 vs. limit=22.5 2024-08-18 09:26:10,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3817530.0, ans=0.125 2024-08-18 09:26:31,407 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 32 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-18 09:26:32,476 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 9650, loss[loss=0.1267, beats_loss=0.008799, ecapa_loss=0.0001427, whisper_loss=0.1165, over 20721.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01032, ecapa_loss=0.0001479, whisper_loss=0.09121, over 3848020.90 frames. 
], batch size: 78, lr: 2.35e-03, grad_scale: 1.152921504606847e+18 2024-08-18 09:26:51,148 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 17 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-18 09:26:55,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3817830.0, ans=0.025 2024-08-18 09:27:07,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3817930.0, ans=0.1 2024-08-18 09:27:11,842 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.349e+01 2.605e+01 2.994e+01 4.918e+01, threshold=5.210e+01, percent-clipped=0.0 2024-08-18 09:27:19,119 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.43 vs. limit=10.0 2024-08-18 09:27:19,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3818030.0, ans=0.0 2024-08-18 09:27:32,902 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-18 09:27:34,004 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 9700, loss[loss=0.08694, beats_loss=0.01115, ecapa_loss=0.000127, whisper_loss=0.07452, over 21859.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01032, ecapa_loss=0.0001488, whisper_loss=0.09098, over 3845490.35 frames. ], batch size: 89, lr: 2.35e-03, grad_scale: 1.152921504606847e+18 2024-08-18 09:27:35,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3818230.0, ans=0.0 2024-08-18 09:27:39,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3818230.0, ans=0.0 2024-08-18 09:28:07,734 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
17 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 09:28:20,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3818530.0, ans=0.125 2024-08-18 09:28:21,333 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-18 09:28:23,062 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.86 vs. limit=15.0 2024-08-18 09:28:27,342 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-18 09:28:27,882 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.26 vs. limit=15.0 2024-08-18 09:28:35,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3818730.0, ans=0.0 2024-08-18 09:28:36,221 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 9750, loss[loss=0.1093, beats_loss=0.01059, ecapa_loss=0.0001376, whisper_loss=0.09738, over 19398.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01038, ecapa_loss=0.000148, whisper_loss=0.09062, over 3838558.62 frames. ], batch size: 77, lr: 2.35e-03, grad_scale: 1.152921504606847e+18 2024-08-18 09:28:36,421 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-18 09:28:45,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3818730.0, ans=0.2 2024-08-18 09:29:01,227 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 09:29:03,732 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
15 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 09:29:06,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3818930.0, ans=0.0 2024-08-18 09:29:15,745 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.255e+01 2.464e+01 2.832e+01 2.481e+02, threshold=4.927e+01, percent-clipped=2.0 2024-08-18 09:29:16,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3819030.0, ans=0.1 2024-08-18 09:29:24,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3819130.0, ans=0.125 2024-08-18 09:29:24,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3819130.0, ans=0.0 2024-08-18 09:29:24,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3819130.0, ans=0.0 2024-08-18 09:29:29,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3819130.0, ans=0.125 2024-08-18 09:29:31,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3819130.0, ans=0.125 2024-08-18 09:29:37,756 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 9800, loss[loss=0.08688, beats_loss=0.01002, ecapa_loss=0.000159, whisper_loss=0.07527, over 21872.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.0001477, whisper_loss=0.08963, over 3833203.60 frames. ], batch size: 91, lr: 2.35e-03, grad_scale: 1.152921504606847e+18 2024-08-18 09:29:43,972 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
14 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-18 09:29:51,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3819330.0, ans=0.1 2024-08-18 09:29:56,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3819330.0, ans=0.95 2024-08-18 09:30:09,745 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-18 09:30:23,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3819530.0, ans=0.0 2024-08-18 09:30:26,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3819630.0, ans=0.0 2024-08-18 09:30:31,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3819630.0, ans=0.09899494936611666 2024-08-18 09:30:33,855 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-18 09:30:38,660 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 9850, loss[loss=0.1178, beats_loss=0.00925, ecapa_loss=0.0001451, whisper_loss=0.1071, over 22044.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.000147, whisper_loss=0.09039, over 3851127.11 frames. ], batch size: 89, lr: 2.35e-03, grad_scale: 1.152921504606847e+18 2024-08-18 09:30:47,151 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
19 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-18 09:31:00,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3819830.0, ans=0.125 2024-08-18 09:31:18,676 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.410e+01 2.708e+01 2.991e+01 3.936e+01, threshold=5.416e+01, percent-clipped=0.0 2024-08-18 09:31:19,527 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.66 vs. limit=22.5 2024-08-18 09:31:22,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3820030.0, ans=0.1 2024-08-18 09:31:25,735 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0 2024-08-18 09:31:29,785 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 22 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-18 09:31:39,609 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 9900, loss[loss=0.1006, beats_loss=0.01225, ecapa_loss=0.0001423, whisper_loss=0.08691, over 21481.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01055, ecapa_loss=0.0001458, whisper_loss=0.09026, over 3870974.72 frames. ], batch size: 88, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:31:47,858 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-18 09:32:27,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3820530.0, ans=0.125 2024-08-18 09:32:32,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3820630.0, ans=0.0 2024-08-18 09:32:40,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3820630.0, ans=0.1 2024-08-18 09:32:42,363 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 9950, loss[loss=0.1308, beats_loss=0.009001, ecapa_loss=0.0001539, whisper_loss=0.1202, over 21973.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.0001458, whisper_loss=0.0903, over 3894785.21 frames. ], batch size: 87, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:32:58,269 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 09:33:06,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3820930.0, ans=0.1 2024-08-18 09:33:15,312 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
26 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 09:33:20,651 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 09:33:20,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3821030.0, ans=0.0 2024-08-18 09:33:21,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3821030.0, ans=0.07 2024-08-18 09:33:22,698 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.257e+01 2.517e+01 2.867e+01 4.376e+01, threshold=5.034e+01, percent-clipped=0.0 2024-08-18 09:33:43,936 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 10000, loss[loss=0.09972, beats_loss=0.0108, ecapa_loss=0.0001571, whisper_loss=0.08735, over 17086.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01057, ecapa_loss=0.0001462, whisper_loss=0.09034, over 3895355.65 frames. ], batch size: 71, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:33:50,889 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.09 vs. limit=10.0 2024-08-18 09:34:09,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3821430.0, ans=0.0 2024-08-18 09:34:12,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3821430.0, ans=0.0 2024-08-18 09:34:15,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3821430.0, ans=0.0 2024-08-18 09:34:18,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3821430.0, ans=0.1 2024-08-18 09:34:29,886 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
31 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-18 09:34:30,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3821530.0, ans=0.0 2024-08-18 09:34:31,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3821530.0, ans=0.0 2024-08-18 09:34:36,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3821630.0, ans=0.0 2024-08-18 09:34:42,499 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.26 vs. limit=15.0 2024-08-18 09:34:45,188 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 10050, loss[loss=0.0926, beats_loss=0.01128, ecapa_loss=0.0001578, whisper_loss=0.07975, over 20949.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01057, ecapa_loss=0.0001456, whisper_loss=0.09, over 3894438.08 frames. ], batch size: 88, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:35:01,923 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-18 09:35:21,011 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=16.60 vs. limit=15.0 2024-08-18 09:35:25,377 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.531e+01 2.231e+01 2.440e+01 2.652e+01 3.423e+01, threshold=4.880e+01, percent-clipped=0.0 2024-08-18 09:35:33,461 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.43 vs. 
limit=22.5 2024-08-18 09:35:34,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3822130.0, ans=0.125 2024-08-18 09:35:39,915 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 15 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-18 09:35:45,981 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 10100, loss[loss=0.1066, beats_loss=0.009004, ecapa_loss=0.0001363, whisper_loss=0.09627, over 14846.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01062, ecapa_loss=0.0001444, whisper_loss=0.09014, over 3910535.61 frames. ], batch size: 54, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:35:47,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3822230.0, ans=0.125 2024-08-18 09:35:56,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3822230.0, ans=0.125 2024-08-18 09:35:58,389 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-18 09:36:15,593 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.98 vs. limit=15.0 2024-08-18 09:36:16,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3822430.0, ans=0.2 2024-08-18 09:36:26,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3822530.0, ans=0.125 2024-08-18 09:36:29,410 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
22 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-18 09:36:37,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3822630.0, ans=0.125 2024-08-18 09:36:38,049 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-18 09:36:38,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3822630.0, ans=0.2 2024-08-18 09:36:41,636 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-18 09:36:47,548 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 10150, loss[loss=0.09642, beats_loss=0.01169, ecapa_loss=0.0001712, whisper_loss=0.08302, over 14900.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01056, ecapa_loss=0.0001457, whisper_loss=0.09033, over 3877914.14 frames. ], batch size: 61, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:36:48,132 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.59 vs. limit=10.0 2024-08-18 09:36:49,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3822730.0, ans=0.125 2024-08-18 09:36:50,871 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.13 vs. limit=15.0 2024-08-18 09:36:52,592 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
27 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-18 09:36:52,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3822730.0, ans=0.1 2024-08-18 09:37:16,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3822930.0, ans=0.5 2024-08-18 09:37:23,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3823030.0, ans=0.2 2024-08-18 09:37:28,038 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.258e+01 2.544e+01 2.982e+01 1.019e+02, threshold=5.088e+01, percent-clipped=1.0 2024-08-18 09:37:45,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3823130.0, ans=0.125 2024-08-18 09:37:47,289 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.73 vs. limit=12.0 2024-08-18 09:37:48,911 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 10200, loss[loss=0.08524, beats_loss=0.01277, ecapa_loss=0.0001796, whisper_loss=0.07067, over 17732.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01056, ecapa_loss=0.0001463, whisper_loss=0.09029, over 3877117.47 frames. ], batch size: 77, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:37:50,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3823230.0, ans=0.0 2024-08-18 09:37:56,829 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 09:38:00,584 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
22 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 09:38:02,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3823330.0, ans=0.125 2024-08-18 09:38:04,811 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0 2024-08-18 09:38:04,916 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.19 vs. limit=15.0 2024-08-18 09:38:05,381 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 22 from LS+wenet, 29 from Vox, 45 fro AS 2024-08-18 09:38:13,333 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 34 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-18 09:38:48,626 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.07 vs. limit=22.5 2024-08-18 09:38:51,517 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 10250, loss[loss=0.09131, beats_loss=0.01316, ecapa_loss=0.0001456, whisper_loss=0.07669, over 17758.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01047, ecapa_loss=0.0001461, whisper_loss=0.09107, over 3913028.96 frames. ], batch size: 73, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:38:55,008 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-18 09:39:10,836 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 29 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 09:39:11,510 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.21 vs. limit=15.0 2024-08-18 09:39:12,101 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
20 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-18 09:39:22,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3823930.0, ans=0.2 2024-08-18 09:39:34,724 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.298e+01 2.473e+01 2.719e+01 4.293e+01, threshold=4.946e+01, percent-clipped=0.0 2024-08-18 09:39:52,887 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 26 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 09:39:56,907 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 10300, loss[loss=0.07754, beats_loss=0.01278, ecapa_loss=0.0001132, whisper_loss=0.06363, over 21262.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01048, ecapa_loss=0.000145, whisper_loss=0.09062, over 3900941.03 frames. ], batch size: 85, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:39:58,297 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 09:40:12,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3824330.0, ans=0.125 2024-08-18 09:40:20,331 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-18 09:40:20,905 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0 2024-08-18 09:40:25,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3824430.0, ans=0.0 2024-08-18 09:40:32,098 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0 2024-08-18 09:40:34,011 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
24 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-18 09:40:43,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3824530.0, ans=0.5 2024-08-18 09:40:54,135 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0 2024-08-18 09:41:01,208 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 10350, loss[loss=0.1276, beats_loss=0.009889, ecapa_loss=0.0001554, whisper_loss=0.1161, over 18101.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01046, ecapa_loss=0.0001451, whisper_loss=0.09071, over 3917758.94 frames. ], batch size: 72, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:41:14,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3824830.0, ans=0.125 2024-08-18 09:41:35,944 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.737e-01 2024-08-18 09:41:35,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3824930.0, ans=0.2 2024-08-18 09:41:42,577 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.620e+01 2.342e+01 2.610e+01 2.810e+01 3.800e+01, threshold=5.220e+01, percent-clipped=0.0 2024-08-18 09:41:44,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3825030.0, ans=0.125 2024-08-18 09:41:50,405 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
24 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-18 09:42:02,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3825130.0, ans=0.125 2024-08-18 09:42:03,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3825230.0, ans=0.125 2024-08-18 09:42:04,101 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 10400, loss[loss=0.08297, beats_loss=0.01145, ecapa_loss=0.0001422, whisper_loss=0.0701, over 20543.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0104, ecapa_loss=0.0001445, whisper_loss=0.09062, over 3857885.15 frames. ], batch size: 85, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:42:06,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3825230.0, ans=0.5 2024-08-18 09:42:26,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3825330.0, ans=0.0 2024-08-18 09:42:28,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3825430.0, ans=0.125 2024-08-18 09:42:38,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3825430.0, ans=0.125 2024-08-18 09:42:51,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3825530.0, ans=15.0 2024-08-18 09:43:00,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3825630.0, ans=0.04949747468305833 2024-08-18 09:43:07,508 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 10450, loss[loss=0.1249, beats_loss=0.008744, ecapa_loss=0.0001219, whisper_loss=0.1149, over 18169.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01044, ecapa_loss=0.0001444, whisper_loss=0.0904, over 3846746.46 frames. ], batch size: 68, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:43:42,381 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-18 09:43:49,662 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.276e+01 2.479e+01 2.678e+01 3.882e+01, threshold=4.957e+01, percent-clipped=0.0 2024-08-18 09:43:54,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3826030.0, ans=0.0 2024-08-18 09:43:55,128 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 09:44:02,837 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-18 09:44:11,444 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 10500, loss[loss=0.0874, beats_loss=0.01133, ecapa_loss=0.0001333, whisper_loss=0.07473, over 20050.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01039, ecapa_loss=0.0001462, whisper_loss=0.09041, over 3853753.30 frames. ], batch size: 80, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:44:18,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3826230.0, ans=0.125 2024-08-18 09:44:37,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3826330.0, ans=0.125 2024-08-18 09:44:38,235 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-18 09:44:38,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3826430.0, ans=0.125 2024-08-18 09:44:39,572 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
24 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-18 09:44:41,863 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 23 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-18 09:44:43,772 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2024-08-18 09:44:46,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3826430.0, ans=0.125 2024-08-18 09:44:50,917 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 29 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-18 09:44:51,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3826530.0, ans=0.0 2024-08-18 09:44:55,132 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 12 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-18 09:44:57,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3826530.0, ans=0.125 2024-08-18 09:45:07,641 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.37 vs. limit=12.0 2024-08-18 09:45:09,351 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 21 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-18 09:45:20,463 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 10550, loss[loss=0.0942, beats_loss=0.009297, ecapa_loss=0.000157, whisper_loss=0.08333, over 20840.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01036, ecapa_loss=0.0001467, whisper_loss=0.09014, over 3862815.44 frames. ], batch size: 81, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:45:24,173 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-18 09:45:34,739 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
35 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-18 09:45:37,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3826830.0, ans=0.2 2024-08-18 09:46:07,955 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.737e+01 2.328e+01 2.590e+01 2.836e+01 3.998e+01, threshold=5.181e+01, percent-clipped=0.0 2024-08-18 09:46:19,630 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=22.5 2024-08-18 09:46:23,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3827130.0, ans=0.125 2024-08-18 09:46:25,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3827130.0, ans=0.1 2024-08-18 09:46:30,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3827230.0, ans=0.125 2024-08-18 09:46:30,841 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 10600, loss[loss=0.1023, beats_loss=0.008564, ecapa_loss=0.000141, whisper_loss=0.09234, over 17048.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01034, ecapa_loss=0.0001466, whisper_loss=0.08972, over 3869285.19 frames. ], batch size: 66, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:46:39,268 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-18 09:46:53,850 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 25 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-18 09:47:07,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3827430.0, ans=0.0 2024-08-18 09:47:19,041 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
25 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-18 09:47:20,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3827530.0, ans=0.0 2024-08-18 09:47:23,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3827530.0, ans=0.0 2024-08-18 09:47:28,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3827630.0, ans=0.125 2024-08-18 09:47:40,365 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 10650, loss[loss=0.09853, beats_loss=0.01138, ecapa_loss=0.0001493, whisper_loss=0.08566, over 22627.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01039, ecapa_loss=0.0001455, whisper_loss=0.08999, over 3878570.41 frames. ], batch size: 92, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:47:45,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3827730.0, ans=0.125 2024-08-18 09:47:45,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3827730.0, ans=0.0 2024-08-18 09:48:00,286 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-18 09:48:22,456 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
20 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 09:48:25,982 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.286e+01 2.479e+01 2.836e+01 4.249e+01, threshold=4.958e+01, percent-clipped=0.0 2024-08-18 09:48:43,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3828130.0, ans=0.0 2024-08-18 09:48:48,990 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 10700, loss[loss=0.1145, beats_loss=0.01015, ecapa_loss=0.0001447, whisper_loss=0.1029, over 22334.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.0001437, whisper_loss=0.09031, over 3874100.77 frames. ], batch size: 92, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:48:53,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3828230.0, ans=0.0 2024-08-18 09:49:03,042 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.78 vs. limit=15.0 2024-08-18 09:49:09,638 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-18 09:49:11,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3828330.0, ans=0.2 2024-08-18 09:49:12,649 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
23 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-18 09:49:12,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3828330.0, ans=0.0 2024-08-18 09:49:25,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3828430.0, ans=0.1 2024-08-18 09:49:31,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3828530.0, ans=0.125 2024-08-18 09:49:33,248 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.58 vs. limit=15.0 2024-08-18 09:49:51,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3828630.0, ans=0.125 2024-08-18 09:49:51,766 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0 2024-08-18 09:49:58,111 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0 2024-08-18 09:49:58,731 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 10750, loss[loss=0.08231, beats_loss=0.01212, ecapa_loss=0.0001336, whisper_loss=0.06885, over 19414.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01044, ecapa_loss=0.0001445, whisper_loss=0.0912, over 3865244.82 frames. ], batch size: 81, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:50:00,773 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=15.0 2024-08-18 09:50:01,439 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
17 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-18 09:50:01,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3828730.0, ans=0.035 2024-08-18 09:50:11,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3828830.0, ans=0.125 2024-08-18 09:50:19,485 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.49 vs. limit=15.0 2024-08-18 09:50:24,391 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-18 09:50:29,034 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.11 vs. limit=22.5 2024-08-18 09:50:43,104 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.343e+01 2.580e+01 2.880e+01 1.020e+02, threshold=5.160e+01, percent-clipped=2.0 2024-08-18 09:50:44,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3829030.0, ans=0.2 2024-08-18 09:50:53,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3829130.0, ans=0.125 2024-08-18 09:51:04,583 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.66 vs. limit=15.0 2024-08-18 09:51:05,065 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 10800, loss[loss=0.1365, beats_loss=0.009472, ecapa_loss=0.0001236, whisper_loss=0.1258, over 23201.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01043, ecapa_loss=0.000144, whisper_loss=0.09232, over 3879473.90 frames. 
], batch size: 87, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:51:09,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3829230.0, ans=0.0 2024-08-18 09:51:21,373 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 09:51:24,343 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-18 09:51:27,047 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 09:51:33,948 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-18 09:51:36,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3829430.0, ans=0.025 2024-08-18 09:52:04,277 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.90 vs. limit=15.0 2024-08-18 09:52:08,573 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 10850, loss[loss=0.08897, beats_loss=0.01156, ecapa_loss=0.0001175, whisper_loss=0.07623, over 23114.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01041, ecapa_loss=0.0001443, whisper_loss=0.09254, over 3880114.53 frames. ], batch size: 89, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:52:28,653 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.751e+01 2024-08-18 09:52:44,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3830030.0, ans=0.0 2024-08-18 09:52:49,081 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.278e+01 2.571e+01 2.919e+01 2.090e+02, threshold=5.141e+01, percent-clipped=1.0 2024-08-18 09:53:00,507 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
23 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 09:53:03,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3830130.0, ans=0.125 2024-08-18 09:53:06,563 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 22 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 09:53:10,071 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 10900, loss[loss=0.09752, beats_loss=0.01083, ecapa_loss=0.0001443, whisper_loss=0.08524, over 16107.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0104, ecapa_loss=0.0001445, whisper_loss=0.09261, over 3888262.28 frames. ], batch size: 63, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:53:16,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3830230.0, ans=0.5 2024-08-18 09:53:20,351 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 33 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 09:53:26,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3830330.0, ans=0.125 2024-08-18 09:53:30,899 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.68 vs. limit=22.5 2024-08-18 09:53:49,378 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.46 vs. limit=22.5 2024-08-18 09:53:50,256 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.081e-03 2024-08-18 09:53:56,866 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.27 vs. 
limit=10.0 2024-08-18 09:54:04,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3830630.0, ans=0.125 2024-08-18 09:54:12,683 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 10950, loss[loss=0.08765, beats_loss=0.01014, ecapa_loss=0.0001503, whisper_loss=0.07601, over 21821.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01043, ecapa_loss=0.0001442, whisper_loss=0.0923, over 3930657.35 frames. ], batch size: 92, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:54:21,664 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 34 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-18 09:54:23,959 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 16 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-18 09:54:30,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3830830.0, ans=0.125 2024-08-18 09:54:34,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3830830.0, ans=0.0 2024-08-18 09:54:43,205 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.28 vs. limit=15.0 2024-08-18 09:54:46,078 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 27 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-18 09:54:46,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3830930.0, ans=0.125 2024-08-18 09:54:47,385 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
20 from LS+wenet, 32 from Vox, 32 fro AS 2024-08-18 09:54:50,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3831030.0, ans=0.125 2024-08-18 09:54:53,574 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.701e+01 2.311e+01 2.560e+01 2.813e+01 5.362e+01, threshold=5.120e+01, percent-clipped=1.0 2024-08-18 09:55:09,834 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-18 09:55:11,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3831130.0, ans=0.07 2024-08-18 09:55:14,498 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 11000, loss[loss=0.1044, beats_loss=0.01157, ecapa_loss=0.0001311, whisper_loss=0.09154, over 23246.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01036, ecapa_loss=0.0001444, whisper_loss=0.09213, over 3928285.01 frames. ], batch size: 93, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:55:17,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3831230.0, ans=0.035 2024-08-18 09:55:17,591 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.37 vs. limit=22.5 2024-08-18 09:55:20,606 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-18 09:55:20,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3831230.0, ans=0.125 2024-08-18 09:55:31,787 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
28 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-18 09:55:39,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3831430.0, ans=0.1 2024-08-18 09:55:43,676 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 18 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-18 09:55:51,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3831530.0, ans=0.0 2024-08-18 09:55:52,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3831530.0, ans=0.2 2024-08-18 09:55:53,590 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-18 09:55:57,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3831530.0, ans=0.125 2024-08-18 09:56:15,763 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 11050, loss[loss=0.1187, beats_loss=0.008752, ecapa_loss=0.0001581, whisper_loss=0.1084, over 22304.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01038, ecapa_loss=0.0001442, whisper_loss=0.0918, over 3957824.44 frames. 
], batch size: 90, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:56:38,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3831830.0, ans=0.0 2024-08-18 09:56:50,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3831930.0, ans=0.125 2024-08-18 09:56:56,743 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.378e+01 2.551e+01 2.783e+01 4.873e+01, threshold=5.103e+01, percent-clipped=0.0 2024-08-18 09:56:57,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3832030.0, ans=0.125 2024-08-18 09:56:58,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3832030.0, ans=0.0 2024-08-18 09:57:04,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3832130.0, ans=0.07 2024-08-18 09:57:06,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3832130.0, ans=0.125 2024-08-18 09:57:17,713 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 11100, loss[loss=0.0997, beats_loss=0.01013, ecapa_loss=0.0001884, whisper_loss=0.08769, over 13109.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01041, ecapa_loss=0.0001447, whisper_loss=0.09165, over 3907395.03 frames. 
], batch size: 57, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:57:20,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3832230.0, ans=0.04949747468305833 2024-08-18 09:57:22,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3832230.0, ans=0.0 2024-08-18 09:57:33,087 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 26 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-18 09:57:34,286 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 12 from Vox, 44 fro AS 2024-08-18 09:57:34,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3832330.0, ans=0.2 2024-08-18 09:58:19,937 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 11150, loss[loss=0.1081, beats_loss=0.009512, ecapa_loss=0.00015, whisper_loss=0.09713, over 20569.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01036, ecapa_loss=0.0001445, whisper_loss=0.09187, over 3909358.26 frames. ], batch size: 79, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:58:22,644 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-18 09:58:33,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3832830.0, ans=0.0 2024-08-18 09:58:56,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3833030.0, ans=0.125 2024-08-18 09:58:59,558 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.45 vs. 
limit=15.0 2024-08-18 09:59:00,325 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.313e+01 2.531e+01 2.915e+01 1.663e+02, threshold=5.062e+01, percent-clipped=1.0 2024-08-18 09:59:04,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3833030.0, ans=0.95 2024-08-18 09:59:17,874 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-18 09:59:21,585 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 11200, loss[loss=0.09033, beats_loss=0.01257, ecapa_loss=0.0001225, whisper_loss=0.07653, over 22177.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01031, ecapa_loss=0.0001457, whisper_loss=0.09198, over 3872465.03 frames. ], batch size: 88, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:59:32,170 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.56 vs. limit=12.0 2024-08-18 09:59:36,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3833330.0, ans=0.125 2024-08-18 09:59:41,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3833330.0, ans=0.125 2024-08-18 09:59:42,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3833330.0, ans=0.125 2024-08-18 10:00:17,135 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-18 10:00:23,235 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 11250, loss[loss=0.0959, beats_loss=0.009229, ecapa_loss=0.000152, whisper_loss=0.08515, over 20172.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01029, ecapa_loss=0.0001459, whisper_loss=0.09226, over 3886478.99 frames. 
], batch size: 81, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:00:39,225 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 10:00:43,651 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0 2024-08-18 10:00:52,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3833930.0, ans=0.0 2024-08-18 10:00:53,291 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 31 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-18 10:00:54,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3833930.0, ans=0.125 2024-08-18 10:01:00,938 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-18 10:01:04,439 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.304e+01 2.562e+01 2.921e+01 1.559e+02, threshold=5.123e+01, percent-clipped=1.0 2024-08-18 10:01:09,689 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 15 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-18 10:01:16,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3834130.0, ans=0.0 2024-08-18 10:01:23,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3834130.0, ans=0.0 2024-08-18 10:01:25,739 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 11300, loss[loss=0.1042, beats_loss=0.009357, ecapa_loss=0.0001309, whisper_loss=0.09356, over 22381.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01032, ecapa_loss=0.0001456, whisper_loss=0.09207, over 3892370.17 frames. 
], batch size: 89, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:01:27,768 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=15.64 vs. limit=15.0 2024-08-18 10:01:28,583 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-18 10:01:31,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3834230.0, ans=0.2 2024-08-18 10:02:00,707 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 32 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-18 10:02:01,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3834430.0, ans=0.125 2024-08-18 10:02:02,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3834530.0, ans=0.125 2024-08-18 10:02:10,158 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-18 10:02:17,860 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.44 vs. limit=22.5 2024-08-18 10:02:19,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3834630.0, ans=0.0 2024-08-18 10:02:27,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3834730.0, ans=0.125 2024-08-18 10:02:28,511 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 11350, loss[loss=0.09036, beats_loss=0.00687, ecapa_loss=0.0001845, whisper_loss=0.08165, over 14867.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0104, ecapa_loss=0.000145, whisper_loss=0.09154, over 3911683.41 frames. 
], batch size: 55, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:02:59,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3834930.0, ans=0.125 2024-08-18 10:03:07,602 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-18 10:03:09,869 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.308e+01 2.551e+01 2.797e+01 2.772e+02, threshold=5.101e+01, percent-clipped=2.0 2024-08-18 10:03:11,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3835030.0, ans=0.0 2024-08-18 10:03:14,917 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 22 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-18 10:03:17,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3835130.0, ans=0.125 2024-08-18 10:03:18,546 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 23 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-18 10:03:19,736 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 18 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-18 10:03:30,664 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 11400, loss[loss=0.09891, beats_loss=0.009777, ecapa_loss=0.0001472, whisper_loss=0.08766, over 19595.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01042, ecapa_loss=0.0001463, whisper_loss=0.0911, over 3898150.76 frames. ], batch size: 77, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:03:36,610 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
25 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-18 10:03:36,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3835230.0, ans=0.0 2024-08-18 10:03:39,714 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.53 vs. limit=15.0 2024-08-18 10:03:52,714 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 10:04:10,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3835530.0, ans=0.1 2024-08-18 10:04:11,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3835530.0, ans=0.0 2024-08-18 10:04:17,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3835530.0, ans=0.125 2024-08-18 10:04:17,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3835530.0, ans=0.1 2024-08-18 10:04:24,298 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 37 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-18 10:04:32,559 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 11450, loss[loss=0.0969, beats_loss=0.01132, ecapa_loss=0.000145, whisper_loss=0.08413, over 21548.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01046, ecapa_loss=0.0001463, whisper_loss=0.09075, over 3901284.05 frames. ], batch size: 88, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:04:35,653 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.70 vs. 
limit=15.0 2024-08-18 10:04:39,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3835730.0, ans=0.125 2024-08-18 10:04:41,352 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 10:04:52,766 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.23 vs. limit=22.5 2024-08-18 10:05:02,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3835930.0, ans=0.2 2024-08-18 10:05:06,411 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 10:05:13,163 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.297e+01 2.481e+01 2.767e+01 4.172e+01, threshold=4.962e+01, percent-clipped=0.0 2024-08-18 10:05:22,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3836130.0, ans=0.0 2024-08-18 10:05:26,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3836130.0, ans=0.0 2024-08-18 10:05:33,994 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 11500, loss[loss=0.11, beats_loss=0.009102, ecapa_loss=0.0001394, whisper_loss=0.0995, over 19101.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01042, ecapa_loss=0.0001461, whisper_loss=0.09124, over 3924588.33 frames. ], batch size: 70, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:05:38,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3836230.0, ans=0.2 2024-08-18 10:05:44,006 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.96 vs. 
limit=15.0 2024-08-18 10:05:59,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3836430.0, ans=0.125 2024-08-18 10:06:01,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3836430.0, ans=0.125 2024-08-18 10:06:09,985 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-18 10:06:11,313 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-18 10:06:12,427 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-18 10:06:39,915 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 11550, loss[loss=0.1153, beats_loss=0.008432, ecapa_loss=0.0001389, whisper_loss=0.1055, over 17014.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01039, ecapa_loss=0.0001466, whisper_loss=0.0917, over 3905429.92 frames. ], batch size: 64, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:06:47,685 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-18 10:07:15,911 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 14 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-18 10:07:18,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3836930.0, ans=0.125 2024-08-18 10:07:26,134 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.646e+01 2.337e+01 2.538e+01 2.781e+01 3.732e+01, threshold=5.076e+01, percent-clipped=0.0 2024-08-18 10:07:33,002 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-18 10:07:39,773 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
20 from LS+wenet, 15 from Vox, 33 from AS 2024-08-18 10:07:43,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3837130.0, ans=0.125 2024-08-18 10:07:45,078 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.39 vs. limit=15.0 2024-08-18 10:07:47,234 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 22 from Vox, 22 from AS 2024-08-18 10:07:52,152 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 11600, loss[loss=0.09092, beats_loss=0.01326, ecapa_loss=0.0001057, whisper_loss=0.0766, over 18335.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01047, ecapa_loss=0.0001458, whisper_loss=0.09159, over 3931209.07 frames. ], batch size: 70, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:07:56,326 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 17 from Vox, 26 from AS 2024-08-18 10:08:02,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3837230.0, ans=0.0 2024-08-18 10:08:16,040 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 32 from Vox, 26 from AS 2024-08-18 10:08:35,699 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 22 from Vox, 26 from AS 2024-08-18 10:08:46,686 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 from AS 2024-08-18 10:08:49,834 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 28 from LS+wenet, 24 from Vox, 34 from AS 2024-08-18 10:08:55,620 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 17 from LS+wenet, 24 from Vox, 40 from AS 2024-08-18 10:09:06,949 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 11650, loss[loss=0.1005, beats_loss=0.006473, ecapa_loss=0.0001313, whisper_loss=0.09276, over 15208.00 frames.
], tot_loss[loss=0.1041, beats_loss=0.0104, ecapa_loss=0.0001474, whisper_loss=0.09224, over 3949388.16 frames. ], batch size: 54, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:09:36,351 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 from AS 2024-08-18 10:09:36,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3837930.0, ans=0.125 2024-08-18 10:09:44,919 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 16 from Vox, 34 from AS 2024-08-18 10:09:56,756 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.353e+01 2.626e+01 3.025e+01 7.544e+01, threshold=5.251e+01, percent-clipped=1.0 2024-08-18 10:09:58,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3838030.0, ans=0.125 2024-08-18 10:10:21,913 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 11700, loss[loss=0.09684, beats_loss=0.01311, ecapa_loss=0.0001498, whisper_loss=0.08223, over 17955.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01043, ecapa_loss=0.0001473, whisper_loss=0.09233, over 3966999.90 frames. ], batch size: 74, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:10:46,211 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 18 from LS+wenet, 22 from Vox, 35 from AS 2024-08-18 10:11:05,510 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=22.5 2024-08-18 10:11:19,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3838630.0, ans=0.0 2024-08-18 10:11:19,717 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs.
limit=6.0 2024-08-18 10:11:20,708 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 27 from Vox, 32 from AS 2024-08-18 10:11:34,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3838730.0, ans=0.125 2024-08-18 10:11:35,229 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 11750, loss[loss=0.1164, beats_loss=0.01034, ecapa_loss=0.0001507, whisper_loss=0.1045, over 22977.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01054, ecapa_loss=0.0001462, whisper_loss=0.09182, over 3948017.65 frames. ], batch size: 91, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:11:42,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3838730.0, ans=0.0 2024-08-18 10:11:45,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3838730.0, ans=0.125 2024-08-18 10:11:55,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3838830.0, ans=0.0 2024-08-18 10:12:24,821 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.258e+01 2.471e+01 2.735e+01 3.598e+01, threshold=4.941e+01, percent-clipped=0.0 2024-08-18 10:12:25,657 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.84 vs. limit=10.0 2024-08-18 10:12:39,268 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.71 vs. limit=10.0 2024-08-18 10:12:51,154 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 11800, loss[loss=0.09169, beats_loss=0.01071, ecapa_loss=0.00017, whisper_loss=0.07928, over 21459.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01061, ecapa_loss=0.0001447, whisper_loss=0.09106, over 3939773.47 frames.
], batch size: 89, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:12:51,910 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-08-18 10:13:10,250 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 18 from Vox, 52 from AS 2024-08-18 10:13:23,401 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 from AS 2024-08-18 10:13:37,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3839530.0, ans=0.125 2024-08-18 10:13:47,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3839630.0, ans=0.05 2024-08-18 10:13:56,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3839630.0, ans=0.0 2024-08-18 10:14:04,423 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 11850, loss[loss=0.1108, beats_loss=0.01095, ecapa_loss=0.0001461, whisper_loss=0.09843, over 15637.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01066, ecapa_loss=0.0001439, whisper_loss=0.09029, over 3906767.55 frames. ], batch size: 61, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:14:06,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3839730.0, ans=0.125 2024-08-18 10:14:11,309 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 24 from Vox, 30 from AS 2024-08-18 10:14:19,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3839830.0, ans=0.0 2024-08-18 10:14:28,143 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.02 vs.
limit=15.0 2024-08-18 10:14:29,847 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.84 vs. limit=15.0 2024-08-18 10:14:42,791 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 20 from Vox, 39 from AS 2024-08-18 10:14:56,043 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.711e+01 2.290e+01 2.585e+01 2.979e+01 4.833e+01, threshold=5.171e+01, percent-clipped=0.0 2024-08-18 10:15:03,192 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.17 vs. limit=10.0 2024-08-18 10:15:15,379 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 20 from Vox, 33 from AS 2024-08-18 10:15:19,722 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 11900, loss[loss=0.09255, beats_loss=0.008111, ecapa_loss=0.0001478, whisper_loss=0.08296, over 16177.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01065, ecapa_loss=0.0001446, whisper_loss=0.09021, over 3924732.52 frames. ], batch size: 61, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:15:22,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3840230.0, ans=0.1 2024-08-18 10:15:23,422 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.96 vs. limit=5.0 2024-08-18 10:15:31,385 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 26 from Vox, 29 from AS 2024-08-18 10:15:46,528 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts.
23 from LS+wenet, 18 from Vox, 31 from AS 2024-08-18 10:15:49,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3840430.0, ans=0.0 2024-08-18 10:16:07,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3840530.0, ans=0.2 2024-08-18 10:16:21,401 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 26 from Vox, 35 from AS 2024-08-18 10:16:30,007 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 11950, loss[loss=0.09526, beats_loss=0.01176, ecapa_loss=0.0001519, whisper_loss=0.08198, over 21719.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0106, ecapa_loss=0.0001433, whisper_loss=0.0902, over 3911395.39 frames. ], batch size: 90, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:16:40,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=3840730.0, ans=15.0 2024-08-18 10:16:42,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3840830.0, ans=0.025 2024-08-18 10:17:07,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3840930.0, ans=0.1 2024-08-18 10:17:16,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3841030.0, ans=0.0 2024-08-18 10:17:20,863 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.256e+01 2.516e+01 2.795e+01 5.453e+01, threshold=5.033e+01, percent-clipped=1.0 2024-08-18 10:17:24,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3841030.0, ans=0.125 2024-08-18 10:17:25,944 INFO [scaling.py:214] (2/4) ScheduledFloat:
name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3841030.0, ans=0.0 2024-08-18 10:17:28,384 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 20 from Vox, 39 from AS 2024-08-18 10:17:35,376 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 10:17:36,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3841130.0, ans=0.1 2024-08-18 10:17:45,307 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 12000, loss[loss=0.09258, beats_loss=0.01072, ecapa_loss=0.0001457, whisper_loss=0.0804, over 18638.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01059, ecapa_loss=0.0001427, whisper_loss=0.0903, over 3910582.42 frames. ], batch size: 72, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:17:45,307 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-18 10:18:21,583 INFO [train_multi_KD3.py:1149] (2/4) Epoch 26, validation on ASR_libri: loss=0.2531, beats_loss=0, ecapa_loss=0.0005311, whisper_loss=0.2478, over 922467.00 frames. 2024-08-18 10:18:40,413 INFO [train_multi_KD3.py:1149] (2/4) Epoch 26, validation on SV_voxceleb1: loss=0.004077, beats_loss=0, ecapa_loss=0.0004077, whisper_loss=0, over 939242.00 frames. 2024-08-18 10:18:58,083 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.4793, 2.0845, 2.2240, 1.6283, 1.7824, 2.3729, 2.6918, 1.7319], device='cuda:2') 2024-08-18 10:20:17,393 INFO [train_multi_KD3.py:1149] (2/4) Epoch 26, validation on AT_audioset: loss=0.02316, beats_loss=0.02316, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-18 10:20:17,398 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-18 10:20:20,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3841230.0, ans=0.125 2024-08-18 10:20:25,462 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 from AS 2024-08-18 10:20:26,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3841230.0, ans=0.0 2024-08-18 10:20:28,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3841230.0, ans=0.2 2024-08-18 10:20:38,759 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.08 vs. limit=12.0 2024-08-18 10:20:42,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3841330.0, ans=0.125 2024-08-18 10:20:52,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3841430.0, ans=0.0 2024-08-18 10:21:11,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3841530.0, ans=0.125 2024-08-18 10:21:29,164 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.23 vs. limit=15.0 2024-08-18 10:21:30,116 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 12050, loss[loss=0.1065, beats_loss=0.0101, ecapa_loss=0.0001876, whisper_loss=0.09453, over 20286.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01052, ecapa_loss=0.0001437, whisper_loss=0.09011, over 3895568.11 frames. ], batch size: 85, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:21:33,536 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts.
26 from LS+wenet, 22 from Vox, 41 from AS 2024-08-18 10:21:39,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3841730.0, ans=0.1 2024-08-18 10:21:59,421 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 18 from Vox, 20 from AS 2024-08-18 10:22:24,309 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.275e+01 2.552e+01 2.886e+01 4.482e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-18 10:22:32,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3842130.0, ans=0.2 2024-08-18 10:22:35,860 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 19 from Vox, 37 from AS 2024-08-18 10:22:40,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3842130.0, ans=0.125 2024-08-18 10:22:42,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.63 vs. limit=15.0 2024-08-18 10:22:48,995 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 12100, loss[loss=0.08326, beats_loss=0.01102, ecapa_loss=0.0001317, whisper_loss=0.07093, over 17762.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01048, ecapa_loss=0.0001447, whisper_loss=0.09013, over 3876762.35 frames.
], batch size: 71, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:23:15,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3842330.0, ans=0.125 2024-08-18 10:23:20,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3842430.0, ans=0.125 2024-08-18 10:23:21,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3842430.0, ans=0.125 2024-08-18 10:23:21,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3842430.0, ans=0.125 2024-08-18 10:23:41,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3842530.0, ans=0.025 2024-08-18 10:23:47,954 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 from AS 2024-08-18 10:24:04,686 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 12150, loss[loss=0.125, beats_loss=0.008192, ecapa_loss=0.0001223, whisper_loss=0.1156, over 21433.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01051, ecapa_loss=0.0001443, whisper_loss=0.08944, over 3888510.34 frames. ], batch size: 77, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:24:06,054 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 29 from LS+wenet, 11 from Vox, 33 from AS 2024-08-18 10:24:12,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3842730.0, ans=0.0 2024-08-18 10:24:13,530 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts.
17 from LS+wenet, 17 from Vox, 33 from AS 2024-08-18 10:24:36,200 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.763e+01 2024-08-18 10:24:43,874 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 22 from Vox, 19 from AS 2024-08-18 10:24:54,708 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.341e+01 2.544e+01 2.867e+01 3.722e+01, threshold=5.088e+01, percent-clipped=0.0 2024-08-18 10:24:56,194 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 24 from Vox, 31 from AS 2024-08-18 10:25:07,201 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=15.0 2024-08-18 10:25:08,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3843130.0, ans=0.125 2024-08-18 10:25:12,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3843130.0, ans=0.1 2024-08-18 10:25:19,762 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 12200, loss[loss=0.1041, beats_loss=0.009451, ecapa_loss=0.0001648, whisper_loss=0.09299, over 22213.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01048, ecapa_loss=0.0001449, whisper_loss=0.09008, over 3905175.60 frames. ], batch size: 92, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:25:38,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3843330.0, ans=0.125 2024-08-18 10:25:39,146 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 13 from Vox, 38 from AS 2024-08-18 10:25:41,789 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts.
24 from LS+wenet, 22 from Vox, 21 from AS 2024-08-18 10:25:47,885 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.56 vs. limit=15.0 2024-08-18 10:26:08,600 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 21 from LS+wenet, 29 from Vox, 34 from AS 2024-08-18 10:26:19,673 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.86 vs. limit=15.0 2024-08-18 10:26:25,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3843630.0, ans=0.0 2024-08-18 10:26:27,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3843630.0, ans=0.2 2024-08-18 10:26:27,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3843630.0, ans=0.07 2024-08-18 10:26:32,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3843630.0, ans=0.125 2024-08-18 10:26:35,030 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.98 vs. limit=22.5 2024-08-18 10:26:42,594 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 12250, loss[loss=0.08115, beats_loss=0.008523, ecapa_loss=0.000184, whisper_loss=0.07079, over 16544.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01043, ecapa_loss=0.0001457, whisper_loss=0.08956, over 3877844.19 frames. ], batch size: 65, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:26:56,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3843730.0, ans=0.05 2024-08-18 10:26:57,711 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts.
26 from LS+wenet, 20 from Vox, 29 from AS 2024-08-18 10:27:00,671 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 21 from LS+wenet, 26 from Vox, 39 from AS 2024-08-18 10:27:01,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3843830.0, ans=0.2 2024-08-18 10:27:21,658 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 24 from Vox, 34 from AS 2024-08-18 10:27:23,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3843930.0, ans=0.2 2024-08-18 10:27:24,294 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.94 vs. limit=15.0 2024-08-18 10:27:29,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3844030.0, ans=0.125 2024-08-18 10:27:36,180 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.268e+01 2.517e+01 2.839e+01 6.711e+01, threshold=5.035e+01, percent-clipped=1.0 2024-08-18 10:27:37,859 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 from AS 2024-08-18 10:28:00,141 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 21 from Vox, 40 from AS 2024-08-18 10:28:01,912 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 12300, loss[loss=0.09601, beats_loss=0.01046, ecapa_loss=0.0001256, whisper_loss=0.08429, over 22117.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.0105, ecapa_loss=0.0001463, whisper_loss=0.08842, over 3876817.82 frames. ], batch size: 87, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:28:02,177 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts.
24 from LS+wenet, 18 from Vox, 37 from AS 2024-08-18 10:28:10,340 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.67 vs. limit=15.0 2024-08-18 10:28:17,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3844330.0, ans=0.0 2024-08-18 10:28:45,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3844430.0, ans=0.0 2024-08-18 10:29:03,226 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 27 from Vox, 37 from AS 2024-08-18 10:29:14,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3844630.0, ans=0.2 2024-08-18 10:29:23,538 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 12350, loss[loss=0.1093, beats_loss=0.009114, ecapa_loss=0.0001465, whisper_loss=0.09868, over 22414.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01044, ecapa_loss=0.0001472, whisper_loss=0.08958, over 3888444.14 frames. ], batch size: 88, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:29:28,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3844730.0, ans=0.0 2024-08-18 10:29:46,118 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.24 vs. limit=15.0 2024-08-18 10:29:59,902 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts.
32 from LS+wenet, 22 from Vox, 34 from AS 2024-08-18 10:30:14,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3845030.0, ans=0.0 2024-08-18 10:30:19,976 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.330e+01 2.553e+01 2.743e+01 3.939e+01, threshold=5.106e+01, percent-clipped=0.0 2024-08-18 10:30:20,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=3845030.0, ans=0.2 2024-08-18 10:30:27,550 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 30 from LS+wenet, 18 from Vox, 35 from AS 2024-08-18 10:30:32,204 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 28 from Vox, 31 from AS 2024-08-18 10:30:35,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3845130.0, ans=0.0 2024-08-18 10:30:41,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3845130.0, ans=0.125 2024-08-18 10:30:41,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3845130.0, ans=0.0 2024-08-18 10:30:45,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3845230.0, ans=0.2 2024-08-18 10:30:46,898 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 12400, loss[loss=0.1185, beats_loss=0.009936, ecapa_loss=0.0001257, whisper_loss=0.1073, over 23068.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0104, ecapa_loss=0.0001465, whisper_loss=0.09032, over 3890275.78 frames. ], batch size: 87, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:31:11,160 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 from AS 2024-08-18 10:31:44,347 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts.
21 from LS+wenet, 23 from Vox, 32 from AS 2024-08-18 10:31:56,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3845630.0, ans=0.125 2024-08-18 10:31:59,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3845630.0, ans=0.125 2024-08-18 10:32:06,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3845730.0, ans=0.125 2024-08-18 10:32:07,468 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 12450, loss[loss=0.1106, beats_loss=0.007121, ecapa_loss=0.0001617, whisper_loss=0.1018, over 16568.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0104, ecapa_loss=0.0001457, whisper_loss=0.09043, over 3881586.09 frames. ], batch size: 65, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:32:08,054 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.142e-03 2024-08-18 10:32:41,141 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-08-18 10:32:49,680 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 from AS 2024-08-18 10:33:01,441 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.04 vs. limit=8.0 2024-08-18 10:33:04,947 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.360e+01 2.669e+01 3.014e+01 4.345e+01, threshold=5.338e+01, percent-clipped=0.0 2024-08-18 10:33:07,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3846030.0, ans=0.2 2024-08-18 10:33:17,896 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts.
26 from LS+wenet, 21 from Vox, 31 from AS 2024-08-18 10:33:18,381 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.57 vs. limit=22.5 2024-08-18 10:33:22,323 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 19 from Vox, 29 from AS 2024-08-18 10:33:28,073 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 10:33:31,223 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 12500, loss[loss=0.1127, beats_loss=0.009805, ecapa_loss=0.0001827, whisper_loss=0.1011, over 22089.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01034, ecapa_loss=0.000145, whisper_loss=0.09091, over 3881266.19 frames. ], batch size: 92, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:33:31,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3846230.0, ans=0.125 2024-08-18 10:33:38,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3846230.0, ans=0.0 2024-08-18 10:33:51,750 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.48 vs. limit=15.0 2024-08-18 10:34:08,473 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 from AS 2024-08-18 10:34:20,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3846530.0, ans=0.125 2024-08-18 10:34:25,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3846530.0, ans=0.125 2024-08-18 10:34:29,570 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts.
26 from LS+wenet, 18 from Vox, 48 from AS 2024-08-18 10:34:38,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3846630.0, ans=0.1 2024-08-18 10:34:44,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3846630.0, ans=0.125 2024-08-18 10:34:46,848 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 12550, loss[loss=0.09385, beats_loss=0.01175, ecapa_loss=0.000127, whisper_loss=0.08084, over 14107.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01039, ecapa_loss=0.0001454, whisper_loss=0.09089, over 3883222.68 frames. ], batch size: 55, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:34:59,287 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.99 vs. limit=15.0 2024-08-18 10:35:04,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3846830.0, ans=0.2 2024-08-18 10:35:07,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3846830.0, ans=0.0 2024-08-18 10:35:28,745 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.57 vs. limit=15.0 2024-08-18 10:35:38,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3847030.0, ans=0.0 2024-08-18 10:35:41,254 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.783e+01 2.372e+01 2.623e+01 3.122e+01 3.895e+01, threshold=5.245e+01, percent-clipped=0.0 2024-08-18 10:35:49,259 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts.
20 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-18 10:36:05,624 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 12600, loss[loss=0.07533, beats_loss=0.008355, ecapa_loss=0.0002001, whisper_loss=0.06497, over 13539.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01038, ecapa_loss=0.0001452, whisper_loss=0.09201, over 3880587.81 frames. ], batch size: 55, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:36:19,514 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 21 from LS+wenet, 31 from Vox, 42 fro AS 2024-08-18 10:36:25,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3847330.0, ans=0.04949747468305833 2024-08-18 10:36:32,890 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.58 vs. limit=22.5 2024-08-18 10:36:39,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3847430.0, ans=0.125 2024-08-18 10:37:27,607 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 12650, loss[loss=0.0784, beats_loss=0.01315, ecapa_loss=0.0001362, whisper_loss=0.06389, over 19326.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01049, ecapa_loss=0.0001451, whisper_loss=0.09091, over 3832523.13 frames. ], batch size: 78, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:37:32,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3847730.0, ans=0.125 2024-08-18 10:37:43,172 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0 2024-08-18 10:37:59,898 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-18 10:38:07,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3847930.0, ans=0.0 2024-08-18 10:38:13,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3848030.0, ans=0.1 2024-08-18 10:38:13,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3848030.0, ans=0.125 2024-08-18 10:38:16,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3848030.0, ans=0.0 2024-08-18 10:38:20,868 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.252e+01 2.537e+01 2.887e+01 4.368e+01, threshold=5.074e+01, percent-clipped=0.0 2024-08-18 10:38:32,405 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 12 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-18 10:38:48,083 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 12700, loss[loss=0.1113, beats_loss=0.01102, ecapa_loss=0.0001206, whisper_loss=0.09911, over 17942.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01048, ecapa_loss=0.0001443, whisper_loss=0.09104, over 3810858.85 frames. ], batch size: 69, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:38:55,907 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.88 vs. limit=22.5 2024-08-18 10:38:58,721 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.05 vs. limit=22.5 2024-08-18 10:39:00,874 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-18 10:39:26,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3848430.0, ans=0.05 2024-08-18 10:39:30,593 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 10:39:41,780 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-18 10:39:48,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=3848530.0, ans=6.0 2024-08-18 10:39:49,512 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-18 10:40:02,633 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 10:40:09,664 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 12750, loss[loss=0.08003, beats_loss=0.01081, ecapa_loss=0.0001483, whisper_loss=0.06773, over 16326.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01047, ecapa_loss=0.0001454, whisper_loss=0.09143, over 3878614.26 frames. ], batch size: 67, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:40:13,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3848730.0, ans=0.1 2024-08-18 10:40:14,791 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-18 10:40:22,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3848730.0, ans=0.0 2024-08-18 10:40:31,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3848830.0, ans=0.2 2024-08-18 10:40:34,720 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
10 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-18 10:40:35,512 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.99 vs. limit=22.5 2024-08-18 10:41:04,855 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.297e+01 2.524e+01 2.809e+01 4.213e+01, threshold=5.048e+01, percent-clipped=0.0 2024-08-18 10:41:11,576 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.86 vs. limit=22.5 2024-08-18 10:41:14,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3849130.0, ans=0.125 2024-08-18 10:41:29,566 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 12800, loss[loss=0.09689, beats_loss=0.01081, ecapa_loss=0.0001374, whisper_loss=0.08471, over 18568.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01061, ecapa_loss=0.0001458, whisper_loss=0.09004, over 3860999.43 frames. ], batch size: 74, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:41:29,727 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-18 10:41:43,182 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 21 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-18 10:41:55,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3849330.0, ans=0.125 2024-08-18 10:42:04,555 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
27 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-18 10:42:15,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3849430.0, ans=0.125 2024-08-18 10:42:25,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3849530.0, ans=0.125 2024-08-18 10:42:32,121 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 18 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-18 10:42:36,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3849630.0, ans=0.125 2024-08-18 10:42:43,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3849630.0, ans=0.125 2024-08-18 10:42:49,790 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 12850, loss[loss=0.09821, beats_loss=0.01068, ecapa_loss=0.0001743, whisper_loss=0.08579, over 20777.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01065, ecapa_loss=0.0001458, whisper_loss=0.0897, over 3839891.76 frames. ], batch size: 87, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:43:00,439 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.02 vs. limit=22.5 2024-08-18 10:43:01,785 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 10:43:08,628 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.23 vs. 
limit=22.5 2024-08-18 10:43:34,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3850030.0, ans=0.125 2024-08-18 10:43:41,727 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.317e+01 2.573e+01 2.906e+01 6.087e+01, threshold=5.147e+01, percent-clipped=1.0 2024-08-18 10:43:46,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3850030.0, ans=0.125 2024-08-18 10:43:53,643 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 10:44:04,130 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 12900, loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.000192, whisper_loss=0.08995, over 18194.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01056, ecapa_loss=0.0001471, whisper_loss=0.08953, over 3820211.98 frames. ], batch size: 77, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:44:15,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3850230.0, ans=0.0 2024-08-18 10:44:22,805 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-18 10:44:28,915 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-18 10:44:34,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3850430.0, ans=0.1 2024-08-18 10:44:43,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3850430.0, ans=0.125 2024-08-18 10:45:13,829 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
34 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-18 10:45:23,099 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 12950, loss[loss=0.1187, beats_loss=0.007782, ecapa_loss=0.0001547, whisper_loss=0.1094, over 20308.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01053, ecapa_loss=0.0001469, whisper_loss=0.08972, over 3843304.16 frames. ], batch size: 80, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:45:34,899 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 18 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 10:45:37,092 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.29 vs. limit=22.5 2024-08-18 10:45:47,148 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-18 10:45:47,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3850830.0, ans=0.2 2024-08-18 10:45:56,331 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-18 10:46:06,071 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-18 10:46:11,510 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 10:46:13,014 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.206e+01 2.482e+01 2.888e+01 5.100e+01, threshold=4.963e+01, percent-clipped=0.0 2024-08-18 10:46:22,089 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 10:46:33,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3851130.0, ans=0.125 2024-08-18 10:46:35,368 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
32 from LS+wenet, 15 from Vox, 48 fro AS 2024-08-18 10:46:36,867 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 13000, loss[loss=0.1112, beats_loss=0.01183, ecapa_loss=9.946e-05, whisper_loss=0.0984, over 25064.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001469, whisper_loss=0.09031, over 3856325.75 frames. ], batch size: 95, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:46:48,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3851230.0, ans=0.125 2024-08-18 10:46:59,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3851330.0, ans=0.0 2024-08-18 10:47:53,387 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 13050, loss[loss=0.08871, beats_loss=0.01272, ecapa_loss=0.0001078, whisper_loss=0.07491, over 16405.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01052, ecapa_loss=0.0001462, whisper_loss=0.0897, over 3810384.66 frames. ], batch size: 62, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:48:42,269 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-18 10:48:48,297 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.555e+01 2.212e+01 2.477e+01 2.743e+01 3.903e+01, threshold=4.954e+01, percent-clipped=0.0 2024-08-18 10:49:02,397 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 21 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-18 10:49:14,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3852230.0, ans=0.125 2024-08-18 10:49:15,860 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 13100, loss[loss=0.1042, beats_loss=0.007866, ecapa_loss=0.0001378, whisper_loss=0.095, over 16223.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01056, ecapa_loss=0.0001451, whisper_loss=0.08934, over 3846371.46 frames. 
], batch size: 63, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:49:26,847 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 18 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-18 10:49:28,779 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 19 from LS+wenet, 14 from Vox, 49 fro AS 2024-08-18 10:49:34,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3852330.0, ans=0.125 2024-08-18 10:49:45,336 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-18 10:49:48,336 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.18 vs. limit=15.0 2024-08-18 10:49:53,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3852430.0, ans=0.125 2024-08-18 10:49:56,523 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-18 10:50:04,798 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-18 10:50:17,112 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-18 10:50:20,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3852630.0, ans=0.125 2024-08-18 10:50:30,869 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 13150, loss[loss=0.1173, beats_loss=0.01026, ecapa_loss=0.0001191, whisper_loss=0.1059, over 22394.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01054, ecapa_loss=0.0001456, whisper_loss=0.08953, over 3840452.54 frames. ], batch size: 87, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:50:40,771 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
20 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-18 10:50:45,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3852830.0, ans=0.0 2024-08-18 10:50:49,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3852830.0, ans=0.0 2024-08-18 10:50:55,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3852830.0, ans=0.125 2024-08-18 10:51:02,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3852930.0, ans=0.0 2024-08-18 10:51:18,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3853030.0, ans=10.0 2024-08-18 10:51:19,073 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.362e+01 2.528e+01 2.819e+01 1.566e+02, threshold=5.056e+01, percent-clipped=2.0 2024-08-18 10:51:39,696 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2024-08-18 10:51:41,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3853230.0, ans=0.1 2024-08-18 10:51:42,069 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 13200, loss[loss=0.08284, beats_loss=0.01141, ecapa_loss=0.0001753, whisper_loss=0.06967, over 15551.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01046, ecapa_loss=0.0001461, whisper_loss=0.0902, over 3815559.43 frames. ], batch size: 62, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:51:42,877 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.20 vs. 
limit=12.0 2024-08-18 10:51:48,906 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-18 10:51:51,137 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.646e+00 2024-08-18 10:51:55,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3853230.0, ans=0.0 2024-08-18 10:51:56,851 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-18 10:51:58,194 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-18 10:52:10,111 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 10:52:11,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3853430.0, ans=0.125 2024-08-18 10:52:17,533 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 10:52:37,303 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-18 10:52:40,552 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 10:52:58,378 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.810e-01 2024-08-18 10:52:59,116 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 13250, loss[loss=0.1261, beats_loss=0.009413, ecapa_loss=0.0001587, whisper_loss=0.1151, over 23383.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01042, ecapa_loss=0.0001467, whisper_loss=0.09026, over 3817766.07 frames. ], batch size: 94, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:53:01,901 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 10:53:10,890 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 19 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-18 10:53:18,777 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 10:53:22,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3853830.0, ans=0.125 2024-08-18 10:53:37,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3853930.0, ans=0.0 2024-08-18 10:53:47,027 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.315e+01 2.671e+01 3.107e+01 9.539e+01, threshold=5.342e+01, percent-clipped=1.0 2024-08-18 10:53:58,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3854130.0, ans=0.0 2024-08-18 10:54:09,452 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 13300, loss[loss=0.07956, beats_loss=0.01186, ecapa_loss=0.0001141, whisper_loss=0.06656, over 14126.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01044, ecapa_loss=0.0001451, whisper_loss=0.09076, over 3829577.58 frames. ], batch size: 53, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:54:15,441 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 16 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 10:54:19,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3854230.0, ans=0.125 2024-08-18 10:54:35,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3854330.0, ans=0.125 2024-08-18 10:54:41,904 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
16 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-18 10:55:02,606 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 10:55:04,420 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.56 vs. limit=10.0 2024-08-18 10:55:07,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3854630.0, ans=0.125 2024-08-18 10:55:19,838 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 13350, loss[loss=0.1137, beats_loss=0.01068, ecapa_loss=0.000133, whisper_loss=0.1017, over 22760.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01046, ecapa_loss=0.0001444, whisper_loss=0.09061, over 3835472.22 frames. ], batch size: 89, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:55:25,413 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 10:55:58,194 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 24 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-18 10:55:58,878 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-18 10:56:02,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3855030.0, ans=0.0 2024-08-18 10:56:04,780 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-18 10:56:05,992 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.282e+01 2.499e+01 2.710e+01 4.906e+01, threshold=4.999e+01, percent-clipped=0.0 2024-08-18 10:56:14,874 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-18 10:56:17,537 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
23 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-18 10:56:23,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3855130.0, ans=0.125 2024-08-18 10:56:28,240 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 13400, loss[loss=0.1017, beats_loss=0.008516, ecapa_loss=0.0001787, whisper_loss=0.09142, over 20907.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.0001455, whisper_loss=0.08975, over 3824839.79 frames. ], batch size: 88, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:56:34,994 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0 2024-08-18 10:56:37,351 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 16 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 10:56:50,609 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 10:57:01,862 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 10:57:10,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3855530.0, ans=0.125 2024-08-18 10:57:23,222 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 31 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 10:57:36,972 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 13450, loss[loss=0.1124, beats_loss=0.008591, ecapa_loss=0.0001354, whisper_loss=0.1025, over 22890.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.0001448, whisper_loss=0.08959, over 3871253.40 frames. 
], batch size: 88, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:58:09,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3855930.0, ans=0.2 2024-08-18 10:58:19,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3856030.0, ans=0.035 2024-08-18 10:58:20,735 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.386e+01 2.619e+01 2.833e+01 4.407e+01, threshold=5.238e+01, percent-clipped=0.0 2024-08-18 10:58:41,445 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 13500, loss[loss=0.08256, beats_loss=0.01184, ecapa_loss=0.0001609, whisper_loss=0.06912, over 16337.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01057, ecapa_loss=0.0001439, whisper_loss=0.08965, over 3878872.63 frames. ], batch size: 70, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:59:01,779 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 10:59:07,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3856430.0, ans=0.2 2024-08-18 10:59:10,363 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-18 10:59:14,777 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.62 vs. limit=15.0 2024-08-18 10:59:25,198 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 23 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-18 10:59:30,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3856530.0, ans=0.0 2024-08-18 10:59:35,714 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.50 vs. 
limit=15.0 2024-08-18 10:59:40,690 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-18 10:59:41,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3856630.0, ans=0.125 2024-08-18 10:59:44,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3856630.0, ans=0.1 2024-08-18 10:59:47,091 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 13550, loss[loss=0.08478, beats_loss=0.01113, ecapa_loss=0.0001309, whisper_loss=0.07235, over 16161.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01056, ecapa_loss=0.0001442, whisper_loss=0.08948, over 3872460.65 frames. ], batch size: 64, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:59:53,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3856730.0, ans=0.2 2024-08-18 11:00:00,223 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-18 11:00:11,616 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
14 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-18 11:00:14,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3856930.0, ans=0.0 2024-08-18 11:00:19,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3856930.0, ans=0.125 2024-08-18 11:00:19,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3856930.0, ans=0.125 2024-08-18 11:00:21,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3856930.0, ans=0.0 2024-08-18 11:00:27,397 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.051e+00 2024-08-18 11:00:30,652 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.267e+01 2.460e+01 2.738e+01 4.250e+01, threshold=4.920e+01, percent-clipped=0.0 2024-08-18 11:00:34,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3857030.0, ans=0.0 2024-08-18 11:00:51,477 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 13600, loss[loss=0.09554, beats_loss=0.01168, ecapa_loss=0.0001618, whisper_loss=0.08224, over 17587.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01052, ecapa_loss=0.0001448, whisper_loss=0.08977, over 3863978.62 frames. ], batch size: 72, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:01:00,158 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-18 11:01:05,924 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-18 11:01:07,300 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
18 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-18 11:01:11,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3857330.0, ans=0.0 2024-08-18 11:01:13,217 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 14 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-18 11:01:14,590 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-18 11:01:19,991 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-18 11:01:21,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3857430.0, ans=0.1 2024-08-18 11:01:26,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3857430.0, ans=0.125 2024-08-18 11:01:32,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3857530.0, ans=0.125 2024-08-18 11:01:51,078 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-18 11:01:56,767 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 11:01:57,872 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 13650, loss[loss=0.1093, beats_loss=0.01046, ecapa_loss=9.803e-05, whisper_loss=0.09782, over 15336.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01063, ecapa_loss=0.0001444, whisper_loss=0.08997, over 3868080.89 frames. ], batch size: 57, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:02:23,836 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-18 11:02:29,520 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
19 from LS+wenet, 18 from Vox, 54 fro AS 2024-08-18 11:02:35,661 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-18 11:02:37,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3858030.0, ans=0.125 2024-08-18 11:02:43,486 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.717e+01 2.270e+01 2.557e+01 2.779e+01 4.099e+01, threshold=5.114e+01, percent-clipped=0.0 2024-08-18 11:02:47,851 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.08 vs. limit=15.0 2024-08-18 11:02:55,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3858130.0, ans=0.0 2024-08-18 11:03:05,607 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 13700, loss[loss=0.1112, beats_loss=0.01035, ecapa_loss=0.0001391, whisper_loss=0.0995, over 22648.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01066, ecapa_loss=0.000145, whisper_loss=0.09027, over 3878194.30 frames. ], batch size: 92, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:03:06,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3858230.0, ans=0.2 2024-08-18 11:03:30,120 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 25 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-18 11:03:49,382 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-18 11:03:54,860 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
23 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-18 11:03:59,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3858530.0, ans=15.0 2024-08-18 11:04:13,195 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.039e+01 2024-08-18 11:04:15,452 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 13750, loss[loss=0.1065, beats_loss=0.009606, ecapa_loss=0.0001764, whisper_loss=0.09517, over 21012.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01069, ecapa_loss=0.0001462, whisper_loss=0.08997, over 3863517.66 frames. ], batch size: 89, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:04:22,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3858730.0, ans=0.125 2024-08-18 11:04:24,983 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 27 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-18 11:04:27,994 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-18 11:04:35,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3858830.0, ans=0.125 2024-08-18 11:04:43,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3858930.0, ans=0.125 2024-08-18 11:04:54,432 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. 
limit=6.0 2024-08-18 11:05:01,845 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.684e+01 2.313e+01 2.596e+01 3.019e+01 1.808e+02, threshold=5.192e+01, percent-clipped=2.0 2024-08-18 11:05:02,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3859030.0, ans=0.125 2024-08-18 11:05:04,773 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-18 11:05:08,762 WARNING [optim.py:496] (2/4) Scaling gradients by 0.014583197422325611, model_norm_threshold=51.91682815551758 2024-08-18 11:05:08,931 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.923e+06, grad_sumsq=1.923e+06, orig_rms_sq=1.000e+00 2024-08-18 11:05:09,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3859130.0, ans=0.0 2024-08-18 11:05:19,108 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-18 11:05:22,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3859130.0, ans=0.125 2024-08-18 11:05:26,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3859230.0, ans=0.125 2024-08-18 11:05:26,901 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 13800, loss[loss=0.08743, beats_loss=0.01182, ecapa_loss=0.0001381, whisper_loss=0.07423, over 21823.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01067, ecapa_loss=0.000146, whisper_loss=0.0899, over 3877167.04 frames. 
], batch size: 91, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:05:27,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3859230.0, ans=6.0 2024-08-18 11:05:47,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3859330.0, ans=0.0 2024-08-18 11:05:48,306 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 11:05:52,199 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 30 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-18 11:05:55,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3859430.0, ans=0.0 2024-08-18 11:06:09,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3859530.0, ans=0.0 2024-08-18 11:06:14,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3859530.0, ans=0.2 2024-08-18 11:06:18,696 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 11:06:22,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3859630.0, ans=10.0 2024-08-18 11:06:39,821 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 13850, loss[loss=0.09704, beats_loss=0.009507, ecapa_loss=0.0001399, whisper_loss=0.08613, over 17331.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001461, whisper_loss=0.0905, over 3898860.11 frames. ], batch size: 70, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:07:02,719 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
19 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-18 11:07:03,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3859830.0, ans=0.1 2024-08-18 11:07:16,232 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-18 11:07:18,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3859930.0, ans=0.125 2024-08-18 11:07:18,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3859930.0, ans=0.125 2024-08-18 11:07:28,628 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.23 vs. limit=22.5 2024-08-18 11:07:29,239 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-18 11:07:39,494 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.397e+01 2.640e+01 3.060e+01 3.560e+03, threshold=5.281e+01, percent-clipped=3.0 2024-08-18 11:08:10,227 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 13900, loss[loss=0.1095, beats_loss=0.0112, ecapa_loss=0.0001502, whisper_loss=0.09682, over 19984.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.0001447, whisper_loss=0.09026, over 3893987.58 frames. ], batch size: 80, lr: 2.34e-03, grad_scale: 1.152921504606847e+18 2024-08-18 11:08:17,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3860230.0, ans=0.0 2024-08-18 11:08:17,981 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 22 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-18 11:08:22,037 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 11:08:27,338 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
19 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-18 11:08:46,838 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 11:09:04,417 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0 2024-08-18 11:09:19,399 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-18 11:09:40,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3860630.0, ans=0.125 2024-08-18 11:09:42,524 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.38 vs. limit=15.0 2024-08-18 11:09:44,951 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 11:09:45,452 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.60 vs. limit=12.0 2024-08-18 11:09:48,577 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 32 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 11:09:51,897 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 13950, loss[loss=0.1345, beats_loss=0.008647, ecapa_loss=0.0001603, whisper_loss=0.1243, over 18883.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01058, ecapa_loss=0.0001448, whisper_loss=0.09042, over 3887510.03 frames. ], batch size: 76, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:10:32,023 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 21 from Vox, 17 fro AS 2024-08-18 11:10:32,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3860830.0, ans=0.2 2024-08-18 11:10:42,913 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
22 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-18 11:10:46,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3860930.0, ans=0.125 2024-08-18 11:10:54,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3860930.0, ans=0.125 2024-08-18 11:11:08,965 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.328e+01 2.570e+01 2.839e+01 4.020e+01, threshold=5.139e+01, percent-clipped=0.0 2024-08-18 11:11:15,123 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-18 11:11:32,607 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.22 vs. limit=15.0 2024-08-18 11:11:41,670 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 14000, loss[loss=0.1073, beats_loss=0.01075, ecapa_loss=0.00016, whisper_loss=0.09495, over 17870.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01053, ecapa_loss=0.0001458, whisper_loss=0.09034, over 3862480.14 frames. ], batch size: 73, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:11:51,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3861230.0, ans=0.2 2024-08-18 11:11:53,465 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.23 vs. limit=12.0 2024-08-18 11:12:09,800 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 11:12:30,264 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-18 11:12:31,920 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
24 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 11:12:37,501 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=18.15 vs. limit=15.0 2024-08-18 11:12:38,253 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-18 11:13:01,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3861530.0, ans=0.2 2024-08-18 11:13:02,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3861530.0, ans=0.2 2024-08-18 11:13:06,133 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-18 11:13:28,667 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 14050, loss[loss=0.1188, beats_loss=0.008435, ecapa_loss=0.0001385, whisper_loss=0.109, over 18903.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01039, ecapa_loss=0.0001463, whisper_loss=0.09128, over 3840878.20 frames. ], batch size: 71, lr: 2.33e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:13:31,192 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 11 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-18 11:13:57,649 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-18 11:13:59,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3861830.0, ans=0.0 2024-08-18 11:14:05,857 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
20 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 11:14:15,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3861930.0, ans=0.0 2024-08-18 11:14:23,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3862030.0, ans=0.0 2024-08-18 11:14:25,805 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.314e+01 2.539e+01 2.739e+01 6.784e+01, threshold=5.079e+01, percent-clipped=1.0 2024-08-18 11:14:26,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3862030.0, ans=0.1 2024-08-18 11:14:31,882 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.64 vs. limit=15.0 2024-08-18 11:14:37,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3862130.0, ans=0.1 2024-08-18 11:14:38,288 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 11:14:45,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3862230.0, ans=0.0 2024-08-18 11:14:45,968 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 14100, loss[loss=0.1163, beats_loss=0.01062, ecapa_loss=0.0001374, whisper_loss=0.1043, over 22646.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01044, ecapa_loss=0.0001455, whisper_loss=0.0913, over 3841036.70 frames. ], batch size: 90, lr: 2.33e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:14:57,491 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 12 from Vox, 47 fro AS 2024-08-18 11:15:31,466 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
20 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-18 11:15:39,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3862530.0, ans=10.0 2024-08-18 11:15:47,393 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.39 vs. limit=15.0 2024-08-18 11:15:51,614 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 33 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 11:15:56,426 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 14150, loss[loss=0.1054, beats_loss=0.01018, ecapa_loss=0.0001676, whisper_loss=0.09352, over 22542.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01043, ecapa_loss=0.0001452, whisper_loss=0.09159, over 3821713.46 frames. ], batch size: 92, lr: 2.33e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:16:02,585 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-18 11:16:05,303 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 22 from LS+wenet, 24 from Vox, 48 fro AS 2024-08-18 11:16:15,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3862830.0, ans=0.1 2024-08-18 11:16:16,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3862830.0, ans=0.125 2024-08-18 11:16:40,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3863030.0, ans=0.2 2024-08-18 11:16:47,830 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.320e+01 2.571e+01 2.843e+01 2.344e+02, threshold=5.141e+01, percent-clipped=2.0 2024-08-18 11:16:49,431 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
24 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 11:16:53,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3863030.0, ans=0.125 2024-08-18 11:17:00,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3863130.0, ans=0.125 2024-08-18 11:17:05,543 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-18 11:17:10,142 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 11:17:11,333 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 14200, loss[loss=0.09747, beats_loss=0.0106, ecapa_loss=0.0001386, whisper_loss=0.08548, over 16020.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01045, ecapa_loss=0.0001451, whisper_loss=0.0913, over 3861991.42 frames. ], batch size: 64, lr: 2.33e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:17:15,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3863230.0, ans=0.125 2024-08-18 11:17:28,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3863330.0, ans=0.125 2024-08-18 11:18:01,713 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-18 11:18:27,942 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 14250, loss[loss=0.1107, beats_loss=0.009548, ecapa_loss=0.0001391, whisper_loss=0.09979, over 15371.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0104, ecapa_loss=0.0001453, whisper_loss=0.09163, over 3860200.94 frames. ], batch size: 58, lr: 2.33e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:18:34,163 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.33 vs. 
limit=22.5 2024-08-18 11:18:47,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3863830.0, ans=0.0 2024-08-18 11:19:01,602 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 27 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 11:19:07,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3863930.0, ans=0.125 2024-08-18 11:19:07,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3863930.0, ans=0.125 2024-08-18 11:19:23,872 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.369e+01 2.559e+01 2.841e+01 3.701e+01, threshold=5.118e+01, percent-clipped=0.0 2024-08-18 11:19:31,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3864130.0, ans=0.1 2024-08-18 11:19:31,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3864130.0, ans=0.0 2024-08-18 11:19:43,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3864130.0, ans=0.125 2024-08-18 11:19:44,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3864230.0, ans=0.2 2024-08-18 11:19:45,671 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 14300, loss[loss=0.1065, beats_loss=0.01172, ecapa_loss=0.000112, whisper_loss=0.09365, over 23130.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01043, ecapa_loss=0.0001448, whisper_loss=0.0911, over 3848794.36 frames. 
], batch size: 90, lr: 2.33e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:20:01,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3864330.0, ans=0.125 2024-08-18 11:20:15,439 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-18 11:20:19,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3864430.0, ans=0.125 2024-08-18 11:20:25,107 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.04 vs. limit=22.5 2024-08-18 11:20:41,450 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2024-08-18 11:20:46,831 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.82 vs. limit=22.5 2024-08-18 11:20:48,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3864530.0, ans=0.1 2024-08-18 11:20:53,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3864630.0, ans=0.1 2024-08-18 11:20:57,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3864630.0, ans=0.2 2024-08-18 11:20:58,389 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-18 11:21:04,467 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 14350, loss[loss=0.1204, beats_loss=0.01016, ecapa_loss=0.0001189, whisper_loss=0.109, over 23755.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01046, ecapa_loss=0.0001443, whisper_loss=0.09075, over 3854071.65 frames. ], batch size: 90, lr: 2.33e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:21:07,797 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 11:21:09,804 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.95 vs. limit=15.0 2024-08-18 11:21:22,419 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 24 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 11:21:44,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3864930.0, ans=0.125 2024-08-18 11:21:47,125 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.64 vs. limit=15.0 2024-08-18 11:21:53,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3865030.0, ans=0.125 2024-08-18 11:21:57,160 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.333e+01 2.599e+01 2.804e+01 6.490e+01, threshold=5.198e+01, percent-clipped=1.0 2024-08-18 11:22:09,052 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.62 vs. limit=12.0 2024-08-18 11:22:18,691 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 14400, loss[loss=0.09101, beats_loss=0.01287, ecapa_loss=0.0001228, whisper_loss=0.07692, over 18236.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01046, ecapa_loss=0.0001436, whisper_loss=0.09107, over 3886215.00 frames. 
], batch size: 72, lr: 2.33e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:22:24,606 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.64 vs. limit=15.0 2024-08-18 11:22:52,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3865430.0, ans=0.2 2024-08-18 11:22:55,050 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-18 11:23:04,014 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 11:23:04,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.95 vs. limit=15.0 2024-08-18 11:23:05,651 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-18 11:23:10,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3865530.0, ans=0.125 2024-08-18 11:23:31,520 INFO [train_multi_KD3.py:1116] (2/4) Epoch 26, batch 14450, loss[loss=0.08048, beats_loss=0.01076, ecapa_loss=0.000137, whisper_loss=0.06835, over 13896.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01057, ecapa_loss=0.000144, whisper_loss=0.08966, over 3874744.74 frames. ], batch size: 55, lr: 2.33e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:23:52,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3865830.0, ans=0.2 2024-08-18 11:24:09,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3865930.0, ans=0.2 2024-08-18 11:24:13,286 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
27 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-18 11:24:17,339 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-18 11:24:19,453 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.265e+01 2.482e+01 2.875e+01 2.050e+02, threshold=4.963e+01, percent-clipped=1.0 2024-08-18 11:24:20,792 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-18 11:24:26,130 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2024-08-18 11:25:14,302 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 0, loss[loss=0.1043, beats_loss=0.008852, ecapa_loss=0.0001817, whisper_loss=0.09368, over 19604.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.008852, ecapa_loss=0.0001817, whisper_loss=0.09368, over 19604.00 frames. ], batch size: 84, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:25:14,303 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-18 11:25:51,135 INFO [train_multi_KD3.py:1149] (2/4) Epoch 27, validation on ASR_libri: loss=0.2537, beats_loss=0, ecapa_loss=0.0005188, whisper_loss=0.2485, over 922467.00 frames. 2024-08-18 11:26:06,008 INFO [train_multi_KD3.py:1149] (2/4) Epoch 27, validation on SV_voxceleb1: loss=0.004147, beats_loss=0, ecapa_loss=0.0004147, whisper_loss=0, over 939242.00 frames. 2024-08-18 11:27:48,111 INFO [train_multi_KD3.py:1149] (2/4) Epoch 27, validation on AT_audioset: loss=0.0231, beats_loss=0.0231, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
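Each `tot_loss` entry in this log combines the three distillation losses using the scales from the run config in the header (`beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`, `whisper_loss_scale=1.0`). A minimal sketch of that weighted sum, checked against one logged batch; the helper name `combine_kd_losses` is hypothetical, not icefall's API:

```python
def combine_kd_losses(beats_loss, ecapa_loss, whisper_loss,
                      beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    """Weighted sum of the three KD losses, using the scales from the run
    config (beats 1.0, ecapa 10.0, whisper 1.0)."""
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)

# Values from the "Epoch 26, batch 13600" entry above:
# loss=0.09554, beats_loss=0.01168, ecapa_loss=0.0001618, whisper_loss=0.08224
loss = combine_kd_losses(0.01168, 0.0001618, 0.08224)  # ~0.09554
```

The validation entries are consistent with the same rule: on `SV_voxceleb1`, only `ecapa_loss=0.0004147` is nonzero and the reported loss is 0.004147, i.e. ten times the ECAPA term.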
2024-08-18 11:27:48,114 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-18 11:28:12,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3866290.0, ans=0.125 2024-08-18 11:28:15,300 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 31 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-18 11:28:36,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3866290.0, ans=0.1 2024-08-18 11:28:40,844 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 11:28:50,513 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.70 vs. limit=15.0 2024-08-18 11:29:03,899 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.59 vs. limit=12.0 2024-08-18 11:29:19,613 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 9 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 11:29:28,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3866590.0, ans=0.125 2024-08-18 11:29:46,762 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 50, loss[loss=0.09951, beats_loss=0.009843, ecapa_loss=0.0001269, whisper_loss=0.08839, over 23554.00 frames. ], tot_loss[loss=0.09797, beats_loss=0.009703, ecapa_loss=0.0001438, whisper_loss=0.08683, over 895993.77 frames. ], batch size: 92, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:30:08,229 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 29 from LS+wenet, 10 from Vox, 17 fro AS 2024-08-18 11:30:14,829 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
22 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-18 11:30:17,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3866790.0, ans=0.125 2024-08-18 11:30:41,706 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.32 vs. limit=15.0 2024-08-18 11:31:05,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3866990.0, ans=0.0 2024-08-18 11:31:11,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3866990.0, ans=0.0 2024-08-18 11:31:11,997 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.562e+01 2.806e+01 3.203e+01 5.774e+01, threshold=5.612e+01, percent-clipped=2.0 2024-08-18 11:31:18,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3867090.0, ans=0.125 2024-08-18 11:31:19,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3867090.0, ans=0.0 2024-08-18 11:31:35,758 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 100, loss[loss=0.09708, beats_loss=0.009288, ecapa_loss=0.0001331, whisper_loss=0.08646, over 23258.00 frames. ], tot_loss[loss=0.09933, beats_loss=0.009522, ecapa_loss=0.0001445, whisper_loss=0.08836, over 1539783.79 frames. ], batch size: 91, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:31:45,020 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-18 11:31:45,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3867190.0, ans=0.0 2024-08-18 11:31:54,807 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
20 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-18 11:32:04,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3867290.0, ans=0.0 2024-08-18 11:32:17,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3867390.0, ans=0.125 2024-08-18 11:32:19,965 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0 2024-08-18 11:32:38,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3867490.0, ans=0.125 2024-08-18 11:32:55,294 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.19 vs. limit=6.0 2024-08-18 11:33:06,982 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 11:33:16,690 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 150, loss[loss=0.1017, beats_loss=0.01001, ecapa_loss=0.00014, whisper_loss=0.09034, over 16950.00 frames. ], tot_loss[loss=0.09889, beats_loss=0.009622, ecapa_loss=0.0001444, whisper_loss=0.08783, over 2067616.12 frames. ], batch size: 66, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:33:35,103 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 31 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-18 11:33:35,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3867790.0, ans=0.125 2024-08-18 11:33:42,324 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 26 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 11:34:05,956 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.86 vs. 
limit=15.0 2024-08-18 11:34:16,876 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.470e+01 2.705e+01 3.027e+01 2.809e+02, threshold=5.410e+01, percent-clipped=1.0 2024-08-18 11:34:17,069 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 16 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-18 11:34:17,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3868090.0, ans=0.125 2024-08-18 11:34:26,339 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-18 11:34:33,643 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 200, loss[loss=0.1111, beats_loss=0.008927, ecapa_loss=0.000157, whisper_loss=0.1006, over 18937.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.009563, ecapa_loss=0.0001458, whisper_loss=0.09084, over 2450494.79 frames. ], batch size: 76, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:34:34,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3868190.0, ans=0.1 2024-08-18 11:34:41,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3868190.0, ans=0.0 2024-08-18 11:34:48,919 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.81 vs. limit=22.5 2024-08-18 11:34:49,502 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
19 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-18 11:34:49,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3868290.0, ans=0.125 2024-08-18 11:35:19,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3868490.0, ans=0.025 2024-08-18 11:35:27,230 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 26 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-18 11:35:43,036 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-18 11:35:43,965 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 250, loss[loss=0.09915, beats_loss=0.01103, ecapa_loss=0.0001009, whisper_loss=0.08711, over 20679.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.009918, ecapa_loss=0.0001441, whisper_loss=0.08972, over 2780947.01 frames. ], batch size: 77, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:36:03,974 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.77 vs. limit=22.5 2024-08-18 11:36:04,499 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 27 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-18 11:36:08,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3868790.0, ans=0.0 2024-08-18 11:36:17,708 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
12 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-18 11:36:19,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3868890.0, ans=0.0 2024-08-18 11:36:23,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3868990.0, ans=0.125 2024-08-18 11:36:37,658 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.252e+01 2.480e+01 2.798e+01 3.781e+01, threshold=4.961e+01, percent-clipped=0.0 2024-08-18 11:36:41,085 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.12 vs. limit=15.0 2024-08-18 11:36:46,012 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2024-08-18 11:36:52,191 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 300, loss[loss=0.09598, beats_loss=0.01109, ecapa_loss=0.0001481, whisper_loss=0.08341, over 16400.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0101, ecapa_loss=0.0001447, whisper_loss=0.08935, over 2980422.70 frames. ], batch size: 64, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:36:54,153 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 14 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 11:37:03,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3869190.0, ans=0.125 2024-08-18 11:37:06,166 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2024-08-18 11:37:24,677 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
25 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-18 11:37:27,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3869390.0, ans=0.2 2024-08-18 11:37:33,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3869490.0, ans=0.0 2024-08-18 11:37:42,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3869490.0, ans=0.04949747468305833 2024-08-18 11:37:45,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3869590.0, ans=0.125 2024-08-18 11:37:53,985 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-18 11:37:54,504 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.23 vs. limit=10.0 2024-08-18 11:38:00,738 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 350, loss[loss=0.08051, beats_loss=0.01298, ecapa_loss=0.0001333, whisper_loss=0.0662, over 18867.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01026, ecapa_loss=0.0001429, whisper_loss=0.08868, over 3158051.42 frames. ], batch size: 78, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:38:06,821 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 11:38:11,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3869690.0, ans=0.035 2024-08-18 11:38:11,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3869690.0, ans=0.125 2024-08-18 11:38:16,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3869790.0, ans=0.0 2024-08-18 11:38:19,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3869790.0, ans=0.0 2024-08-18 11:38:19,622 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-18 11:38:25,914 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-18 11:38:43,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3869990.0, ans=0.2 2024-08-18 11:38:48,536 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
27 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-18 11:38:51,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3869990.0, ans=0.0 2024-08-18 11:38:52,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3869990.0, ans=0.125 2024-08-18 11:38:53,105 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.198e+01 2.411e+01 2.717e+01 4.096e+01, threshold=4.822e+01, percent-clipped=0.0 2024-08-18 11:38:57,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3870090.0, ans=0.0 2024-08-18 11:39:07,802 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 400, loss[loss=0.1006, beats_loss=0.007178, ecapa_loss=0.000174, whisper_loss=0.09167, over 20004.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01023, ecapa_loss=0.0001432, whisper_loss=0.08912, over 3306656.64 frames. ], batch size: 74, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:39:11,085 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.92 vs. limit=15.0 2024-08-18 11:39:13,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3870190.0, ans=0.0 2024-08-18 11:39:15,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3870190.0, ans=0.0 2024-08-18 11:39:33,917 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-18 11:39:37,408 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
22 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 11:39:55,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3870490.0, ans=0.5 2024-08-18 11:40:05,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3870590.0, ans=0.125 2024-08-18 11:40:12,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3870590.0, ans=0.2 2024-08-18 11:40:16,001 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 450, loss[loss=0.119, beats_loss=0.009882, ecapa_loss=0.0001266, whisper_loss=0.1078, over 23451.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01022, ecapa_loss=0.0001431, whisper_loss=0.09011, over 3474244.48 frames. ], batch size: 94, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:40:24,832 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 21 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-18 11:40:46,731 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.90 vs. limit=5.0 2024-08-18 11:40:55,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3870990.0, ans=0.125 2024-08-18 11:41:08,579 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.334e+01 2.651e+01 3.139e+01 3.582e+02, threshold=5.301e+01, percent-clipped=3.0 2024-08-18 11:41:12,996 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 25 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-18 11:41:22,656 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. 
limit=15.0 2024-08-18 11:41:22,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3871190.0, ans=6.0 2024-08-18 11:41:23,192 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 500, loss[loss=0.1015, beats_loss=0.01015, ecapa_loss=0.000174, whisper_loss=0.08965, over 22240.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01029, ecapa_loss=0.0001434, whisper_loss=0.08943, over 3574866.43 frames. ], batch size: 94, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:41:31,902 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 23 from LS+wenet, 33 from Vox, 39 fro AS 2024-08-18 11:41:39,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3871290.0, ans=0.125 2024-08-18 11:41:43,560 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-18 11:41:53,253 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 28 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-18 11:41:54,668 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 12 from LS+wenet, 10 from Vox, 34 fro AS 2024-08-18 11:41:54,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3871390.0, ans=0.125 2024-08-18 11:42:00,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3871390.0, ans=0.0 2024-08-18 11:42:05,503 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 31 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-18 11:42:25,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3871590.0, ans=0.2 2024-08-18 11:42:26,982 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.57 vs. 
limit=12.0 2024-08-18 11:42:28,770 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 550, loss[loss=0.1201, beats_loss=0.00635, ecapa_loss=0.0001713, whisper_loss=0.112, over 18821.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01026, ecapa_loss=0.0001443, whisper_loss=0.08967, over 3617337.77 frames. ], batch size: 73, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:42:39,417 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 13 from LS+wenet, 30 from Vox, 17 fro AS 2024-08-18 11:42:39,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3871690.0, ans=0.0 2024-08-18 11:42:46,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3871790.0, ans=0.0 2024-08-18 11:42:50,090 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.17 vs. limit=22.5 2024-08-18 11:43:20,155 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.708e+01 2.358e+01 2.658e+01 2.869e+01 1.652e+02, threshold=5.315e+01, percent-clipped=4.0 2024-08-18 11:43:23,571 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.30 vs. limit=15.0 2024-08-18 11:43:23,695 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.60 vs. limit=15.0 2024-08-18 11:43:34,855 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 600, loss[loss=0.09965, beats_loss=0.01126, ecapa_loss=0.000105, whisper_loss=0.08735, over 14815.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01029, ecapa_loss=0.000144, whisper_loss=0.08957, over 3692336.30 frames. 
], batch size: 55, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:43:50,033 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.28 vs. limit=15.0 2024-08-18 11:43:52,293 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-18 11:44:01,690 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 11:44:06,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3872390.0, ans=0.05 2024-08-18 11:44:08,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3872390.0, ans=0.2 2024-08-18 11:44:21,812 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.52 vs. limit=15.0 2024-08-18 11:44:41,745 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 11:44:43,873 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 650, loss[loss=0.09078, beats_loss=0.01073, ecapa_loss=0.0001551, whisper_loss=0.07849, over 22738.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0103, ecapa_loss=0.0001442, whisper_loss=0.08963, over 3750219.49 frames. ], batch size: 93, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:44:53,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3872690.0, ans=0.125 2024-08-18 11:45:08,571 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-18 11:45:10,463 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.93 vs. 
limit=22.5 2024-08-18 11:45:37,155 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.251e+01 2.525e+01 2.780e+01 3.539e+01, threshold=5.050e+01, percent-clipped=0.0 2024-08-18 11:45:51,733 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 700, loss[loss=0.1218, beats_loss=0.01109, ecapa_loss=0.0001175, whisper_loss=0.1096, over 23688.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01039, ecapa_loss=0.0001448, whisper_loss=0.0889, over 3771532.97 frames. ], batch size: 90, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:45:55,368 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.19 vs. limit=22.5 2024-08-18 11:45:55,692 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 16 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-18 11:46:14,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3873290.0, ans=0.125 2024-08-18 11:46:14,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3873290.0, ans=0.2 2024-08-18 11:46:25,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3873390.0, ans=0.0 2024-08-18 11:46:35,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3873490.0, ans=0.2 2024-08-18 11:46:45,459 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-08-18 11:46:50,678 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-18 11:46:59,773 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 750, loss[loss=0.08536, beats_loss=0.01231, ecapa_loss=0.0001311, whisper_loss=0.07174, over 22478.00 frames. 
], tot_loss[loss=0.1007, beats_loss=0.0104, ecapa_loss=0.0001442, whisper_loss=0.08891, over 3786290.00 frames. ], batch size: 92, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:47:23,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3873790.0, ans=0.125 2024-08-18 11:47:43,196 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 11:47:43,751 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.29 vs. limit=12.0 2024-08-18 11:47:45,169 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.11 vs. limit=22.5 2024-08-18 11:47:49,795 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.969e+01 2024-08-18 11:47:52,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3873990.0, ans=0.0 2024-08-18 11:47:53,148 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.637e+01 2.240e+01 2.488e+01 2.790e+01 6.250e+01, threshold=4.975e+01, percent-clipped=1.0 2024-08-18 11:48:07,678 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 800, loss[loss=0.1241, beats_loss=0.008048, ecapa_loss=0.0001307, whisper_loss=0.1147, over 17872.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.0104, ecapa_loss=0.0001434, whisper_loss=0.08901, over 3792048.73 frames. 
], batch size: 64, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:48:28,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3874290.0, ans=0.0 2024-08-18 11:48:32,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3874290.0, ans=0.5 2024-08-18 11:48:32,740 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.77 vs. limit=15.0 2024-08-18 11:48:39,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3874390.0, ans=0.125 2024-08-18 11:48:42,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3874390.0, ans=0.1 2024-08-18 11:48:45,851 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.96 vs. limit=6.0 2024-08-18 11:49:00,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3874490.0, ans=0.125 2024-08-18 11:49:00,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3874490.0, ans=0.0 2024-08-18 11:49:02,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3874590.0, ans=0.125 2024-08-18 11:49:05,590 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3874590.0, ans=0.125 2024-08-18 11:49:16,725 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.54 vs. 
limit=22.5 2024-08-18 11:49:17,202 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 850, loss[loss=0.1033, beats_loss=0.00912, ecapa_loss=0.000147, whisper_loss=0.09268, over 18941.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01033, ecapa_loss=0.0001436, whisper_loss=0.08913, over 3820244.04 frames. ], batch size: 75, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:49:19,051 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-18 11:49:28,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3874690.0, ans=0.125 2024-08-18 11:49:28,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3874690.0, ans=0.0 2024-08-18 11:49:29,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3874790.0, ans=0.125 2024-08-18 11:49:48,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3874890.0, ans=0.0 2024-08-18 11:49:54,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3874890.0, ans=0.2 2024-08-18 11:49:57,656 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08740431070327759, model_norm_threshold=49.75291061401367 2024-08-18 11:49:57,826 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.3.norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.157e+04, grad_sumsq=5.157e+04, orig_rms_sq=1.000e+00 2024-08-18 11:49:59,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=3874990.0, ans=10.0 2024-08-18 11:49:59,618 INFO [scaling.py:214] (2/4) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3874990.0, ans=0.1 2024-08-18 11:50:08,077 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.32 vs. limit=22.5 2024-08-18 11:50:08,841 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-18 11:50:10,973 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.49 vs. limit=15.0 2024-08-18 11:50:11,653 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.302e+01 2.593e+01 2.885e+01 5.692e+02, threshold=5.187e+01, percent-clipped=2.0 2024-08-18 11:50:17,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3875090.0, ans=0.125 2024-08-18 11:50:18,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3875090.0, ans=0.125 2024-08-18 11:50:27,168 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 900, loss[loss=0.1045, beats_loss=0.009001, ecapa_loss=0.0001555, whisper_loss=0.09392, over 18401.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01047, ecapa_loss=0.0001426, whisper_loss=0.08914, over 3825726.27 frames. ], batch size: 73, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:50:36,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3875190.0, ans=0.1 2024-08-18 11:50:54,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3875390.0, ans=0.1 2024-08-18 11:50:55,594 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
21 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-18 11:51:02,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3875390.0, ans=0.0 2024-08-18 11:51:15,955 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-18 11:51:30,467 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 11:51:34,817 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 11:51:36,853 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 950, loss[loss=0.08827, beats_loss=0.01028, ecapa_loss=0.0001332, whisper_loss=0.07666, over 16691.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01045, ecapa_loss=0.000143, whisper_loss=0.08968, over 3847036.97 frames. ], batch size: 67, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:51:50,173 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
24 from LS+wenet, 21 from Vox, 32 from AS
2024-08-18 11:51:50,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3875790.0, ans=0.2
2024-08-18 11:51:59,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3875790.0, ans=0.0
2024-08-18 11:52:13,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3875890.0, ans=0.2
2024-08-18 11:52:25,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3875990.0, ans=0.2
2024-08-18 11:52:25,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3875990.0, ans=0.0
2024-08-18 11:52:26,244 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.54 vs. limit=15.0
2024-08-18 11:52:30,842 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.712e+01 2.296e+01 2.523e+01 2.746e+01 1.713e+02, threshold=5.046e+01, percent-clipped=1.0
2024-08-18 11:52:42,801 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 23 from Vox, 45 from AS
2024-08-18 11:52:43,504 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.11 vs. limit=12.0
2024-08-18 11:52:46,854 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 1000, loss[loss=0.1183, beats_loss=0.011, ecapa_loss=0.0001269, whisper_loss=0.1061, over 20266.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01047, ecapa_loss=0.0001425, whisper_loss=0.08901, over 3838596.34 frames. ], batch size: 77, lr: 2.29e-03, grad_scale: 5.764607523034235e+17
2024-08-18 11:53:03,065 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 32 from Vox, 28 from AS
2024-08-18 11:53:09,951 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 21 from LS+wenet, 18 from Vox, 16 from AS
2024-08-18 11:53:25,081 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 25 from Vox, 43 from AS
2024-08-18 11:53:31,687 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 17 from LS+wenet, 20 from Vox, 34 from AS
2024-08-18 11:53:39,085 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 12 from Vox, 32 from AS
2024-08-18 11:53:51,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3876590.0, ans=0.2
2024-08-18 11:53:51,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3876590.0, ans=0.125
2024-08-18 11:53:57,822 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 1050, loss[loss=0.1063, beats_loss=0.01205, ecapa_loss=0.0001312, whisper_loss=0.09297, over 21783.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01047, ecapa_loss=0.0001426, whisper_loss=0.08868, over 3823112.67 frames. ], batch size: 84, lr: 2.29e-03, grad_scale: 5.764607523034235e+17
2024-08-18 11:54:20,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3876790.0, ans=0.1
2024-08-18 11:54:38,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3876890.0, ans=0.07
2024-08-18 11:54:53,023 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 30 from LS+wenet, 16 from Vox, 34 from AS
2024-08-18 11:54:53,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3876990.0, ans=0.125
2024-08-18 11:54:54,096 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.342e+01 2.695e+01 2.899e+01 4.255e+01, threshold=5.389e+01, percent-clipped=0.0
2024-08-18 11:55:00,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3877090.0, ans=0.125
2024-08-18 11:55:05,928 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 23 from Vox, 28 from AS
2024-08-18 11:55:07,136 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 17 from Vox, 34 from AS
2024-08-18 11:55:10,015 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 1100, loss[loss=0.09588, beats_loss=0.008054, ecapa_loss=0.0001478, whisper_loss=0.08634, over 15299.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01038, ecapa_loss=0.0001424, whisper_loss=0.08895, over 3805635.48 frames. ], batch size: 60, lr: 2.29e-03, grad_scale: 5.764607523034235e+17
2024-08-18 11:55:13,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3877190.0, ans=0.1
2024-08-18 11:55:16,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3877190.0, ans=0.1
2024-08-18 11:55:52,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3877490.0, ans=0.04949747468305833
2024-08-18 11:55:56,204 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.95 vs. limit=15.0
2024-08-18 11:56:23,641 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 1150, loss[loss=0.09471, beats_loss=0.0104, ecapa_loss=0.0001306, whisper_loss=0.08301, over 21418.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01041, ecapa_loss=0.0001422, whisper_loss=0.08919, over 3820984.25 frames. ], batch size: 85, lr: 2.29e-03, grad_scale: 5.764607523034235e+17
2024-08-18 11:56:24,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3877690.0, ans=0.2
2024-08-18 11:56:36,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3877790.0, ans=0.07
2024-08-18 11:56:47,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3877790.0, ans=0.0
2024-08-18 11:57:01,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3877890.0, ans=0.07
2024-08-18 11:57:04,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3877990.0, ans=0.125
2024-08-18 11:57:04,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3877990.0, ans=0.1
2024-08-18 11:57:15,045 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 18 from LS+wenet, 24 from Vox, 53 from AS
2024-08-18 11:57:19,250 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.346e+01 2.566e+01 2.901e+01 4.362e+01, threshold=5.132e+01, percent-clipped=0.0
2024-08-18 11:57:20,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3878090.0, ans=0.125
2024-08-18 11:57:28,200 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.80 vs. limit=15.0
2024-08-18 11:57:34,931 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 1200, loss[loss=0.09887, beats_loss=0.01175, ecapa_loss=0.0001362, whisper_loss=0.08576, over 22428.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01047, ecapa_loss=0.0001418, whisper_loss=0.08925, over 3834102.58 frames. ], batch size: 89, lr: 2.29e-03, grad_scale: 5.764607523034235e+17
2024-08-18 11:57:44,833 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.82 vs. limit=15.0
2024-08-18 11:57:45,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3878190.0, ans=0.2
2024-08-18 11:57:46,608 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 19 from LS+wenet, 21 from Vox, 44 from AS
2024-08-18 11:57:47,797 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.71 vs. limit=15.0
2024-08-18 11:57:48,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3878290.0, ans=0.125
2024-08-18 11:57:55,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3878290.0, ans=0.0
2024-08-18 11:58:02,889 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.07 vs. limit=15.0
2024-08-18 11:58:05,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3878390.0, ans=0.125
2024-08-18 11:58:07,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3878390.0, ans=0.125
2024-08-18 11:58:09,559 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 from AS
2024-08-18 11:58:13,588 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 16 from LS+wenet, 27 from Vox, 33 from AS
2024-08-18 11:58:16,080 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.21 vs. limit=12.0
2024-08-18 11:58:21,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3878490.0, ans=0.0
2024-08-18 11:58:30,421 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 from AS
2024-08-18 11:58:32,451 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.20 vs. limit=22.5
2024-08-18 11:58:32,613 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.13 vs. limit=22.5
2024-08-18 11:58:35,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3878590.0, ans=0.1
2024-08-18 11:58:37,035 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 14 from LS+wenet, 17 from Vox, 34 from AS
2024-08-18 11:58:41,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3878590.0, ans=0.2
2024-08-18 11:58:46,591 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 1250, loss[loss=0.09763, beats_loss=0.009189, ecapa_loss=0.0001652, whisper_loss=0.08679, over 17301.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01045, ecapa_loss=0.0001438, whisper_loss=0.08846, over 3798612.74 frames. ], batch size: 67, lr: 2.29e-03, grad_scale: 5.764607523034235e+17
2024-08-18 11:58:53,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3878690.0, ans=0.125
2024-08-18 11:58:55,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3878690.0, ans=0.125
2024-08-18 11:59:07,795 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 35 from LS+wenet, 23 from Vox, 26 from AS
2024-08-18 11:59:12,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3878790.0, ans=0.125
2024-08-18 11:59:12,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3878790.0, ans=0.07
2024-08-18 11:59:13,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3878790.0, ans=0.0
2024-08-18 11:59:15,993 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 19 from Vox, 25 from AS
2024-08-18 11:59:30,986 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.48 vs. limit=15.0
2024-08-18 11:59:38,308 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.16 vs. limit=12.0
2024-08-18 11:59:43,007 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.277e+01 2.561e+01 2.801e+01 1.202e+02, threshold=5.122e+01, percent-clipped=2.0
2024-08-18 11:59:59,276 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 1300, loss[loss=0.09913, beats_loss=0.01078, ecapa_loss=0.0001446, whisper_loss=0.08691, over 14588.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.0105, ecapa_loss=0.0001433, whisper_loss=0.08814, over 3794139.91 frames. ], batch size: 59, lr: 2.29e-03, grad_scale: 5.764607523034235e+17
2024-08-18 12:00:01,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3879190.0, ans=0.125
2024-08-18 12:00:06,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3879190.0, ans=0.125
2024-08-18 12:00:21,152 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 16 from LS+wenet, 19 from Vox, 27 from AS
2024-08-18 12:00:29,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3879390.0, ans=0.07
2024-08-18 12:00:39,879 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 17 from Vox, 46 from AS
2024-08-18 12:00:40,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3879390.0, ans=0.125
2024-08-18 12:00:45,428 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.05 vs. limit=6.0
2024-08-18 12:01:10,557 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 17 from Vox, 49 from AS
2024-08-18 12:01:12,034 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 1350, loss[loss=0.09451, beats_loss=0.01251, ecapa_loss=0.0001251, whisper_loss=0.08075, over 23076.00 frames. ], tot_loss[loss=0.09954, beats_loss=0.0105, ecapa_loss=0.0001433, whisper_loss=0.08761, over 3797018.66 frames. ], batch size: 94, lr: 2.29e-03, grad_scale: 5.764607523034235e+17
2024-08-18 12:01:14,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3879690.0, ans=0.1
2024-08-18 12:01:16,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3879690.0, ans=15.0
2024-08-18 12:01:16,853 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 28 from LS+wenet, 16 from Vox, 30 from AS
2024-08-18 12:01:21,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3879690.0, ans=0.125
2024-08-18 12:01:49,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3879890.0, ans=0.125
2024-08-18 12:02:15,308 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.267e+01 2.490e+01 2.849e+01 7.961e+01, threshold=4.979e+01, percent-clipped=1.0
2024-08-18 12:02:22,019 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.44 vs. limit=15.0
2024-08-18 12:02:31,460 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 1400, loss[loss=0.09032, beats_loss=0.01075, ecapa_loss=0.0001636, whisper_loss=0.07793, over 19245.00 frames. ], tot_loss[loss=0.09975, beats_loss=0.01046, ecapa_loss=0.0001435, whisper_loss=0.08786, over 3802996.99 frames. ], batch size: 78, lr: 2.28e-03, grad_scale: 5.764607523034235e+17
2024-08-18 12:02:35,820 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 18 from Vox, 35 from AS
2024-08-18 12:02:48,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3880290.0, ans=0.125
2024-08-18 12:02:57,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3880290.0, ans=0.125
2024-08-18 12:03:06,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3880390.0, ans=0.125
2024-08-18 12:03:09,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3880390.0, ans=0.05
2024-08-18 12:03:14,547 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-18 12:04:12,627 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.53 vs. limit=12.0
2024-08-18 12:04:14,361 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 1450, loss[loss=0.08638, beats_loss=0.0135, ecapa_loss=0.0001089, whisper_loss=0.07179, over 15813.00 frames. ], tot_loss[loss=0.0994, beats_loss=0.01052, ecapa_loss=0.0001415, whisper_loss=0.08747, over 3766926.20 frames. ], batch size: 63, lr: 2.28e-03, grad_scale: 5.764607523034235e+17
2024-08-18 12:04:29,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3880790.0, ans=0.0
2024-08-18 12:04:46,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3880890.0, ans=0.125
2024-08-18 12:04:53,067 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 27 from LS+wenet, 13 from Vox, 29 from AS
2024-08-18 12:04:53,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3880890.0, ans=0.125
2024-08-18 12:05:14,249 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.279e+01 2.479e+01 2.751e+01 4.183e+01, threshold=4.959e+01, percent-clipped=0.0
2024-08-18 12:05:16,551 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 19 from Vox, 44 from AS
2024-08-18 12:05:23,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3881090.0, ans=0.125
2024-08-18 12:05:30,749 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 1500, loss[loss=0.07878, beats_loss=0.01128, ecapa_loss=0.0001559, whisper_loss=0.06594, over 19534.00 frames. ], tot_loss[loss=0.09864, beats_loss=0.01061, ecapa_loss=0.0001398, whisper_loss=0.08664, over 3774749.14 frames. ], batch size: 82, lr: 2.28e-03, grad_scale: 5.764607523034235e+17
2024-08-18 12:05:32,645 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 17 from LS+wenet, 15 from Vox, 42 from AS
2024-08-18 12:05:54,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3881290.0, ans=0.125
2024-08-18 12:06:19,065 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 18 from Vox, 40 from AS
2024-08-18 12:06:20,494 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 32 from LS+wenet, 26 from Vox, 38 from AS
2024-08-18 12:06:22,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3881490.0, ans=0.0
2024-08-18 12:06:24,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3881490.0, ans=0.0
2024-08-18 12:06:44,522 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 1550, loss[loss=0.1013, beats_loss=0.008091, ecapa_loss=0.0001316, whisper_loss=0.09187, over 16511.00 frames. ], tot_loss[loss=0.09857, beats_loss=0.01063, ecapa_loss=0.0001403, whisper_loss=0.08653, over 3781853.69 frames. ], batch size: 61, lr: 2.28e-03, grad_scale: 5.764607523034235e+17
2024-08-18 12:06:52,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3881690.0, ans=0.05
2024-08-18 12:06:57,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3881690.0, ans=0.1
2024-08-18 12:07:10,698 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 28 from LS+wenet, 13 from Vox, 23 from AS
2024-08-18 12:07:13,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3881790.0, ans=0.04949747468305833
2024-08-18 12:07:23,312 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.62 vs. limit=15.0
2024-08-18 12:07:35,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3881990.0, ans=0.1
2024-08-18 12:07:41,819 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 31 from LS+wenet, 13 from Vox, 24 from AS
2024-08-18 12:07:43,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3881990.0, ans=0.125
2024-08-18 12:07:45,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3882090.0, ans=0.09899494936611666
2024-08-18 12:07:45,732 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.226e+01 2.364e+01 2.655e+01 3.408e+01, threshold=4.729e+01, percent-clipped=0.0
2024-08-18 12:07:54,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3882090.0, ans=0.1
2024-08-18 12:08:00,521 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 1600, loss[loss=0.09197, beats_loss=0.01081, ecapa_loss=0.0001716, whisper_loss=0.07944, over 21548.00 frames. ], tot_loss[loss=0.09984, beats_loss=0.01048, ecapa_loss=0.0001409, whisper_loss=0.08794, over 3795724.61 frames. ], batch size: 91, lr: 2.28e-03, grad_scale: 5.764607523034235e+17
2024-08-18 12:08:05,914 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 from AS
2024-08-18 12:08:27,563 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 23 from Vox, 33 from AS
2024-08-18 12:08:39,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3882390.0, ans=0.125
2024-08-18 12:08:40,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3882390.0, ans=0.2
2024-08-18 12:09:00,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3882590.0, ans=0.2
2024-08-18 12:09:03,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3882590.0, ans=0.125
2024-08-18 12:09:03,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3882590.0, ans=0.1
2024-08-18 12:09:16,011 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 1650, loss[loss=0.1189, beats_loss=0.01104, ecapa_loss=0.0001009, whisper_loss=0.1068, over 23674.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01046, ecapa_loss=0.0001403, whisper_loss=0.08919, over 3831213.22 frames. ], batch size: 88, lr: 2.28e-03, grad_scale: 5.764607523034235e+17
2024-08-18 12:09:17,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3882690.0, ans=0.125
2024-08-18 12:09:27,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3882690.0, ans=0.2
2024-08-18 12:09:38,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3882790.0, ans=0.125
2024-08-18 12:09:57,646 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 23 from Vox, 34 from AS
2024-08-18 12:10:00,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3882990.0, ans=0.015
2024-08-18 12:10:03,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3882990.0, ans=0.1
2024-08-18 12:10:03,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3882990.0, ans=0.125
2024-08-18 12:10:11,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3882990.0, ans=0.125
2024-08-18 12:10:13,366 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.497e+01 2.361e+01 2.617e+01 2.894e+01 3.984e+01, threshold=5.235e+01, percent-clipped=0.0
2024-08-18 12:10:13,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3883090.0, ans=10.0
2024-08-18 12:10:21,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3883090.0, ans=0.125
2024-08-18 12:10:27,975 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 1700, loss[loss=0.1016, beats_loss=0.01045, ecapa_loss=0.0001215, whisper_loss=0.08996, over 23033.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0104, ecapa_loss=0.0001395, whisper_loss=0.0894, over 3823860.33 frames. ], batch size: 90, lr: 2.28e-03, grad_scale: 5.764607523034235e+17
2024-08-18 12:10:40,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3883190.0, ans=0.125
2024-08-18 12:11:01,356 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 23 from LS+wenet, 23 from Vox, 39 from AS
2024-08-18 12:11:09,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3883490.0, ans=0.125
2024-08-18 12:11:13,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3883490.0, ans=0.0
2024-08-18 12:11:19,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3883490.0, ans=0.125
2024-08-18 12:11:34,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3883590.0, ans=0.0
2024-08-18 12:11:38,441 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 1750, loss[loss=0.1008, beats_loss=0.01084, ecapa_loss=0.0001393, whisper_loss=0.08856, over 19507.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01045, ecapa_loss=0.0001404, whisper_loss=0.08894, over 3821088.33 frames. ], batch size: 80, lr: 2.28e-03, grad_scale: 5.764607523034235e+17
2024-08-18 12:11:44,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3883690.0, ans=0.125
2024-08-18 12:11:45,655 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 from AS
2024-08-18 12:11:54,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3883790.0, ans=0.0
2024-08-18 12:12:05,346 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 16 from Vox, 36 from AS
2024-08-18 12:12:34,935 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.272e+01 2.549e+01 2.826e+01 1.079e+02, threshold=5.098e+01, percent-clipped=1.0
2024-08-18 12:12:35,800 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.35 vs. limit=10.0
2024-08-18 12:12:48,436 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 1800, loss[loss=0.08205, beats_loss=0.01158, ecapa_loss=0.0001367, whisper_loss=0.0691, over 18458.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01045, ecapa_loss=0.0001408, whisper_loss=0.08856, over 3830101.84 frames. ], batch size: 75, lr: 2.28e-03, grad_scale: 5.764607523034235e+17
2024-08-18 12:12:49,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3884190.0, ans=0.125
2024-08-18 12:12:54,628 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.74 vs. limit=15.0
2024-08-18 12:13:04,040 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 21 from Vox, 24 from AS
2024-08-18 12:13:28,286 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.80 vs. limit=15.0
2024-08-18 12:13:29,260 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 22 from Vox, 32 from AS
2024-08-18 12:13:39,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3884490.0, ans=0.0
2024-08-18 12:13:57,790 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0
2024-08-18 12:13:58,586 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 1850, loss[loss=0.1023, beats_loss=0.01037, ecapa_loss=0.0001296, whisper_loss=0.09065, over 24050.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01039, ecapa_loss=0.0001424, whisper_loss=0.08898, over 3802999.40 frames. ], batch size: 95, lr: 2.28e-03, grad_scale: 5.764607523034235e+17
2024-08-18 12:13:59,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3884690.0, ans=0.125
2024-08-18 12:14:02,894 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 22 from LS+wenet, 20 from Vox, 44 from AS
2024-08-18 12:14:52,936 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 25 from LS+wenet, 17 from Vox, 22 from AS
2024-08-18 12:14:54,287 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.251e+01 2.486e+01 2.812e+01 3.198e+02, threshold=4.971e+01, percent-clipped=2.0
2024-08-18 12:15:08,238 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 1900, loss[loss=0.1049, beats_loss=0.008974, ecapa_loss=0.0001419, whisper_loss=0.09451, over 22625.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01043, ecapa_loss=0.0001423, whisper_loss=0.08893, over 3801674.20 frames. ], batch size: 90, lr: 2.28e-03, grad_scale: 5.764607523034235e+17
2024-08-18 12:15:13,833 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 from AS
2024-08-18 12:15:36,349 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 from AS
2024-08-18 12:15:43,183 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 28 from Vox, 26 from AS
2024-08-18 12:15:43,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3885390.0, ans=10.0
2024-08-18 12:15:50,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3885490.0, ans=0.1
2024-08-18 12:15:51,880 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 21 from Vox, 32 from AS
2024-08-18 12:15:58,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3885490.0, ans=0.125
2024-08-18 12:16:06,698 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 from AS
2024-08-18 12:16:09,567 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 22 from Vox, 24 from AS
2024-08-18 12:16:17,364 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 1950, loss[loss=0.08684, beats_loss=0.01084, ecapa_loss=0.0001358, whisper_loss=0.07464, over 18064.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01041, ecapa_loss=0.0001434, whisper_loss=0.08921, over 3783524.29 frames. ], batch size: 73, lr: 2.28e-03, grad_scale: 5.764607523034235e+17
2024-08-18 12:16:19,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3885690.0, ans=0.0
2024-08-18 12:16:24,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3885690.0, ans=0.125
2024-08-18 12:16:25,859 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 from AS
2024-08-18 12:16:37,089 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 20 from Vox, 35 from AS
2024-08-18 12:16:47,249 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 31 from LS+wenet, 22 from Vox, 26 from AS
2024-08-18 12:17:10,357 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 18 from Vox, 29 from AS
2024-08-18 12:17:14,428 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.298e+01 2.562e+01 2.933e+01 2.107e+02, threshold=5.124e+01, percent-clipped=3.0
2024-08-18 12:17:23,968 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 25 from LS+wenet, 12 from Vox, 31 from AS
2024-08-18 12:17:29,256 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 2000, loss[loss=0.1135, beats_loss=0.009347, ecapa_loss=0.0001638, whisper_loss=0.1025, over 16886.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01039, ecapa_loss=0.0001427, whisper_loss=0.08979, over 3785255.99 frames. ], batch size: 69, lr: 2.28e-03, grad_scale: 5.764607523034235e+17
2024-08-18 12:17:32,206 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 16 from Vox, 23 from AS
2024-08-18 12:17:48,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3886290.0, ans=0.0
2024-08-18 12:17:49,091 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.56 vs. limit=22.5
2024-08-18 12:18:04,897 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 19 from Vox, 25 from AS
2024-08-18 12:18:19,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3886490.0, ans=0.09899494936611666
2024-08-18 12:18:27,432 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 from AS
2024-08-18 12:18:27,883 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.87 vs. limit=15.0
2024-08-18 12:18:28,041 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0
2024-08-18 12:18:29,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3886590.0, ans=0.125
2024-08-18 12:18:37,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3886590.0, ans=0.0
2024-08-18 12:18:40,188 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 2050, loss[loss=0.1007, beats_loss=0.01019, ecapa_loss=0.0001318, whisper_loss=0.08916, over 22090.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01042, ecapa_loss=0.0001427, whisper_loss=0.08927, over 3810623.59 frames. ], batch size: 91, lr: 2.28e-03, grad_scale: 5.764607523034235e+17
2024-08-18 12:18:47,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3886690.0, ans=0.125
2024-08-18 12:19:00,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3886790.0, ans=0.0
2024-08-18 12:19:00,407 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.63 vs. limit=15.0
2024-08-18 12:19:29,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3886990.0, ans=0.5
2024-08-18 12:19:30,776 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 21 from Vox, 23 from AS
2024-08-18 12:19:31,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3886990.0, ans=0.1
2024-08-18 12:19:33,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3886990.0, ans=0.1
2024-08-18 12:19:36,189 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.270e+01 2.543e+01 2.879e+01 5.540e+01, threshold=5.086e+01, percent-clipped=1.0
2024-08-18 12:19:39,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3887090.0, ans=0.125
2024-08-18 12:19:50,522 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 2100, loss[loss=0.09545, beats_loss=0.0124, ecapa_loss=0.000141, whisper_loss=0.08164, over 21265.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01042, ecapa_loss=0.0001423, whisper_loss=0.08974, over 3806247.80 frames. ], batch size: 87, lr: 2.28e-03, grad_scale: 5.764607523034235e+17
2024-08-18 12:19:58,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3887190.0, ans=0.2
2024-08-18 12:20:01,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3887190.0, ans=0.2
2024-08-18 12:20:20,039 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-18 12:20:48,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3887590.0, ans=0.0
2024-08-18 12:20:55,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3887590.0, ans=0.125
2024-08-18 12:20:57,264 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 from AS
2024-08-18 12:20:59,992 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 2150, loss[loss=0.08782, beats_loss=0.009623, ecapa_loss=0.0001863, whisper_loss=0.07634, over 19560.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01038, ecapa_loss=0.000142, whisper_loss=0.09019, over 3802927.40 frames. ], batch size: 85, lr: 2.28e-03, grad_scale: 5.764607523034235e+17
2024-08-18 12:21:06,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3887690.0, ans=0.1
2024-08-18 12:21:09,009 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 18 from Vox, 26 from AS
2024-08-18 12:21:40,769 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.60 vs. limit=12.0
2024-08-18 12:21:55,793 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0
2024-08-18 12:21:58,475 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.293e+01 2.488e+01 2.854e+01 6.681e+01, threshold=4.977e+01, percent-clipped=1.0
2024-08-18 12:22:06,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3888090.0, ans=0.1
2024-08-18 12:22:12,558 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 2200, loss[loss=0.1104, beats_loss=0.01105, ecapa_loss=0.0001253, whisper_loss=0.09805, over 24035.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01037, ecapa_loss=0.0001413, whisper_loss=0.09098, over 3829937.34 frames.
], batch size: 92, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:22:36,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3888290.0, ans=0.0 2024-08-18 12:22:51,651 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-18 12:23:11,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3888590.0, ans=0.0 2024-08-18 12:23:19,175 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.97 vs. limit=15.0 2024-08-18 12:23:23,889 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 2250, loss[loss=0.1213, beats_loss=0.008883, ecapa_loss=0.0001445, whisper_loss=0.111, over 22369.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01042, ecapa_loss=0.0001416, whisper_loss=0.09163, over 3864966.45 frames. ], batch size: 83, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:23:34,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3888690.0, ans=0.0 2024-08-18 12:23:39,552 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 15 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-18 12:23:45,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3888790.0, ans=0.125 2024-08-18 12:24:08,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3888890.0, ans=0.025 2024-08-18 12:24:08,902 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.09 vs. limit=15.0 2024-08-18 12:24:15,763 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
19 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-18 12:24:26,489 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.345e+01 2.551e+01 2.862e+01 1.228e+02, threshold=5.102e+01, percent-clipped=1.0 2024-08-18 12:24:35,493 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.188e-03 2024-08-18 12:24:42,551 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 2300, loss[loss=0.08888, beats_loss=0.01231, ecapa_loss=0.0001288, whisper_loss=0.07528, over 21589.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01046, ecapa_loss=0.0001412, whisper_loss=0.09183, over 3881803.87 frames. ], batch size: 90, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:24:48,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3889190.0, ans=0.0 2024-08-18 12:24:54,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3889190.0, ans=0.125 2024-08-18 12:25:01,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3889290.0, ans=0.125 2024-08-18 12:25:02,453 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 27 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 12:25:04,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3889290.0, ans=10.0 2024-08-18 12:25:07,040 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-18 12:25:14,824 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.69 vs. limit=15.0 2024-08-18 12:25:21,516 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
35 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-18 12:25:43,644 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 19 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-18 12:25:52,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3889590.0, ans=10.0 2024-08-18 12:25:56,287 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.93 vs. limit=12.0 2024-08-18 12:25:58,403 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 12:26:01,351 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 2350, loss[loss=0.105, beats_loss=0.008849, ecapa_loss=0.0001524, whisper_loss=0.09467, over 21608.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01039, ecapa_loss=0.0001415, whisper_loss=0.09226, over 3875755.44 frames. ], batch size: 87, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:26:17,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3889790.0, ans=0.0 2024-08-18 12:26:22,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3889790.0, ans=0.0 2024-08-18 12:26:28,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3889790.0, ans=0.125 2024-08-18 12:26:43,603 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 12:27:02,953 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.58 vs. 
limit=15.0 2024-08-18 12:27:04,494 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.417e+01 2.635e+01 3.019e+01 1.167e+02, threshold=5.271e+01, percent-clipped=1.0 2024-08-18 12:27:07,303 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 12:27:18,849 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 2400, loss[loss=0.08968, beats_loss=0.01171, ecapa_loss=0.0001363, whisper_loss=0.07661, over 14501.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0104, ecapa_loss=0.0001428, whisper_loss=0.09137, over 3880660.58 frames. ], batch size: 58, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:27:23,663 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-18 12:27:47,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3890390.0, ans=0.1 2024-08-18 12:27:57,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3890390.0, ans=0.125 2024-08-18 12:28:19,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3890590.0, ans=0.125 2024-08-18 12:28:30,921 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 2450, loss[loss=0.07434, beats_loss=0.01174, ecapa_loss=9.369e-05, whisper_loss=0.06166, over 15237.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01041, ecapa_loss=0.0001433, whisper_loss=0.09108, over 3834366.53 frames. ], batch size: 58, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:28:35,996 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.48 vs. limit=22.5 2024-08-18 12:28:48,572 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
25 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 12:28:49,428 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.13 vs. limit=15.0 2024-08-18 12:28:52,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3890790.0, ans=0.125 2024-08-18 12:28:56,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3890790.0, ans=0.125 2024-08-18 12:28:56,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3890790.0, ans=0.125 2024-08-18 12:29:06,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3890890.0, ans=0.2 2024-08-18 12:29:08,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3890890.0, ans=0.125 2024-08-18 12:29:26,838 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-18 12:29:27,934 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.266e+01 2.434e+01 2.760e+01 4.670e+01, threshold=4.867e+01, percent-clipped=0.0 2024-08-18 12:29:35,781 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.95 vs. limit=15.0 2024-08-18 12:29:42,809 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 2500, loss[loss=0.09303, beats_loss=0.01049, ecapa_loss=0.0001242, whisper_loss=0.0813, over 17736.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001423, whisper_loss=0.09067, over 3858491.41 frames. 
], batch size: 69, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:29:43,008 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 19 from Vox, 52 fro AS 2024-08-18 12:29:48,211 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-18 12:29:53,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3891190.0, ans=0.125 2024-08-18 12:30:07,766 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=15.0 2024-08-18 12:30:11,584 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-18 12:30:14,556 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-18 12:30:15,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3891390.0, ans=6.0 2024-08-18 12:30:38,349 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-18 12:30:51,877 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 2550, loss[loss=0.099, beats_loss=0.01098, ecapa_loss=0.0001366, whisper_loss=0.08666, over 17509.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01033, ecapa_loss=0.0001423, whisper_loss=0.09192, over 3887706.77 frames. 
], batch size: 70, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:30:58,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3891690.0, ans=0.125 2024-08-18 12:30:58,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3891690.0, ans=0.0 2024-08-18 12:30:59,758 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.31 vs. limit=15.0 2024-08-18 12:31:22,889 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.83 vs. limit=15.0 2024-08-18 12:31:23,631 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-18 12:31:30,938 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-18 12:31:43,211 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.892e+01 2.325e+01 2.537e+01 2.934e+01 3.751e+01, threshold=5.074e+01, percent-clipped=0.0 2024-08-18 12:31:44,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3892090.0, ans=0.125 2024-08-18 12:31:47,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3892090.0, ans=0.0 2024-08-18 12:31:55,539 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 2600, loss[loss=0.1162, beats_loss=0.007236, ecapa_loss=0.0001567, whisper_loss=0.1074, over 16554.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01032, ecapa_loss=0.0001435, whisper_loss=0.09113, over 3868718.84 frames. ], batch size: 65, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:32:02,249 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
21 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-18 12:32:23,511 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-18 12:32:29,444 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 21 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-18 12:32:32,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3892490.0, ans=0.125 2024-08-18 12:32:35,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3892490.0, ans=0.125 2024-08-18 12:32:37,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3892490.0, ans=0.1 2024-08-18 12:32:57,987 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 2650, loss[loss=0.08745, beats_loss=0.0117, ecapa_loss=0.0001238, whisper_loss=0.07451, over 20802.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01043, ecapa_loss=0.000143, whisper_loss=0.09025, over 3877043.10 frames. ], batch size: 82, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:33:01,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3892690.0, ans=0.0 2024-08-18 12:33:08,331 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-18 12:33:10,684 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 30 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-18 12:33:25,910 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
24 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-18 12:33:37,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3892990.0, ans=0.2 2024-08-18 12:33:47,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3893090.0, ans=0.2 2024-08-18 12:33:48,373 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.397e+01 2.618e+01 2.808e+01 4.074e+01, threshold=5.236e+01, percent-clipped=0.0 2024-08-18 12:33:52,174 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 20 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-18 12:34:00,616 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 2700, loss[loss=0.1039, beats_loss=0.01032, ecapa_loss=0.0001558, whisper_loss=0.09198, over 19300.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01043, ecapa_loss=0.0001427, whisper_loss=0.08996, over 3868556.64 frames. ], batch size: 76, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:34:04,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3893190.0, ans=0.1 2024-08-18 12:34:07,077 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-18 12:34:27,391 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 22 from Vox, 18 fro AS 2024-08-18 12:34:30,223 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.52 vs. limit=15.0 2024-08-18 12:34:57,549 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 26 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 12:34:58,822 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 12:34:59,230 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2024-08-18 12:35:03,441 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 2750, loss[loss=0.09846, beats_loss=0.008694, ecapa_loss=0.0001968, whisper_loss=0.0878, over 18153.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.0001429, whisper_loss=0.09039, over 3869655.06 frames. ], batch size: 78, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:35:08,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3893690.0, ans=0.0 2024-08-18 12:35:29,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3893890.0, ans=0.125 2024-08-18 12:35:35,234 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 20 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-18 12:35:36,501 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 12:35:41,368 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 31 from LS+wenet, 11 from Vox, 37 fro AS 2024-08-18 12:35:49,061 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 29 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-18 12:35:53,876 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.272e+01 2.417e+01 2.652e+01 4.888e+01, threshold=4.834e+01, percent-clipped=0.0 2024-08-18 12:35:59,805 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-18 12:36:06,027 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 2800, loss[loss=0.08097, beats_loss=0.01443, ecapa_loss=0.0001049, whisper_loss=0.06549, over 19788.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01044, ecapa_loss=0.0001415, whisper_loss=0.09085, over 3875079.92 frames. ], batch size: 79, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:36:09,576 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.30 vs. limit=15.0 2024-08-18 12:36:12,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3894190.0, ans=0.125 2024-08-18 12:36:19,881 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 12:37:00,886 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-18 12:37:08,234 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 2850, loss[loss=0.06832, beats_loss=0.01283, ecapa_loss=0.0001725, whisper_loss=0.05376, over 16380.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01049, ecapa_loss=0.0001413, whisper_loss=0.09098, over 3880515.30 frames. ], batch size: 72, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:37:09,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3894690.0, ans=0.2 2024-08-18 12:37:28,554 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2024-08-18 12:37:33,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3894890.0, ans=0.1 2024-08-18 12:37:36,450 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-18 12:37:38,323 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.50 vs. 
limit=15.0 2024-08-18 12:37:52,969 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.60 vs. limit=15.0 2024-08-18 12:37:54,870 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 20 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-18 12:37:57,125 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.725e+01 2.309e+01 2.623e+01 2.921e+01 3.884e+01, threshold=5.245e+01, percent-clipped=0.0 2024-08-18 12:37:58,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3895090.0, ans=0.0 2024-08-18 12:38:09,928 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 2900, loss[loss=0.1208, beats_loss=0.01148, ecapa_loss=0.0001359, whisper_loss=0.1079, over 23188.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01059, ecapa_loss=0.0001419, whisper_loss=0.09031, over 3898954.01 frames. ], batch size: 90, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:38:13,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3895190.0, ans=0.1 2024-08-18 12:38:17,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3895190.0, ans=0.2 2024-08-18 12:38:20,240 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.51 vs. limit=22.5 2024-08-18 12:38:25,303 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.09 vs. 
limit=22.5 2024-08-18 12:38:26,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3895290.0, ans=0.125 2024-08-18 12:39:07,009 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.78 vs. limit=15.0 2024-08-18 12:39:11,116 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 2950, loss[loss=0.1205, beats_loss=0.008931, ecapa_loss=0.0001521, whisper_loss=0.11, over 21061.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01059, ecapa_loss=0.0001428, whisper_loss=0.08995, over 3916949.19 frames. ], batch size: 80, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:39:11,285 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 12:39:13,422 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=15.0 2024-08-18 12:39:20,432 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 12:39:20,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3895690.0, ans=0.125 2024-08-18 12:39:22,730 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-18 12:39:28,271 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.12 vs. 
limit=15.0 2024-08-18 12:39:43,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3895890.0, ans=0.1 2024-08-18 12:39:44,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3895890.0, ans=0.09899494936611666 2024-08-18 12:39:51,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3895990.0, ans=0.125 2024-08-18 12:40:01,800 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.273e+01 2.616e+01 2.939e+01 5.806e+01, threshold=5.233e+01, percent-clipped=1.0 2024-08-18 12:40:06,478 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.17 vs. limit=22.5 2024-08-18 12:40:07,050 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 17 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-18 12:40:14,045 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.72 vs. limit=22.5 2024-08-18 12:40:14,608 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 3000, loss[loss=0.1076, beats_loss=0.009694, ecapa_loss=0.0001804, whisper_loss=0.09612, over 14931.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01049, ecapa_loss=0.0001436, whisper_loss=0.09029, over 3861454.06 frames. ], batch size: 63, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:40:14,608 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-18 12:40:51,428 INFO [train_multi_KD3.py:1149] (2/4) Epoch 27, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.000526, whisper_loss=0.2482, over 922467.00 frames. 
2024-08-18 12:41:06,205 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.0710, 2.4357, 2.3574, 2.2245, 2.9850, 2.7067, 2.5556, 2.3400], device='cuda:2') 2024-08-18 12:41:08,070 INFO [train_multi_KD3.py:1149] (2/4) Epoch 27, validation on SV_voxceleb1: loss=0.003954, beats_loss=0, ecapa_loss=0.0003954, whisper_loss=0, over 939242.00 frames. 2024-08-18 12:42:15,717 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8186, 4.2114, 2.9012, 4.6736], device='cuda:2') 2024-08-18 12:42:59,113 INFO [train_multi_KD3.py:1149] (2/4) Epoch 27, validation on AT_audioset: loss=0.02313, beats_loss=0.02313, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 12:42:59,117 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-18 12:43:07,227 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-18 12:43:16,891 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2024-08-18 12:43:17,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3896290.0, ans=0.0 2024-08-18 12:43:25,364 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-18 12:43:40,122 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-18 12:43:58,253 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. 
limit=15.0 2024-08-18 12:43:59,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3896590.0, ans=0.125 2024-08-18 12:44:00,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3896690.0, ans=0.125 2024-08-18 12:44:01,469 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 3050, loss[loss=0.1057, beats_loss=0.0123, ecapa_loss=0.0001295, whisper_loss=0.0921, over 23563.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0106, ecapa_loss=0.0001426, whisper_loss=0.08958, over 3873535.60 frames. ], batch size: 96, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:44:02,968 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-18 12:44:19,046 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 12:44:32,930 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.50 vs. limit=12.0 2024-08-18 12:44:51,087 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.420e+01 2.665e+01 2.949e+01 2.105e+02, threshold=5.329e+01, percent-clipped=1.0 2024-08-18 12:44:54,716 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 12:45:03,657 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 3100, loss[loss=0.08092, beats_loss=0.01223, ecapa_loss=0.0001092, whisper_loss=0.0676, over 16116.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01051, ecapa_loss=0.0001443, whisper_loss=0.08955, over 3836551.11 frames. 
], batch size: 59, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:45:06,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3897190.0, ans=0.1 2024-08-18 12:45:10,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3897190.0, ans=0.125 2024-08-18 12:45:15,987 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 20 from LS+wenet, 19 from Vox, 40 from AS 2024-08-18 12:45:19,283 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.55 vs. limit=15.0 2024-08-18 12:45:37,398 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 15 from Vox, 28 from AS 2024-08-18 12:45:38,265 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.89 vs. limit=10.0 2024-08-18 12:45:42,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3897490.0, ans=0.1 2024-08-18 12:45:43,666 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 from AS 2024-08-18 12:45:51,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3897490.0, ans=0.0 2024-08-18 12:46:06,136 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 3150, loss[loss=0.06979, beats_loss=0.01232, ecapa_loss=0.0001803, whisper_loss=0.05566, over 16228.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01056, ecapa_loss=0.0001448, whisper_loss=0.08928, over 3824425.63 frames. 
], batch size: 70, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:46:13,104 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.18 vs. limit=10.0 2024-08-18 12:46:14,900 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 36 from LS+wenet, 21 from Vox, 31 from AS 2024-08-18 12:46:20,297 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 30 from Vox, 38 from AS 2024-08-18 12:46:23,937 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 13 from Vox, 26 from AS 2024-08-18 12:46:30,996 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.09 vs. limit=15.0 2024-08-18 12:46:41,956 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 15 from LS+wenet, 28 from Vox, 31 from AS 2024-08-18 12:46:45,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3897990.0, ans=0.0 2024-08-18 12:46:56,624 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.328e+01 2.486e+01 2.838e+01 3.960e+01, threshold=4.973e+01, percent-clipped=0.0 2024-08-18 12:46:59,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3898090.0, ans=0.125 2024-08-18 12:47:07,872 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 16 from Vox, 39 from AS 2024-08-18 12:47:08,940 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 3200, loss[loss=0.09503, beats_loss=0.01288, ecapa_loss=0.0001153, whisper_loss=0.081, over 18868.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01051, ecapa_loss=0.0001441, whisper_loss=0.09003, over 3847595.05 frames. ], batch size: 76, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:47:20,791 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
22 from LS+wenet, 14 from Vox, 32 from AS 2024-08-18 12:47:23,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3898290.0, ans=0.125 2024-08-18 12:47:28,990 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 23 from Vox, 32 from AS 2024-08-18 12:47:31,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3898290.0, ans=0.125 2024-08-18 12:47:31,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3898290.0, ans=0.125 2024-08-18 12:47:39,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3898390.0, ans=0.2 2024-08-18 12:47:39,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3898390.0, ans=0.05 2024-08-18 12:47:45,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3898490.0, ans=0.0 2024-08-18 12:47:46,383 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 22 from Vox, 35 from AS 2024-08-18 12:47:49,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3898490.0, ans=0.125 2024-08-18 12:47:49,483 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.63 vs. limit=10.0 2024-08-18 12:48:06,766 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.48 vs. 
limit=15.0 2024-08-18 12:48:11,355 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 3250, loss[loss=0.1025, beats_loss=0.01098, ecapa_loss=0.0001526, whisper_loss=0.09002, over 23038.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01048, ecapa_loss=0.0001443, whisper_loss=0.09088, over 3864103.02 frames. ], batch size: 93, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:48:15,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3898690.0, ans=0.125 2024-08-18 12:48:28,084 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 12:48:30,267 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 from AS 2024-08-18 12:48:35,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3898890.0, ans=0.2 2024-08-18 12:48:43,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3898890.0, ans=0.0 2024-08-18 12:48:51,249 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 18 from Vox, 19 from AS 2024-08-18 12:48:57,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3898990.0, ans=0.125 2024-08-18 12:49:00,906 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.280e+01 2.573e+01 2.891e+01 1.155e+02, threshold=5.145e+01, percent-clipped=3.0 2024-08-18 12:49:13,373 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 3300, loss[loss=0.09686, beats_loss=0.01278, ecapa_loss=0.0001071, whisper_loss=0.08302, over 23512.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01046, ecapa_loss=0.0001446, whisper_loss=0.09136, over 3868638.16 frames. 
], batch size: 92, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:49:16,843 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 37 from LS+wenet, 22 from Vox, 33 from AS 2024-08-18 12:49:58,378 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.62 vs. limit=10.0 2024-08-18 12:50:09,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3899590.0, ans=0.0 2024-08-18 12:50:15,143 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 3350, loss[loss=0.09257, beats_loss=0.01156, ecapa_loss=0.0001345, whisper_loss=0.07966, over 20621.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01045, ecapa_loss=0.0001453, whisper_loss=0.09118, over 3880243.17 frames. ], batch size: 83, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:50:25,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3899690.0, ans=0.125 2024-08-18 12:50:40,751 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
25 from LS+wenet, 24 from Vox, 37 from AS 2024-08-18 12:50:57,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3899990.0, ans=0.0 2024-08-18 12:50:58,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3899990.0, ans=0.125 2024-08-18 12:51:00,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3899990.0, ans=0.2 2024-08-18 12:51:02,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3899990.0, ans=0.0 2024-08-18 12:51:04,247 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.398e+01 2.647e+01 2.975e+01 4.321e+02, threshold=5.295e+01, percent-clipped=5.0 2024-08-18 12:51:09,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3900090.0, ans=0.125 2024-08-18 12:51:09,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3900090.0, ans=0.125 2024-08-18 12:51:10,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3900090.0, ans=0.2 2024-08-18 12:51:14,529 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.75 vs. limit=15.0 2024-08-18 12:51:16,219 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 3400, loss[loss=0.1004, beats_loss=0.01129, ecapa_loss=0.0001401, whisper_loss=0.08771, over 22589.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01051, ecapa_loss=0.0001445, whisper_loss=0.09034, over 3903100.72 frames. 
], batch size: 89, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:51:20,972 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 25 from Vox, 24 from AS 2024-08-18 12:51:39,158 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 24 from Vox, 30 from AS 2024-08-18 12:51:40,428 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 27 from Vox, 41 from AS 2024-08-18 12:51:46,327 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 38 from LS+wenet, 21 from Vox, 36 from AS 2024-08-18 12:51:53,184 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.85 vs. limit=22.5 2024-08-18 12:52:09,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3900590.0, ans=0.125 2024-08-18 12:52:16,798 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 3450, loss[loss=0.1126, beats_loss=0.01101, ecapa_loss=0.0001588, whisper_loss=0.1, over 20916.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0105, ecapa_loss=0.0001453, whisper_loss=0.08983, over 3898628.63 frames. ], batch size: 85, lr: 2.28e-03, grad_scale: 1.152921504606847e+18 2024-08-18 12:52:27,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3900690.0, ans=0.2 2024-08-18 12:52:33,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3900790.0, ans=0.035 2024-08-18 12:52:34,021 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 from AS 2024-08-18 12:52:38,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3900790.0, ans=0.125 2024-08-18 12:52:39,376 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
25 from LS+wenet, 18 from Vox, 37 from AS 2024-08-18 12:52:39,674 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 12:52:43,592 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 14 from LS+wenet, 18 from Vox, 31 from AS 2024-08-18 12:53:04,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3900990.0, ans=0.125 2024-08-18 12:53:08,726 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 37 from LS+wenet, 18 from Vox, 37 from AS 2024-08-18 12:53:10,845 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.235e+01 2.457e+01 2.725e+01 3.914e+01, threshold=4.915e+01, percent-clipped=0.0 2024-08-18 12:53:29,004 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 3500, loss[loss=0.1082, beats_loss=0.01182, ecapa_loss=0.0001173, whisper_loss=0.09524, over 23184.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0106, ecapa_loss=0.0001438, whisper_loss=0.08951, over 3904985.49 frames. ], batch size: 91, lr: 2.28e-03, grad_scale: 1.152921504606847e+18 2024-08-18 12:54:04,937 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 19 from Vox, 41 from AS 2024-08-18 12:54:16,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3901490.0, ans=0.1 2024-08-18 12:54:42,557 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 27 from Vox, 33 from AS 2024-08-18 12:54:47,434 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 3550, loss[loss=0.08228, beats_loss=0.0108, ecapa_loss=0.0001499, whisper_loss=0.06997, over 16477.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01055, ecapa_loss=0.0001447, whisper_loss=0.08982, over 3904638.60 frames. 
], batch size: 68, lr: 2.28e-03, grad_scale: 1.152921504606847e+18 2024-08-18 12:54:51,345 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 19 from LS+wenet, 25 from Vox, 38 from AS 2024-08-18 12:54:53,531 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=22.5 2024-08-18 12:55:00,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3901690.0, ans=0.0 2024-08-18 12:55:17,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3901890.0, ans=0.125 2024-08-18 12:55:25,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=3901890.0, ans=0.02 2024-08-18 12:55:30,174 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 15 from Vox, 26 from AS 2024-08-18 12:55:50,573 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.348e+01 2.623e+01 2.938e+01 4.839e+01, threshold=5.246e+01, percent-clipped=0.0 2024-08-18 12:55:52,441 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 from AS 2024-08-18 12:55:58,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3902090.0, ans=0.025 2024-08-18 12:56:05,427 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 3600, loss[loss=0.1087, beats_loss=0.009106, ecapa_loss=0.0001625, whisper_loss=0.09792, over 19360.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01056, ecapa_loss=0.0001441, whisper_loss=0.08995, over 3909949.81 frames. 
], batch size: 78, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:56:07,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3902190.0, ans=0.0 2024-08-18 12:56:35,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3902290.0, ans=0.125 2024-08-18 12:56:40,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3902390.0, ans=0.2 2024-08-18 12:56:59,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3902490.0, ans=10.0 2024-08-18 12:57:16,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3902590.0, ans=0.0 2024-08-18 12:57:19,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3902590.0, ans=0.125 2024-08-18 12:57:30,350 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 3650, loss[loss=0.09457, beats_loss=0.01036, ecapa_loss=0.0001784, whisper_loss=0.08243, over 21087.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01054, ecapa_loss=0.0001449, whisper_loss=0.08996, over 3938498.73 frames. ], batch size: 91, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:57:30,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3902690.0, ans=0.1 2024-08-18 12:57:32,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3902690.0, ans=0.04949747468305833 2024-08-18 12:57:46,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3902790.0, ans=0.125 2024-08-18 12:57:47,746 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
17 from LS+wenet, 18 from Vox, 28 from AS 2024-08-18 12:57:49,839 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.98 vs. limit=6.0 2024-08-18 12:57:51,946 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 16 from Vox, 23 from AS 2024-08-18 12:57:54,645 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 16 from Vox, 48 from AS 2024-08-18 12:58:08,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3902990.0, ans=0.125 2024-08-18 12:58:14,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=3902990.0, ans=12.0 2024-08-18 12:58:17,897 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 from AS 2024-08-18 12:58:21,440 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.704e+01 2.261e+01 2.422e+01 2.681e+01 4.543e+01, threshold=4.845e+01, percent-clipped=0.0 2024-08-18 12:58:22,198 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.93 vs. limit=15.0 2024-08-18 12:58:22,264 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.44 vs. 
limit=6.0 2024-08-18 12:58:25,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3903090.0, ans=0.125 2024-08-18 12:58:31,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3903090.0, ans=0.0 2024-08-18 12:58:33,543 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 3700, loss[loss=0.08199, beats_loss=0.00908, ecapa_loss=0.0001442, whisper_loss=0.07147, over 16308.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01056, ecapa_loss=0.0001444, whisper_loss=0.08985, over 3900183.97 frames. ], batch size: 65, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:58:44,439 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 from AS 2024-08-18 12:58:48,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3903290.0, ans=0.0 2024-08-18 12:58:53,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3903290.0, ans=0.0 2024-08-18 12:58:57,243 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 18 from Vox, 32 from AS 2024-08-18 12:59:01,833 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.88 vs. limit=10.0 2024-08-18 12:59:02,462 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 20 from LS+wenet, 23 from Vox, 46 from AS 2024-08-18 12:59:14,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3903490.0, ans=0.04949747468305833 2024-08-18 12:59:15,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3903490.0, ans=0.125 2024-08-18 12:59:38,939 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
24 from LS+wenet, 14 from Vox, 35 from AS 2024-08-18 12:59:41,516 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 3750, loss[loss=0.07833, beats_loss=0.01391, ecapa_loss=0.0001113, whisper_loss=0.06331, over 22754.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01059, ecapa_loss=0.0001434, whisper_loss=0.09021, over 3903724.06 frames. ], batch size: 94, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:59:59,882 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 from AS 2024-08-18 13:00:00,243 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0 2024-08-18 13:00:00,978 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 19 from Vox, 35 from AS 2024-08-18 13:00:02,576 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 21 from LS+wenet, 23 from Vox, 46 from AS 2024-08-18 13:00:14,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3903890.0, ans=0.125 2024-08-18 13:00:19,174 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.009e+00 2024-08-18 13:00:20,533 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.79 vs. limit=15.0 2024-08-18 13:00:20,562 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.19 vs. limit=15.0 2024-08-18 13:00:37,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3903990.0, ans=0.2 2024-08-18 13:00:41,065 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
20 from LS+wenet, 23 from Vox, 25 from AS 2024-08-18 13:00:45,066 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.286e+01 2.566e+01 2.907e+01 8.320e+01, threshold=5.133e+01, percent-clipped=1.0 2024-08-18 13:00:55,470 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 20 from Vox, 23 from AS 2024-08-18 13:00:59,270 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 3800, loss[loss=0.1124, beats_loss=0.009797, ecapa_loss=0.0001574, whisper_loss=0.101, over 20779.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01062, ecapa_loss=0.000144, whisper_loss=0.08996, over 3910794.22 frames. ], batch size: 82, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:01:06,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3904190.0, ans=0.125 2024-08-18 13:01:13,857 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 17 from LS+wenet, 20 from Vox, 37 from AS 2024-08-18 13:01:28,920 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 23 from Vox, 21 from AS 2024-08-18 13:01:44,298 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 19 from Vox, 39 from AS 2024-08-18 13:01:44,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3904490.0, ans=0.125 2024-08-18 13:02:05,437 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.87 vs. limit=15.0 2024-08-18 13:02:07,385 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 15 from Vox, 23 from AS 2024-08-18 13:02:13,865 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 3850, loss[loss=0.107, beats_loss=0.009762, ecapa_loss=0.0001423, whisper_loss=0.09584, over 21284.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001438, whisper_loss=0.09015, over 3875473.84 frames. ], batch size: 84, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:02:24,748 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 24 from Vox, 37 from AS 2024-08-18 13:02:26,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3904690.0, ans=0.125 2024-08-18 13:02:28,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3904790.0, ans=10.0 2024-08-18 13:02:34,728 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 30 from LS+wenet, 13 from Vox, 30 from AS 2024-08-18 13:03:08,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3904990.0, ans=0.125 2024-08-18 13:03:12,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3905090.0, ans=0.1 2024-08-18 13:03:14,618 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.318e+01 2.599e+01 3.015e+01 2.305e+02, threshold=5.197e+01, percent-clipped=2.0 2024-08-18 13:03:23,951 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 19 from Vox, 40 from AS 2024-08-18 13:03:27,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3905190.0, ans=0.0 2024-08-18 13:03:28,052 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 3900, loss[loss=0.09885, beats_loss=0.007781, ecapa_loss=0.0002025, whisper_loss=0.08904, over 21558.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0105, ecapa_loss=0.0001447, whisper_loss=0.09064, over 3922690.35 frames. 
], batch size: 92, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:03:30,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3905190.0, ans=0.0 2024-08-18 13:03:31,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3905190.0, ans=10.0 2024-08-18 13:03:56,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3905290.0, ans=0.125 2024-08-18 13:04:08,618 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.34 vs. limit=12.0 2024-08-18 13:04:15,526 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 from AS 2024-08-18 13:04:25,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3905490.0, ans=0.0 2024-08-18 13:04:33,570 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 18 from LS+wenet, 16 from Vox, 19 from AS 2024-08-18 13:04:43,073 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.08 vs. limit=15.0 2024-08-18 13:04:43,567 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 3950, loss[loss=0.1149, beats_loss=0.01062, ecapa_loss=0.0001314, whisper_loss=0.1029, over 22495.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01044, ecapa_loss=0.0001453, whisper_loss=0.09114, over 3934011.00 frames. ], batch size: 88, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:04:47,621 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 15 from Vox, 29 from AS 2024-08-18 13:04:48,773 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
31 from LS+wenet, 23 from Vox, 34 from AS 2024-08-18 13:05:12,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3905890.0, ans=0.0 2024-08-18 13:05:15,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3905890.0, ans=0.04949747468305833 2024-08-18 13:05:30,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3905990.0, ans=0.0 2024-08-18 13:05:33,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3905990.0, ans=0.125 2024-08-18 13:05:33,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3905990.0, ans=0.0 2024-08-18 13:05:37,281 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 10 from LS+wenet, 24 from Vox, 24 from AS 2024-08-18 13:05:45,146 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.363e+01 2.538e+01 2.990e+01 4.778e+02, threshold=5.076e+01, percent-clipped=2.0 2024-08-18 13:05:48,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3906090.0, ans=0.0 2024-08-18 13:05:52,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3906090.0, ans=0.05 2024-08-18 13:05:52,612 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=15.0 2024-08-18 13:05:58,267 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 4000, loss[loss=0.109, beats_loss=0.008462, ecapa_loss=0.0001451, whisper_loss=0.09906, over 18923.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01041, ecapa_loss=0.0001459, whisper_loss=0.09132, over 3924879.99 frames. 
], batch size: 75, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:06:07,302 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 11 from Vox, 28 from AS 2024-08-18 13:06:10,777 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.67 vs. limit=15.0 2024-08-18 13:06:32,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3906390.0, ans=0.0 2024-08-18 13:06:35,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3906390.0, ans=0.125 2024-08-18 13:06:37,915 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 19 from Vox, 39 from AS 2024-08-18 13:06:50,041 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 21 from Vox, 23 from AS 2024-08-18 13:06:52,149 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.11 vs. limit=15.0 2024-08-18 13:07:01,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3906590.0, ans=0.125 2024-08-18 13:07:01,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3906590.0, ans=0.0 2024-08-18 13:07:08,436 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 17 from Vox, 31 from AS 2024-08-18 13:07:09,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3906590.0, ans=0.125 2024-08-18 13:07:14,356 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 4050, loss[loss=0.09011, beats_loss=0.01026, ecapa_loss=0.0001454, whisper_loss=0.0784, over 17063.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01042, ecapa_loss=0.0001466, whisper_loss=0.09139, over 3877053.70 frames. ], batch size: 69, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:07:26,769 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 13:07:33,290 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.46 vs. limit=22.5 2024-08-18 13:07:33,758 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 13:07:36,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3906790.0, ans=0.1 2024-08-18 13:07:36,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3906790.0, ans=0.1 2024-08-18 13:07:37,093 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-18 13:07:49,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3906890.0, ans=0.0 2024-08-18 13:07:54,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3906890.0, ans=0.95 2024-08-18 13:07:54,392 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.17 vs. 
limit=12.0 2024-08-18 13:08:06,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3906990.0, ans=0.0 2024-08-18 13:08:07,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3906990.0, ans=0.1 2024-08-18 13:08:15,987 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.302e+01 2.524e+01 2.887e+01 1.698e+02, threshold=5.047e+01, percent-clipped=2.0 2024-08-18 13:08:18,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3907090.0, ans=0.0 2024-08-18 13:08:28,892 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 4100, loss[loss=0.1037, beats_loss=0.01034, ecapa_loss=0.0001368, whisper_loss=0.09196, over 22192.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01044, ecapa_loss=0.0001467, whisper_loss=0.09084, over 3869489.29 frames. ], batch size: 89, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:08:40,588 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.88 vs. limit=15.0 2024-08-18 13:08:41,681 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.48 vs. 
limit=22.5 2024-08-18 13:08:43,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3907290.0, ans=0.125 2024-08-18 13:09:03,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3907390.0, ans=0.1 2024-08-18 13:09:04,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3907390.0, ans=0.125 2024-08-18 13:09:04,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3907390.0, ans=0.125 2024-08-18 13:09:12,876 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.54 vs. limit=15.0 2024-08-18 13:09:25,052 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 17 from LS+wenet, 36 from Vox, 28 fro AS 2024-08-18 13:09:27,103 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-18 13:09:28,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3907590.0, ans=0.125 2024-08-18 13:09:43,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3907590.0, ans=0.125 2024-08-18 13:09:45,571 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 4150, loss[loss=0.07883, beats_loss=0.01062, ecapa_loss=0.0001448, whisper_loss=0.06676, over 13233.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01045, ecapa_loss=0.0001458, whisper_loss=0.09064, over 3857771.48 frames. ], batch size: 54, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:09:56,946 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
21 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 13:10:05,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3907790.0, ans=0.125 2024-08-18 13:10:05,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3907790.0, ans=0.125 2024-08-18 13:10:22,958 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.18 vs. limit=15.0 2024-08-18 13:10:29,618 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-18 13:10:41,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3907990.0, ans=0.0 2024-08-18 13:10:41,703 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.97 vs. limit=10.0 2024-08-18 13:10:44,882 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.199e+01 2.501e+01 2.835e+01 5.919e+01, threshold=5.001e+01, percent-clipped=1.0 2024-08-18 13:10:58,573 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 4200, loss[loss=0.09975, beats_loss=0.01281, ecapa_loss=0.000136, whisper_loss=0.08558, over 22323.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01056, ecapa_loss=0.0001453, whisper_loss=0.09036, over 3893842.42 frames. ], batch size: 91, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:11:02,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3908190.0, ans=0.125 2024-08-18 13:11:10,667 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 35 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 13:11:18,181 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
26 from LS+wenet, 15 from Vox, 50 fro AS 2024-08-18 13:11:33,528 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=22.5 2024-08-18 13:11:44,729 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-18 13:11:51,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3908490.0, ans=0.0 2024-08-18 13:12:06,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3908590.0, ans=0.09899494936611666 2024-08-18 13:12:07,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3908590.0, ans=0.0 2024-08-18 13:12:08,513 WARNING [optim.py:496] (2/4) Scaling gradients by 0.06405556201934814, model_norm_threshold=50.014076232910156 2024-08-18 13:12:08,688 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.0.norm.log_scale with proportion 0.31, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.868e+05, grad_sumsq=1.868e+05, orig_rms_sq=1.000e+00 2024-08-18 13:12:14,237 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 4250, loss[loss=0.09094, beats_loss=0.0126, ecapa_loss=0.000142, whisper_loss=0.07693, over 21958.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01057, ecapa_loss=0.0001452, whisper_loss=0.08993, over 3910372.77 frames. ], batch size: 91, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:12:14,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3908690.0, ans=0.0 2024-08-18 13:12:27,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3908790.0, ans=0.125 2024-08-18 13:12:28,939 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
31 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-18 13:12:33,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3908790.0, ans=0.2 2024-08-18 13:12:34,938 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-18 13:12:44,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3908890.0, ans=0.125 2024-08-18 13:12:53,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3908890.0, ans=0.1 2024-08-18 13:12:59,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3908990.0, ans=0.2 2024-08-18 13:13:17,494 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.252e+01 2.550e+01 2.768e+01 7.808e+02, threshold=5.101e+01, percent-clipped=1.0 2024-08-18 13:13:18,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3909090.0, ans=0.125 2024-08-18 13:13:31,598 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 4300, loss[loss=0.1119, beats_loss=0.01034, ecapa_loss=0.0001139, whisper_loss=0.1005, over 19004.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001449, whisper_loss=0.08998, over 3909970.11 frames. 
], batch size: 69, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:13:50,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3909290.0, ans=0.0 2024-08-18 13:14:27,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3909490.0, ans=0.125 2024-08-18 13:14:34,345 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.50 vs. limit=15.0 2024-08-18 13:14:48,000 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 4350, loss[loss=0.08514, beats_loss=0.01196, ecapa_loss=0.0001457, whisper_loss=0.07172, over 18944.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01047, ecapa_loss=0.0001453, whisper_loss=0.08952, over 3881781.11 frames. ], batch size: 79, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:15:07,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3909790.0, ans=0.0 2024-08-18 13:15:23,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3909890.0, ans=0.09899494936611666 2024-08-18 13:15:25,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3909890.0, ans=0.125 2024-08-18 13:15:25,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3909890.0, ans=0.0 2024-08-18 13:15:28,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3909890.0, ans=0.125 2024-08-18 13:15:29,841 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.14 vs. 
limit=15.0 2024-08-18 13:15:38,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=3909990.0, ans=0.02 2024-08-18 13:15:41,059 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 13:15:43,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3909990.0, ans=0.1 2024-08-18 13:15:50,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3910090.0, ans=0.125 2024-08-18 13:15:51,271 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.300e+01 2.560e+01 2.936e+01 6.147e+01, threshold=5.120e+01, percent-clipped=1.0 2024-08-18 13:15:52,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3910090.0, ans=0.1 2024-08-18 13:15:59,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3910090.0, ans=0.125 2024-08-18 13:16:00,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3910090.0, ans=0.0 2024-08-18 13:16:05,371 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 4400, loss[loss=0.103, beats_loss=0.01115, ecapa_loss=0.000183, whisper_loss=0.09004, over 22205.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.0001451, whisper_loss=0.08962, over 3886468.92 frames. 
], batch size: 92, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:16:26,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3910290.0, ans=0.125 2024-08-18 13:16:32,415 WARNING [optim.py:496] (2/4) Scaling gradients by 0.03998252749443054, model_norm_threshold=51.19541549682617 2024-08-18 13:16:32,579 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.24, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.970e+05, grad_sumsq=3.970e+05, orig_rms_sq=1.000e+00 2024-08-18 13:16:37,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3910390.0, ans=0.07 2024-08-18 13:16:56,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3910490.0, ans=0.125 2024-08-18 13:17:05,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3910490.0, ans=22.5 2024-08-18 13:17:06,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3910490.0, ans=0.125 2024-08-18 13:17:21,582 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-18 13:17:24,188 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 4450, loss[loss=0.1047, beats_loss=0.01125, ecapa_loss=0.0001391, whisper_loss=0.09206, over 23008.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01056, ecapa_loss=0.0001445, whisper_loss=0.08971, over 3863419.30 frames. ], batch size: 91, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:17:36,507 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
26 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-18 13:17:47,034 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.25 vs. limit=15.0 2024-08-18 13:17:53,704 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.99 vs. limit=10.0 2024-08-18 13:17:54,322 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 13:17:59,714 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.90 vs. limit=12.0 2024-08-18 13:18:02,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3910890.0, ans=0.1 2024-08-18 13:18:14,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3910990.0, ans=0.0 2024-08-18 13:18:15,869 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-18 13:18:30,116 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.358e+01 2.721e+01 3.082e+01 1.280e+03, threshold=5.441e+01, percent-clipped=5.0 2024-08-18 13:18:32,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3911090.0, ans=0.125 2024-08-18 13:18:33,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3911090.0, ans=0.0 2024-08-18 13:18:37,505 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-18 13:18:43,462 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 4500, loss[loss=0.09106, beats_loss=0.01056, ecapa_loss=0.0001519, whisper_loss=0.07898, over 16944.00 frames. 
], tot_loss[loss=0.1008, beats_loss=0.01059, ecapa_loss=0.0001449, whisper_loss=0.08874, over 3872232.72 frames. ], batch size: 68, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:19:00,396 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 17 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-18 13:19:31,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3911490.0, ans=0.0 2024-08-18 13:19:34,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3911490.0, ans=0.2 2024-08-18 13:19:47,267 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 13:19:52,419 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.82 vs. limit=22.5 2024-08-18 13:20:00,653 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 4550, loss[loss=0.07394, beats_loss=0.01345, ecapa_loss=0.000122, whisper_loss=0.05927, over 15842.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01068, ecapa_loss=0.000144, whisper_loss=0.08804, over 3871880.15 frames. ], batch size: 64, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:20:03,743 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-18 13:20:06,481 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.381e+01 2024-08-18 13:20:10,970 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 13:20:23,846 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.09 vs. 
limit=12.0 2024-08-18 13:20:27,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3911790.0, ans=0.125 2024-08-18 13:20:32,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3911890.0, ans=0.125 2024-08-18 13:20:41,988 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.77 vs. limit=15.0 2024-08-18 13:20:57,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3911990.0, ans=0.0 2024-08-18 13:21:03,601 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.646e+01 2.293e+01 2.530e+01 2.882e+01 1.902e+02, threshold=5.061e+01, percent-clipped=1.0 2024-08-18 13:21:12,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3912090.0, ans=0.1 2024-08-18 13:21:17,465 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 4600, loss[loss=0.09033, beats_loss=0.01043, ecapa_loss=0.0001534, whisper_loss=0.07838, over 21467.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01067, ecapa_loss=0.0001445, whisper_loss=0.0883, over 3917035.93 frames. ], batch size: 86, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:21:22,440 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 16 from LS+wenet, 22 from Vox, 52 fro AS 2024-08-18 13:21:24,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3912190.0, ans=0.1 2024-08-18 13:21:44,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=3912290.0, ans=0.1 2024-08-18 13:21:56,098 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
16 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-18 13:21:59,574 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.27 vs. limit=12.0 2024-08-18 13:22:33,920 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 4650, loss[loss=0.1037, beats_loss=0.009723, ecapa_loss=0.0001596, whisper_loss=0.09237, over 22052.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01067, ecapa_loss=0.0001446, whisper_loss=0.08822, over 3888318.82 frames. ], batch size: 91, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:22:43,374 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.87 vs. limit=15.0 2024-08-18 13:22:53,187 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 37 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-18 13:22:55,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3912790.0, ans=0.125 2024-08-18 13:22:56,492 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-18 13:23:17,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3912890.0, ans=0.04949747468305833 2024-08-18 13:23:28,912 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-18 13:23:35,606 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.611e+01 2.195e+01 2.467e+01 2.773e+01 3.878e+01, threshold=4.935e+01, percent-clipped=0.0 2024-08-18 13:23:49,350 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 4700, loss[loss=0.07726, beats_loss=0.009737, ecapa_loss=0.0001811, whisper_loss=0.06571, over 15353.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.0107, ecapa_loss=0.0001446, whisper_loss=0.0882, over 3894464.58 frames. 
], batch size: 68, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:24:05,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3913290.0, ans=0.125 2024-08-18 13:24:12,462 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-18 13:24:14,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3913290.0, ans=0.125 2024-08-18 13:24:21,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3913390.0, ans=0.125 2024-08-18 13:24:56,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3913590.0, ans=0.125 2024-08-18 13:24:56,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3913590.0, ans=0.0 2024-08-18 13:25:05,716 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 4750, loss[loss=0.09821, beats_loss=0.008493, ecapa_loss=0.0002181, whisper_loss=0.08753, over 13581.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01063, ecapa_loss=0.0001452, whisper_loss=0.08879, over 3881506.29 frames. ], batch size: 61, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:25:10,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3913690.0, ans=0.0 2024-08-18 13:25:12,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=3913690.0, ans=0.02 2024-08-18 13:25:36,784 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.61 vs. 
limit=15.0 2024-08-18 13:25:40,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3913890.0, ans=0.1 2024-08-18 13:25:53,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3913990.0, ans=0.125 2024-08-18 13:25:54,396 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 19 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-18 13:25:54,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3913990.0, ans=0.125 2024-08-18 13:26:07,654 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.735e+01 2.285e+01 2.505e+01 2.813e+01 4.108e+01, threshold=5.010e+01, percent-clipped=0.0 2024-08-18 13:26:11,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3914090.0, ans=0.125 2024-08-18 13:26:11,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3914090.0, ans=0.125 2024-08-18 13:26:17,391 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-18 13:26:20,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3914190.0, ans=0.0 2024-08-18 13:26:21,774 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 4800, loss[loss=0.1002, beats_loss=0.01225, ecapa_loss=0.0001048, whisper_loss=0.08689, over 20627.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01063, ecapa_loss=0.0001452, whisper_loss=0.08898, over 3865231.70 frames. 
], batch size: 77, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:26:36,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3914290.0, ans=0.125 2024-08-18 13:26:44,776 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.37 vs. limit=12.0 2024-08-18 13:26:52,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3914390.0, ans=0.09899494936611666 2024-08-18 13:26:58,658 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.57 vs. limit=15.0 2024-08-18 13:27:17,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3914490.0, ans=0.2 2024-08-18 13:27:37,906 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 4850, loss[loss=0.09117, beats_loss=0.01019, ecapa_loss=0.0001786, whisper_loss=0.07919, over 17411.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01054, ecapa_loss=0.0001449, whisper_loss=0.0898, over 3855136.39 frames. ], batch size: 72, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:27:40,266 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.51 vs. limit=22.5 2024-08-18 13:27:45,427 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
12 from LS+wenet, 28 from Vox, 23 from AS
2024-08-18 13:27:54,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3914790.0, ans=0.125
2024-08-18 13:27:56,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3914790.0, ans=0.125
2024-08-18 13:27:59,276 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 23 from Vox, 40 from AS
2024-08-18 13:28:07,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3914890.0, ans=10.0
2024-08-18 13:28:10,563 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-18 13:28:35,923 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.142e+01 2.395e+01 2.645e+01 2.966e+01 4.545e+01, threshold=5.290e+01, percent-clipped=0.0
2024-08-18 13:28:37,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3915090.0, ans=0.125
2024-08-18 13:28:47,404 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 12 from Vox, 29 from AS
2024-08-18 13:28:48,463 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 4900, loss[loss=0.09013, beats_loss=0.01204, ecapa_loss=0.0001292, whisper_loss=0.07679, over 14248.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01054, ecapa_loss=0.0001455, whisper_loss=0.08991, over 3851052.81 frames. ], batch size: 57, lr: 2.27e-03, grad_scale: 5.764607523034235e+17
2024-08-18 13:29:14,914 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 18 from Vox, 24 from AS
2024-08-18 13:29:27,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3915390.0, ans=0.1
2024-08-18 13:29:33,391 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 28 from Vox, 29 from AS
2024-08-18 13:29:38,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3915490.0, ans=0.125
2024-08-18 13:29:52,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3915590.0, ans=0.125
2024-08-18 13:30:05,164 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 4950, loss[loss=0.1116, beats_loss=0.008499, ecapa_loss=0.0001569, whisper_loss=0.1015, over 22807.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0105, ecapa_loss=0.0001453, whisper_loss=0.09026, over 3864996.86 frames. ], batch size: 89, lr: 2.27e-03, grad_scale: 5.764607523034235e+17
2024-08-18 13:30:06,853 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 13 from Vox, 35 from AS
2024-08-18 13:30:23,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3915790.0, ans=0.125
2024-08-18 13:30:27,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3915790.0, ans=0.0
2024-08-18 13:30:27,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3915790.0, ans=0.125
2024-08-18 13:30:28,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3915790.0, ans=0.125
2024-08-18 13:30:42,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3915890.0, ans=0.0
2024-08-18 13:31:08,366 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.264e+01 2.536e+01 2.797e+01 4.034e+01, threshold=5.071e+01, percent-clipped=0.0
2024-08-18 13:31:08,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3916090.0, ans=0.0
2024-08-18 13:31:22,252 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 5000, loss[loss=0.0745, beats_loss=0.01349, ecapa_loss=9.477e-05, whisper_loss=0.06006, over 15839.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01056, ecapa_loss=0.0001448, whisper_loss=0.08969, over 3850578.41 frames. ], batch size: 61, lr: 2.27e-03, grad_scale: 5.764607523034235e+17
2024-08-18 13:31:24,066 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 20 from Vox, 50 from AS
2024-08-18 13:31:45,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3916290.0, ans=0.0
2024-08-18 13:31:49,861 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 30 from LS+wenet, 16 from Vox, 23 from AS
2024-08-18 13:31:56,066 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-18 13:32:05,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3916390.0, ans=6.0
2024-08-18 13:32:11,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3916490.0, ans=0.125
2024-08-18 13:32:12,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3916490.0, ans=0.125
2024-08-18 13:32:20,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3916590.0, ans=0.04949747468305833
2024-08-18 13:32:36,514 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 5050, loss[loss=0.1151, beats_loss=0.009169, ecapa_loss=0.0001357, whisper_loss=0.1045, over 19938.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01055, ecapa_loss=0.0001454, whisper_loss=0.09057, over 3891596.78 frames. ], batch size: 77, lr: 2.27e-03, grad_scale: 5.764607523034235e+17
2024-08-18 13:32:44,340 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 22 from LS+wenet, 13 from Vox, 22 from AS
2024-08-18 13:32:45,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3916690.0, ans=0.125
2024-08-18 13:32:49,197 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-18 13:32:53,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3916790.0, ans=0.0
2024-08-18 13:32:54,045 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 28 from Vox, 38 from AS
2024-08-18 13:32:57,685 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-18 13:33:07,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3916890.0, ans=0.125
2024-08-18 13:33:20,130 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 25 from Vox, 36 from AS
2024-08-18 13:33:24,519 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 32 from LS+wenet, 13 from Vox, 21 from AS
2024-08-18 13:33:27,692 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 23 from Vox, 34 from AS
2024-08-18 13:33:33,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3916990.0, ans=0.0
2024-08-18 13:33:37,478 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.311e+01 2.560e+01 2.884e+01 4.690e+01, threshold=5.121e+01, percent-clipped=0.0
2024-08-18 13:33:46,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3917090.0, ans=0.1
2024-08-18 13:33:47,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3917090.0, ans=0.125
2024-08-18 13:33:48,591 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 16 from Vox, 24 from AS
2024-08-18 13:33:50,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3917190.0, ans=0.125
2024-08-18 13:33:51,530 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 5100, loss[loss=0.1159, beats_loss=0.01155, ecapa_loss=0.0001071, whisper_loss=0.1032, over 23946.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01053, ecapa_loss=0.0001445, whisper_loss=0.0913, over 3902174.88 frames. ], batch size: 92, lr: 2.27e-03, grad_scale: 5.764607523034235e+17
2024-08-18 13:33:57,018 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.99 vs. limit=22.5
2024-08-18 13:34:00,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=3917190.0, ans=0.2
2024-08-18 13:34:08,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3917290.0, ans=6.0
2024-08-18 13:34:13,692 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 24 from Vox, 33 from AS
2024-08-18 13:34:26,082 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 from AS
2024-08-18 13:34:29,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3917390.0, ans=0.0
2024-08-18 13:34:29,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3917390.0, ans=0.09899494936611666
2024-08-18 13:34:35,193 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 19 from Vox, 23 from AS
2024-08-18 13:34:43,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3917490.0, ans=0.125
2024-08-18 13:34:54,785 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.95 vs. limit=15.0
2024-08-18 13:34:55,906 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.40 vs. limit=15.0
2024-08-18 13:35:07,226 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 5150, loss[loss=0.07385, beats_loss=0.01153, ecapa_loss=0.0001923, whisper_loss=0.0604, over 17181.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01053, ecapa_loss=0.0001455, whisper_loss=0.09109, over 3886786.58 frames. ], batch size: 73, lr: 2.27e-03, grad_scale: 5.764607523034235e+17
2024-08-18 13:35:18,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3917690.0, ans=0.1
2024-08-18 13:35:31,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3917790.0, ans=0.125
2024-08-18 13:35:38,395 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 from AS
2024-08-18 13:35:39,680 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 13 from LS+wenet, 22 from Vox, 28 from AS
2024-08-18 13:36:08,171 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.267e+01 2.541e+01 2.830e+01 4.847e+01, threshold=5.083e+01, percent-clipped=0.0
2024-08-18 13:36:08,379 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 from AS
2024-08-18 13:36:21,730 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 5200, loss[loss=0.1129, beats_loss=0.008465, ecapa_loss=0.0001442, whisper_loss=0.103, over 23313.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01057, ecapa_loss=0.0001442, whisper_loss=0.09067, over 3916284.21 frames. ], batch size: 93, lr: 2.27e-03, grad_scale: 5.764607523034235e+17
2024-08-18 13:36:33,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3918190.0, ans=0.125
2024-08-18 13:36:36,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3918190.0, ans=0.0
2024-08-18 13:36:37,202 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 21 from Vox, 38 from AS
2024-08-18 13:36:38,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3918290.0, ans=0.125
2024-08-18 13:36:47,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3918290.0, ans=0.025
2024-08-18 13:36:50,474 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 from AS
2024-08-18 13:36:54,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3918390.0, ans=0.2
2024-08-18 13:36:54,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3918390.0, ans=0.125
2024-08-18 13:37:31,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3918590.0, ans=0.0
2024-08-18 13:37:40,838 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 5250, loss[loss=0.08297, beats_loss=0.009873, ecapa_loss=0.0001273, whisper_loss=0.07182, over 22121.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01058, ecapa_loss=0.0001437, whisper_loss=0.09065, over 3928342.47 frames. ], batch size: 89, lr: 2.27e-03, grad_scale: 5.764607523034235e+17
2024-08-18 13:37:48,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3918690.0, ans=0.125
2024-08-18 13:38:04,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3918790.0, ans=0.0
2024-08-18 13:38:05,684 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.066e-03
2024-08-18 13:38:09,817 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 19 from Vox, 31 from AS
2024-08-18 13:38:15,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3918890.0, ans=0.125
2024-08-18 13:38:41,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3919090.0, ans=0.0
2024-08-18 13:38:42,508 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.331e+01 2.617e+01 2.849e+01 4.827e+01, threshold=5.234e+01, percent-clipped=0.0
2024-08-18 13:38:44,403 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 22 from Vox, 39 from AS
2024-08-18 13:38:55,912 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 5300, loss[loss=0.08514, beats_loss=0.01222, ecapa_loss=0.0001047, whisper_loss=0.07187, over 15308.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01061, ecapa_loss=0.0001439, whisper_loss=0.08983, over 3892908.53 frames. ], batch size: 59, lr: 2.27e-03, grad_scale: 5.764607523034235e+17
2024-08-18 13:39:08,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3919190.0, ans=0.0
2024-08-18 13:39:33,304 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 22 from Vox, 32 from AS
2024-08-18 13:39:34,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3919390.0, ans=0.125
2024-08-18 13:39:37,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=3919390.0, ans=0.1
2024-08-18 13:39:44,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3919490.0, ans=0.125
2024-08-18 13:39:57,465 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 16 from Vox, 35 from AS
2024-08-18 13:40:09,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3919590.0, ans=0.125
2024-08-18 13:40:13,488 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 5350, loss[loss=0.1034, beats_loss=0.008875, ecapa_loss=0.000176, whisper_loss=0.09281, over 17624.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01061, ecapa_loss=0.0001427, whisper_loss=0.0896, over 3855113.88 frames. ], batch size: 73, lr: 2.27e-03, grad_scale: 5.764607523034235e+17
2024-08-18 13:40:21,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3919690.0, ans=0.2
2024-08-18 13:40:29,639 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 17 from Vox, 37 from AS
2024-08-18 13:40:36,267 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.67 vs. limit=12.0
2024-08-18 13:40:52,970 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 26 from LS+wenet, 12 from Vox, 31 from AS
2024-08-18 13:40:53,309 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3919890.0, ans=0.0
2024-08-18 13:40:53,462 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=15.0
2024-08-18 13:41:01,107 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 13 from Vox, 29 from AS
2024-08-18 13:41:06,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3919990.0, ans=0.0
2024-08-18 13:41:14,886 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.240e+01 2.441e+01 2.747e+01 4.165e+01, threshold=4.882e+01, percent-clipped=0.0
2024-08-18 13:41:21,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3920090.0, ans=0.125
2024-08-18 13:41:27,487 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 5400, loss[loss=0.1177, beats_loss=0.007272, ecapa_loss=0.0001678, whisper_loss=0.1087, over 22010.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01065, ecapa_loss=0.0001426, whisper_loss=0.08959, over 3858039.56 frames. ], batch size: 88, lr: 2.27e-03, grad_scale: 5.764607523034235e+17
2024-08-18 13:41:36,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3920190.0, ans=0.0
2024-08-18 13:41:54,136 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 23 from Vox, 22 from AS
2024-08-18 13:41:56,418 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 19 from Vox, 37 from AS
2024-08-18 13:42:02,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3920390.0, ans=0.125
2024-08-18 13:42:04,865 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0
2024-08-18 13:42:10,063 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 20 from Vox, 39 from AS
2024-08-18 13:42:18,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3920490.0, ans=0.04949747468305833
2024-08-18 13:42:27,058 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.22 vs. limit=12.0
2024-08-18 13:42:36,645 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 5450, loss[loss=0.1164, beats_loss=0.01187, ecapa_loss=0.0001412, whisper_loss=0.1031, over 20564.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01067, ecapa_loss=0.0001425, whisper_loss=0.08987, over 3839837.74 frames. ], batch size: 84, lr: 2.27e-03, grad_scale: 5.764607523034235e+17
2024-08-18 13:42:41,031 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 31 from LS+wenet, 22 from Vox, 28 from AS
2024-08-18 13:42:44,259 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.64 vs. limit=15.0
2024-08-18 13:42:44,965 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 22 from Vox, 24 from AS
2024-08-18 13:42:55,297 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-18 13:43:02,062 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 20 from Vox, 31 from AS
2024-08-18 13:43:11,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3920890.0, ans=0.0
2024-08-18 13:43:21,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3920990.0, ans=0.125
2024-08-18 13:43:34,114 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.277e+01 2.488e+01 2.860e+01 4.810e+01, threshold=4.975e+01, percent-clipped=0.0
2024-08-18 13:43:35,808 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 21 from Vox, 37 from AS
2024-08-18 13:43:37,007 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 21 from Vox, 36 from AS
2024-08-18 13:43:48,620 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 5500, loss[loss=0.08783, beats_loss=0.0144, ecapa_loss=0.0001268, whisper_loss=0.07216, over 22016.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01066, ecapa_loss=0.0001423, whisper_loss=0.08983, over 3861762.84 frames. ], batch size: 92, lr: 2.27e-03, grad_scale: 5.764607523034235e+17
2024-08-18 13:44:24,462 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 from AS
2024-08-18 13:44:35,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3921490.0, ans=0.0
2024-08-18 13:44:37,186 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.66 vs. limit=12.0
2024-08-18 13:44:37,276 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0
2024-08-18 13:44:41,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3921490.0, ans=0.0
2024-08-18 13:44:50,722 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 16 from LS+wenet, 15 from Vox, 35 from AS
2024-08-18 13:44:54,357 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.41 vs. limit=15.0
2024-08-18 13:45:00,890 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 25 from Vox, 32 from AS
2024-08-18 13:45:04,025 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 5550, loss[loss=0.1005, beats_loss=0.009407, ecapa_loss=0.0001306, whisper_loss=0.08976, over 23413.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01068, ecapa_loss=0.0001427, whisper_loss=0.08955, over 3865530.16 frames. ], batch size: 93, lr: 2.27e-03, grad_scale: 5.764607523034235e+17
2024-08-18 13:45:16,867 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 19 from Vox, 32 from AS
2024-08-18 13:45:19,864 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.47 vs. limit=10.0
2024-08-18 13:45:24,532 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.93 vs. limit=22.5
2024-08-18 13:45:25,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3921790.0, ans=0.125
2024-08-18 13:45:42,429 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 23 from LS+wenet, 13 from Vox, 21 from AS
2024-08-18 13:45:51,521 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.74 vs. limit=22.5
2024-08-18 13:45:52,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3921990.0, ans=10.0
2024-08-18 13:45:52,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3921990.0, ans=0.0
2024-08-18 13:46:03,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3921990.0, ans=0.1
2024-08-18 13:46:11,125 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.360e+01 2.583e+01 2.976e+01 1.161e+02, threshold=5.166e+01, percent-clipped=2.0
2024-08-18 13:46:13,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3922090.0, ans=0.125
2024-08-18 13:46:25,034 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 5600, loss[loss=0.1032, beats_loss=0.0111, ecapa_loss=0.0001569, whisper_loss=0.09048, over 22486.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01062, ecapa_loss=0.0001429, whisper_loss=0.09053, over 3884891.80 frames. ], batch size: 94, lr: 2.27e-03, grad_scale: 1.152921504606847e+18
2024-08-18 13:46:27,979 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0688062459230423, model_norm_threshold=51.66341781616211
2024-08-18 13:46:28,145 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.409e+04, grad_sumsq=9.409e+04, orig_rms_sq=1.000e+00
2024-08-18 13:46:41,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3922290.0, ans=0.0
2024-08-18 13:46:56,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3922390.0, ans=0.125
2024-08-18 13:47:19,519 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.10 vs. limit=12.0
2024-08-18 13:47:40,546 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 5650, loss[loss=0.07907, beats_loss=0.01232, ecapa_loss=0.0001627, whisper_loss=0.06512, over 18180.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01067, ecapa_loss=0.0001435, whisper_loss=0.08957, over 3897152.07 frames. ], batch size: 79, lr: 2.27e-03, grad_scale: 1.152921504606847e+18
2024-08-18 13:47:58,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3922790.0, ans=0.1
2024-08-18 13:48:19,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3922890.0, ans=0.1
2024-08-18 13:48:34,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3922990.0, ans=0.125
2024-08-18 13:48:43,176 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 41 from LS+wenet, 20 from Vox, 27 from AS
2024-08-18 13:48:43,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3923090.0, ans=0.1
2024-08-18 13:48:45,597 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.361e+01 2.628e+01 2.997e+01 7.509e+02, threshold=5.255e+01, percent-clipped=3.0
2024-08-18 13:48:46,955 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 from AS
2024-08-18 13:48:57,924 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08530262112617493, model_norm_threshold=52.552433013916016
2024-08-18 13:48:58,102 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.807e+04, grad_sumsq=5.807e+04, orig_rms_sq=1.000e+00
2024-08-18 13:48:58,130 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 5700, loss[loss=0.1257, beats_loss=0.04584, ecapa_loss=0.0001442, whisper_loss=0.07841, over 22288.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01085, ecapa_loss=0.0001445, whisper_loss=0.08951, over 3897084.66 frames. ], batch size: 89, lr: 2.27e-03, grad_scale: 1.152921504606847e+18
2024-08-18 13:49:01,158 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 25 from Vox, 41 from AS
2024-08-18 13:49:14,267 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 15 from Vox, 26 from AS
2024-08-18 13:49:24,312 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.58 vs. limit=22.5
2024-08-18 13:49:25,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3923290.0, ans=0.0
2024-08-18 13:49:30,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3923390.0, ans=0.125
2024-08-18 13:49:37,612 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 26 from Vox, 37 from AS
2024-08-18 13:49:40,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3923390.0, ans=0.125
2024-08-18 13:49:52,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3923490.0, ans=0.05
2024-08-18 13:49:53,990 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 18 from Vox, 46 from AS
2024-08-18 13:50:15,615 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 5750, loss[loss=0.08483, beats_loss=0.01031, ecapa_loss=0.0001729, whisper_loss=0.07279, over 18444.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01087, ecapa_loss=0.0001436, whisper_loss=0.08918, over 3927197.19 frames. ], batch size: 77, lr: 2.27e-03, grad_scale: 1.152921504606847e+18
2024-08-18 13:50:24,654 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.73 vs. limit=22.5
2024-08-18 13:50:32,697 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.77 vs. limit=15.0
2024-08-18 13:50:44,162 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 from AS
2024-08-18 13:50:44,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3923890.0, ans=0.125
2024-08-18 13:50:56,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3923890.0, ans=0.0
2024-08-18 13:50:57,927 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.15 vs. limit=15.0
2024-08-18 13:51:14,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3924090.0, ans=0.125
2024-08-18 13:51:16,768 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.280e+01 2.579e+01 2.799e+01 6.161e+02, threshold=5.158e+01, percent-clipped=2.0
2024-08-18 13:51:19,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3924090.0, ans=0.125
2024-08-18 13:51:23,521 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 20 from Vox, 29 from AS
2024-08-18 13:51:26,082 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 16 from Vox, 25 from AS
2024-08-18 13:51:26,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3924090.0, ans=0.125
2024-08-18 13:51:28,389 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 5800, loss[loss=0.1009, beats_loss=0.01141, ecapa_loss=0.0001529, whisper_loss=0.08794, over 21661.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01079, ecapa_loss=0.0001441, whisper_loss=0.08978, over 3900973.67 frames. ], batch size: 90, lr: 2.27e-03, grad_scale: 1.152921504606847e+18
2024-08-18 13:51:40,351 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.91 vs. limit=15.0
2024-08-18 13:51:55,329 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.56 vs. limit=22.5
2024-08-18 13:52:06,566 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.91 vs. limit=22.5
2024-08-18 13:52:13,769 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.95 vs. limit=8.0
2024-08-18 13:52:23,178 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 from AS
2024-08-18 13:52:33,228 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 12 from LS+wenet, 23 from Vox, 22 from AS
2024-08-18 13:52:35,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3924590.0, ans=0.125
2024-08-18 13:52:37,016 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.54 vs. limit=22.5
2024-08-18 13:52:40,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3924690.0, ans=0.2
2024-08-18 13:52:41,900 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 5850, loss[loss=0.132, beats_loss=0.008078, ecapa_loss=0.0001195, whisper_loss=0.1227, over 17823.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01069, ecapa_loss=0.0001444, whisper_loss=0.08975, over 3894493.09 frames. ], batch size: 63, lr: 2.27e-03, grad_scale: 1.152921504606847e+18
2024-08-18 13:52:42,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3924690.0, ans=0.0
2024-08-18 13:52:42,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3924690.0, ans=0.0
2024-08-18 13:52:49,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3924690.0, ans=0.04949747468305833
2024-08-18 13:52:50,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3924690.0, ans=0.125
2024-08-18 13:53:03,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3924790.0, ans=0.125
2024-08-18 13:53:33,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3924990.0, ans=0.0
2024-08-18 13:53:45,906 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.710e+01 2.307e+01 2.558e+01 2.890e+01 4.953e+01, threshold=5.117e+01, percent-clipped=0.0
2024-08-18 13:53:59,006 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 17 from Vox, 25 from AS
2024-08-18 13:54:00,992 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 5900, loss[loss=0.106, beats_loss=0.00974, ecapa_loss=0.0001506, whisper_loss=0.09475, over 16205.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01069, ecapa_loss=0.0001434, whisper_loss=0.08934, over 3894413.70 frames. ], batch size: 63, lr: 2.27e-03, grad_scale: 1.152921504606847e+18
2024-08-18 13:54:13,757 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 22 from Vox, 27 from AS
2024-08-18 13:55:03,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3925590.0, ans=0.125
2024-08-18 13:55:08,601 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 13 from Vox, 43 from AS
2024-08-18 13:55:20,947 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 5950, loss[loss=0.09591, beats_loss=0.009825, ecapa_loss=0.0001547, whisper_loss=0.08454, over 20640.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0107, ecapa_loss=0.0001432, whisper_loss=0.08917, over 3892955.16 frames. ], batch size: 83, lr: 2.27e-03, grad_scale: 1.152921504606847e+18
2024-08-18 13:55:37,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3925790.0, ans=0.125
2024-08-18 13:55:55,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3925890.0, ans=0.0
2024-08-18 13:56:03,477 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 12 from Vox, 39 from AS
2024-08-18 13:56:13,705 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.55 vs. limit=10.0
2024-08-18 13:56:19,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3925990.0, ans=0.0
2024-08-18 13:56:24,957 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.637e+01 2.192e+01 2.474e+01 2.892e+01 4.028e+01, threshold=4.947e+01, percent-clipped=0.0
2024-08-18 13:56:25,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3926090.0, ans=0.0
2024-08-18 13:56:38,133 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 6000, loss[loss=0.1057, beats_loss=0.01152, ecapa_loss=0.0001532, whisper_loss=0.09266, over 22171.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0107, ecapa_loss=0.0001438, whisper_loss=0.08919, over 3891110.78 frames. ], batch size: 90, lr: 2.27e-03, grad_scale: 1.152921504606847e+18
2024-08-18 13:56:38,134 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss
2024-08-18 13:57:15,571 INFO [train_multi_KD3.py:1149] (2/4) Epoch 27, validation on ASR_libri: loss=0.2529, beats_loss=0, ecapa_loss=0.0005218, whisper_loss=0.2477, over 922467.00 frames.
2024-08-18 13:57:34,386 INFO [train_multi_KD3.py:1149] (2/4) Epoch 27, validation on SV_voxceleb1: loss=0.004081, beats_loss=0, ecapa_loss=0.0004081, whisper_loss=0, over 939242.00 frames.
2024-08-18 13:58:26,740 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3625, 4.1308, 4.2642, 4.3215], device='cuda:2')
2024-08-18 13:59:16,622 INFO [train_multi_KD3.py:1149] (2/4) Epoch 27, validation on AT_audioset: loss=0.02317, beats_loss=0.02317, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-18 13:59:16,627 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-18 13:59:26,048 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.75 vs. limit=15.0 2024-08-18 13:59:30,851 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.87 vs. limit=15.0 2024-08-18 13:59:58,817 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.17 vs. limit=15.0 2024-08-18 14:00:32,464 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 20 from LS+wenet, 32 from Vox, 35 fro AS 2024-08-18 14:00:33,929 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 6050, loss[loss=0.09043, beats_loss=0.01106, ecapa_loss=0.000178, whisper_loss=0.07759, over 19972.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0105, ecapa_loss=0.0001439, whisper_loss=0.09082, over 3924129.49 frames. ], batch size: 87, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:00:36,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3926690.0, ans=0.125 2024-08-18 14:00:36,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3926690.0, ans=0.125 2024-08-18 14:00:56,407 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 24 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-18 14:01:04,708 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
17 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-18 14:01:12,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3926890.0, ans=0.125 2024-08-18 14:01:12,308 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-08-18 14:01:21,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3926990.0, ans=0.0 2024-08-18 14:01:38,795 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.321e+01 2.592e+01 2.870e+01 3.846e+01, threshold=5.183e+01, percent-clipped=0.0 2024-08-18 14:01:41,066 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-18 14:01:44,514 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 14:01:53,344 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 6100, loss[loss=0.09916, beats_loss=0.008337, ecapa_loss=0.0001898, whisper_loss=0.08892, over 15356.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01055, ecapa_loss=0.0001437, whisper_loss=0.09038, over 3901910.06 frames. ], batch size: 63, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:02:10,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3927290.0, ans=0.125 2024-08-18 14:02:11,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3927290.0, ans=0.125 2024-08-18 14:02:15,464 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.67 vs. 
limit=15.0 2024-08-18 14:02:31,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3927390.0, ans=0.0 2024-08-18 14:02:45,903 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.34 vs. limit=22.5 2024-08-18 14:03:01,978 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 33 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-18 14:03:10,623 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 6150, loss[loss=0.09382, beats_loss=0.01237, ecapa_loss=0.0001233, whisper_loss=0.08022, over 21746.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01058, ecapa_loss=0.0001443, whisper_loss=0.09064, over 3907482.07 frames. ], batch size: 88, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:03:28,514 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.67 vs. limit=15.0 2024-08-18 14:03:36,714 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
29 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-18 14:03:38,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3927790.0, ans=0.025 2024-08-18 14:03:54,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3927990.0, ans=0.0 2024-08-18 14:03:54,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3927990.0, ans=0.125 2024-08-18 14:04:11,606 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.379e+01 2.632e+01 2.794e+01 5.915e+01, threshold=5.263e+01, percent-clipped=2.0 2024-08-18 14:04:12,449 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.64 vs. limit=15.0 2024-08-18 14:04:14,605 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 22 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-18 14:04:24,778 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 6200, loss[loss=0.07726, beats_loss=0.01301, ecapa_loss=0.0001013, whisper_loss=0.06324, over 15836.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01066, ecapa_loss=0.0001439, whisper_loss=0.08967, over 3906966.50 frames. ], batch size: 64, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:04:50,123 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 14:04:59,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3928390.0, ans=0.2 2024-08-18 14:05:00,105 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 14:05:07,290 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.48 vs. 
limit=15.0 2024-08-18 14:05:29,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3928590.0, ans=0.1 2024-08-18 14:05:31,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3928590.0, ans=0.125 2024-08-18 14:05:31,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3928590.0, ans=0.0 2024-08-18 14:05:35,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3928590.0, ans=0.125 2024-08-18 14:05:41,404 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 18 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-18 14:05:43,450 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 6250, loss[loss=0.08341, beats_loss=0.01079, ecapa_loss=0.000183, whisper_loss=0.07079, over 17028.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01058, ecapa_loss=0.0001443, whisper_loss=0.08984, over 3911736.50 frames. ], batch size: 74, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:06:05,189 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 14:06:14,963 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 18 from LS+wenet, 27 from Vox, 46 fro AS 2024-08-18 14:06:27,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3928890.0, ans=0.1 2024-08-18 14:06:27,492 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.18 vs. limit=15.0 2024-08-18 14:06:27,520 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.30 vs. 
limit=15.0 2024-08-18 14:06:42,398 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 14:06:46,535 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.713e+01 2.311e+01 2.533e+01 2.797e+01 1.821e+02, threshold=5.065e+01, percent-clipped=2.0 2024-08-18 14:06:48,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3929090.0, ans=0.125 2024-08-18 14:06:49,149 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 14:06:57,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3929090.0, ans=0.125 2024-08-18 14:06:59,982 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 6300, loss[loss=0.1144, beats_loss=0.01058, ecapa_loss=0.0001448, whisper_loss=0.1024, over 21474.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01061, ecapa_loss=0.0001437, whisper_loss=0.09046, over 3898809.55 frames. ], batch size: 87, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:07:17,648 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.04 vs. limit=15.0 2024-08-18 14:07:38,387 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-18 14:07:42,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3929390.0, ans=0.0 2024-08-18 14:07:51,997 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-18 14:08:15,684 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 6350, loss[loss=0.115, beats_loss=0.009132, ecapa_loss=0.0001506, whisper_loss=0.1043, over 20277.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01061, ecapa_loss=0.0001449, whisper_loss=0.08988, over 3874073.66 frames. ], batch size: 81, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:08:19,541 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 18 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-18 14:08:25,342 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-18 14:08:26,398 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.07 vs. limit=22.5 2024-08-18 14:08:27,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3929690.0, ans=0.125 2024-08-18 14:08:30,856 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-18 14:08:34,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3929790.0, ans=0.125 2024-08-18 14:08:41,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3929790.0, ans=0.2 2024-08-18 14:09:01,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3929990.0, ans=0.125 2024-08-18 14:09:19,617 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.236e+01 2.432e+01 2.687e+01 3.502e+01, threshold=4.864e+01, percent-clipped=0.0 2024-08-18 14:09:30,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3930090.0, ans=0.0 2024-08-18 14:09:32,179 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 6400, loss[loss=0.1222, beats_loss=0.009761, ecapa_loss=0.0001255, whisper_loss=0.1112, over 22238.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01066, ecapa_loss=0.0001433, whisper_loss=0.08986, over 3884509.48 frames. ], batch size: 83, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:09:38,883 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 21 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-18 14:09:47,564 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.95 vs. limit=15.0 2024-08-18 14:09:50,286 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.72 vs. limit=15.0 2024-08-18 14:09:56,784 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 15 from Vox, 48 fro AS 2024-08-18 14:09:59,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3930290.0, ans=0.0 2024-08-18 14:10:07,328 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=15.0 2024-08-18 14:10:10,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3930390.0, ans=0.125 2024-08-18 14:10:12,766 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-18 14:10:42,600 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-18 14:10:45,190 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 6450, loss[loss=0.101, beats_loss=0.01008, ecapa_loss=0.0001617, whisper_loss=0.08927, over 23007.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0106, ecapa_loss=0.0001438, whisper_loss=0.09067, over 3918435.85 frames. 
], batch size: 93, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:10:48,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3930690.0, ans=0.0 2024-08-18 14:10:58,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3930790.0, ans=0.125 2024-08-18 14:11:05,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3930790.0, ans=0.0 2024-08-18 14:11:08,492 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.14 vs. limit=15.0 2024-08-18 14:11:10,561 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0 2024-08-18 14:11:11,394 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 14:11:14,238 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-18 14:11:18,207 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.48 vs. limit=22.5 2024-08-18 14:11:23,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3930990.0, ans=0.2 2024-08-18 14:11:25,445 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 14:11:34,994 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.20 vs. 
limit=15.0 2024-08-18 14:11:37,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3931090.0, ans=0.0 2024-08-18 14:11:37,733 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.379e+01 2.625e+01 2.941e+01 1.011e+02, threshold=5.251e+01, percent-clipped=1.0 2024-08-18 14:11:48,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3931190.0, ans=0.2 2024-08-18 14:11:49,624 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 6500, loss[loss=0.1005, beats_loss=0.0108, ecapa_loss=0.000153, whisper_loss=0.08819, over 18101.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0105, ecapa_loss=0.0001448, whisper_loss=0.09179, over 3965890.60 frames. ], batch size: 77, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:11:55,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3931190.0, ans=0.0 2024-08-18 14:12:05,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3931290.0, ans=0.125 2024-08-18 14:12:08,484 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.31 vs. limit=15.0 2024-08-18 14:12:17,789 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-18 14:12:27,768 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.66 vs. 
limit=15.0 2024-08-18 14:12:34,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3931490.0, ans=0.125 2024-08-18 14:12:47,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3931590.0, ans=0.1 2024-08-18 14:12:52,621 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 6550, loss[loss=0.1036, beats_loss=0.01076, ecapa_loss=0.000157, whisper_loss=0.09126, over 21667.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01051, ecapa_loss=0.0001445, whisper_loss=0.09148, over 3979770.58 frames. ], batch size: 88, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:13:05,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3931790.0, ans=0.125 2024-08-18 14:13:07,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3931790.0, ans=0.2 2024-08-18 14:13:17,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3931890.0, ans=0.125 2024-08-18 14:13:18,952 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08532015979290009, model_norm_threshold=52.50708770751953 2024-08-18 14:13:19,129 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.293e+04, grad_sumsq=4.293e+04, orig_rms_sq=1.000e+00 2024-08-18 14:13:40,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3931990.0, ans=0.1 2024-08-18 14:13:45,292 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.337e+01 2.574e+01 2.944e+01 6.154e+02, threshold=5.148e+01, percent-clipped=1.0 2024-08-18 14:13:55,097 INFO 
[train_multi_KD3.py:1116] (2/4) Epoch 27, batch 6600, loss[loss=0.09007, beats_loss=0.01201, ecapa_loss=0.0001421, whisper_loss=0.07664, over 21666.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01058, ecapa_loss=0.0001449, whisper_loss=0.0914, over 3978867.90 frames. ], batch size: 92, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:13:56,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3932190.0, ans=0.125 2024-08-18 14:14:16,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3932290.0, ans=0.2 2024-08-18 14:14:22,224 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-18 14:14:40,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3932490.0, ans=0.125 2024-08-18 14:14:40,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3932490.0, ans=0.025 2024-08-18 14:14:53,011 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 24 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 14:14:56,461 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 6650, loss[loss=0.1157, beats_loss=0.009957, ecapa_loss=0.0001643, whisper_loss=0.1041, over 22875.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01055, ecapa_loss=0.0001452, whisper_loss=0.09127, over 3975560.78 frames. 
], batch size: 92, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:14:56,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3932690.0, ans=0.125 2024-08-18 14:15:06,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3932690.0, ans=0.125 2024-08-18 14:15:14,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3932790.0, ans=0.125 2024-08-18 14:15:19,839 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.68 vs. limit=10.0 2024-08-18 14:15:21,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3932890.0, ans=0.0 2024-08-18 14:15:39,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3932990.0, ans=0.0 2024-08-18 14:15:45,142 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-18 14:15:49,082 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.331e+01 2.674e+01 2.933e+01 1.002e+02, threshold=5.348e+01, percent-clipped=1.0 2024-08-18 14:15:54,130 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-18 14:15:58,806 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 6700, loss[loss=0.08629, beats_loss=0.008128, ecapa_loss=0.0002099, whisper_loss=0.07606, over 15507.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01046, ecapa_loss=0.000145, whisper_loss=0.09127, over 3937782.66 frames. 
], batch size: 65, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:16:04,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3933190.0, ans=0.125 2024-08-18 14:16:05,271 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 17 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-18 14:16:06,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3933190.0, ans=0.0 2024-08-18 14:16:11,996 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-18 14:16:15,194 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.46 vs. limit=22.5 2024-08-18 14:16:15,875 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-18 14:16:23,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3933390.0, ans=0.125 2024-08-18 14:16:37,292 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-18 14:16:39,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3933490.0, ans=0.1 2024-08-18 14:16:40,434 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.71 vs. limit=15.0 2024-08-18 14:17:02,441 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 6750, loss[loss=0.09618, beats_loss=0.01278, ecapa_loss=0.0001463, whisper_loss=0.08194, over 22654.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01052, ecapa_loss=0.0001446, whisper_loss=0.09162, over 3947569.98 frames. 
], batch size: 91, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:17:02,806 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 14:17:11,704 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 14:17:19,481 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=15.0 2024-08-18 14:17:29,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3933890.0, ans=0.0 2024-08-18 14:17:31,498 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 13 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 14:17:33,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3933890.0, ans=0.0 2024-08-18 14:17:44,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3933990.0, ans=0.1 2024-08-18 14:17:51,072 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.90 vs. limit=10.0 2024-08-18 14:17:55,339 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.453e+01 2.659e+01 2.920e+01 3.778e+02, threshold=5.318e+01, percent-clipped=4.0 2024-08-18 14:18:05,719 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 6800, loss[loss=0.1016, beats_loss=0.01073, ecapa_loss=0.0001776, whisper_loss=0.08905, over 18627.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01049, ecapa_loss=0.0001454, whisper_loss=0.09142, over 3903116.13 frames. 
], batch size: 80, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:18:07,755 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.25 vs. limit=6.0 2024-08-18 14:18:10,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3934190.0, ans=0.0 2024-08-18 14:18:26,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3934290.0, ans=0.1 2024-08-18 14:18:30,773 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-18 14:18:34,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3934390.0, ans=0.125 2024-08-18 14:18:34,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3934390.0, ans=10.0 2024-08-18 14:18:39,868 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 11 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-18 14:18:41,021 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-18 14:18:42,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3934490.0, ans=0.125 2024-08-18 14:18:50,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3934490.0, ans=0.0 2024-08-18 14:19:03,433 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
19 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-18 14:19:06,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3934590.0, ans=0.125 2024-08-18 14:19:09,319 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 6850, loss[loss=0.1243, beats_loss=0.009477, ecapa_loss=0.0001599, whisper_loss=0.1133, over 22191.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01056, ecapa_loss=0.0001448, whisper_loss=0.09058, over 3911859.93 frames. ], batch size: 89, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:19:24,220 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.81 vs. limit=22.5 2024-08-18 14:19:37,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3934890.0, ans=0.1 2024-08-18 14:19:40,565 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-18 14:19:59,352 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.41 vs. limit=22.5 2024-08-18 14:20:03,731 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.389e+01 2.625e+01 3.075e+01 4.351e+02, threshold=5.250e+01, percent-clipped=2.0 2024-08-18 14:20:10,439 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-18 14:20:14,240 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 6900, loss[loss=0.1235, beats_loss=0.007981, ecapa_loss=0.0001765, whisper_loss=0.1138, over 14986.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01058, ecapa_loss=0.0001452, whisper_loss=0.09038, over 3894663.59 frames. 
], batch size: 57, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:20:23,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3935190.0, ans=0.5 2024-08-18 14:20:28,081 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 14:20:36,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3935290.0, ans=0.0 2024-08-18 14:20:46,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3935390.0, ans=0.0 2024-08-18 14:21:00,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3935490.0, ans=0.1 2024-08-18 14:21:04,515 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.88 vs. limit=22.5 2024-08-18 14:21:17,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3935590.0, ans=0.125 2024-08-18 14:21:19,065 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 6950, loss[loss=0.1072, beats_loss=0.01039, ecapa_loss=0.0001378, whisper_loss=0.09542, over 22274.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01075, ecapa_loss=0.0001439, whisper_loss=0.08977, over 3896289.21 frames. ], batch size: 88, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:21:22,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3935690.0, ans=0.2 2024-08-18 14:21:36,726 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
15 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-18 14:22:12,457 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.310e+01 2.521e+01 2.778e+01 4.175e+02, threshold=5.041e+01, percent-clipped=1.0 2024-08-18 14:22:19,179 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 14:22:23,225 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 7000, loss[loss=0.11, beats_loss=0.01049, ecapa_loss=0.0001429, whisper_loss=0.09811, over 23359.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01059, ecapa_loss=0.0001438, whisper_loss=0.08993, over 3863804.30 frames. ], batch size: 94, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:22:23,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3936190.0, ans=0.0 2024-08-18 14:22:38,947 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-18 14:22:45,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3936290.0, ans=0.1 2024-08-18 14:22:47,795 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 21 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-18 14:22:50,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3936390.0, ans=0.1 2024-08-18 14:23:00,672 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-18 14:23:14,214 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
20 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-18 14:23:15,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3936590.0, ans=0.125 2024-08-18 14:23:16,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3936590.0, ans=0.125 2024-08-18 14:23:18,354 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.494e+01 2024-08-18 14:23:25,409 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 7050, loss[loss=0.09388, beats_loss=0.01178, ecapa_loss=0.0001176, whisper_loss=0.08093, over 21541.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001455, whisper_loss=0.09049, over 3854307.19 frames. ], batch size: 85, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:23:51,041 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 14:23:54,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3936890.0, ans=0.125 2024-08-18 14:24:12,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3936990.0, ans=0.2 2024-08-18 14:24:18,936 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.686e+01 2.218e+01 2.427e+01 2.693e+01 4.080e+01, threshold=4.854e+01, percent-clipped=0.0 2024-08-18 14:24:28,593 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 7100, loss[loss=0.1014, beats_loss=0.01098, ecapa_loss=0.0001255, whisper_loss=0.08912, over 22127.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01063, ecapa_loss=0.0001439, whisper_loss=0.08977, over 3868326.94 frames. 
], batch size: 86, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:24:28,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3937190.0, ans=0.07 2024-08-18 14:24:31,158 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-18 14:24:32,405 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 29 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-18 14:24:46,075 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-18 14:25:00,679 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 14:25:06,230 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.17 vs. limit=15.0 2024-08-18 14:25:06,965 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 16 from Vox, 48 fro AS 2024-08-18 14:25:19,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=3937590.0, ans=0.02 2024-08-18 14:25:20,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3937590.0, ans=0.125 2024-08-18 14:25:25,354 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 15 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-18 14:25:25,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3937590.0, ans=0.0 2024-08-18 14:25:29,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3937690.0, ans=0.0 2024-08-18 14:25:30,498 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 7150, loss[loss=0.1196, beats_loss=0.00768, ecapa_loss=0.0001591, whisper_loss=0.1104, over 20930.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01061, ecapa_loss=0.0001428, whisper_loss=0.09011, over 3887838.13 frames. ], batch size: 82, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:25:30,961 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 14:25:38,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3937690.0, ans=0.125 2024-08-18 14:26:10,734 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-18 14:26:22,363 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.289e+01 2.542e+01 2.748e+01 4.524e+01, threshold=5.084e+01, percent-clipped=0.0 2024-08-18 14:26:22,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3938090.0, ans=0.125 2024-08-18 14:26:31,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3938190.0, ans=0.0 2024-08-18 14:26:32,331 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 7200, loss[loss=0.08234, beats_loss=0.01507, ecapa_loss=0.0001284, whisper_loss=0.06599, over 22150.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.0001419, whisper_loss=0.09096, over 3933995.92 frames. 
], batch size: 94, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:26:35,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3938190.0, ans=0.0 2024-08-18 14:26:43,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3938290.0, ans=0.0 2024-08-18 14:26:43,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3938290.0, ans=0.125 2024-08-18 14:26:48,423 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-18 14:26:48,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3938290.0, ans=0.125 2024-08-18 14:26:53,286 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 17 from LS+wenet, 36 from Vox, 42 fro AS 2024-08-18 14:26:58,089 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 14:26:59,928 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2024-08-18 14:27:27,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3938590.0, ans=0.125 2024-08-18 14:27:29,677 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.98 vs. limit=10.0 2024-08-18 14:27:33,604 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 7250, loss[loss=0.09945, beats_loss=0.01044, ecapa_loss=0.0001129, whisper_loss=0.08788, over 21215.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01048, ecapa_loss=0.0001433, whisper_loss=0.09082, over 3930128.06 frames. 
], batch size: 80, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:27:36,410 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 28 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-18 14:27:37,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3938690.0, ans=0.0 2024-08-18 14:27:47,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3938790.0, ans=0.125 2024-08-18 14:27:54,348 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=15.0 2024-08-18 14:28:00,207 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 30 from LS+wenet, 16 from Vox, 49 fro AS 2024-08-18 14:28:02,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3938890.0, ans=0.0 2024-08-18 14:28:09,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3938890.0, ans=0.0 2024-08-18 14:28:10,113 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-18 14:28:11,574 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.818e+05 2024-08-18 14:28:14,384 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.34 vs. 
limit=22.5 2024-08-18 14:28:19,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3938990.0, ans=0.125 2024-08-18 14:28:26,060 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.318e+01 2.608e+01 2.955e+01 6.690e+01, threshold=5.215e+01, percent-clipped=2.0 2024-08-18 14:28:28,066 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.03 vs. limit=15.0 2024-08-18 14:28:35,871 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 7300, loss[loss=0.08539, beats_loss=0.01164, ecapa_loss=0.0001511, whisper_loss=0.07225, over 17893.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01054, ecapa_loss=0.0001443, whisper_loss=0.09033, over 3919191.69 frames. ], batch size: 74, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:28:48,511 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 26 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-18 14:28:56,852 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-18 14:29:01,762 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 36 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-18 14:29:04,107 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-18 14:29:23,926 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-18 14:29:37,674 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 7350, loss[loss=0.09138, beats_loss=0.009861, ecapa_loss=0.0001351, whisper_loss=0.08017, over 19222.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01062, ecapa_loss=0.0001452, whisper_loss=0.08975, over 3934381.27 frames. 
], batch size: 75, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:29:46,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3939690.0, ans=0.2 2024-08-18 14:29:54,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3939790.0, ans=0.1 2024-08-18 14:29:55,546 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.46 vs. limit=12.0 2024-08-18 14:29:56,179 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-18 14:30:05,701 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 21 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-18 14:30:07,215 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 14:30:09,881 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-18 14:30:11,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3939890.0, ans=0.0 2024-08-18 14:30:29,945 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.344e+01 2.539e+01 2.800e+01 8.685e+01, threshold=5.077e+01, percent-clipped=1.0 2024-08-18 14:30:31,872 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.73 vs. 
limit=10.0 2024-08-18 14:30:36,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3940090.0, ans=0.125 2024-08-18 14:30:39,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3940190.0, ans=0.0 2024-08-18 14:30:40,016 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 7400, loss[loss=0.1035, beats_loss=0.009678, ecapa_loss=0.0001679, whisper_loss=0.09218, over 22726.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01055, ecapa_loss=0.0001454, whisper_loss=0.08954, over 3913074.73 frames. ], batch size: 93, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:30:46,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3940190.0, ans=0.125 2024-08-18 14:30:52,249 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-18 14:31:11,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3940390.0, ans=0.2 2024-08-18 14:31:21,208 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 28 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-18 14:31:30,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3940590.0, ans=0.0 2024-08-18 14:31:34,309 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-18 14:31:40,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3940690.0, ans=0.0 2024-08-18 14:31:41,554 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 7450, loss[loss=0.08626, beats_loss=0.01087, ecapa_loss=0.0001883, whisper_loss=0.0735, over 21358.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01051, ecapa_loss=0.0001464, whisper_loss=0.09002, over 3927850.93 frames. ], batch size: 93, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:31:44,189 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 24 from LS+wenet, 20 from Vox, 51 fro AS 2024-08-18 14:32:33,413 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.339e+01 2.554e+01 2.965e+01 5.290e+01, threshold=5.108e+01, percent-clipped=2.0 2024-08-18 14:32:34,189 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.05 vs. limit=22.5 2024-08-18 14:32:37,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3941090.0, ans=0.0 2024-08-18 14:32:43,008 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 7500, loss[loss=0.08276, beats_loss=0.01385, ecapa_loss=0.0001397, whisper_loss=0.06751, over 15247.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01053, ecapa_loss=0.0001474, whisper_loss=0.09006, over 3925733.31 frames. ], batch size: 62, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:32:46,854 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-18 14:32:49,174 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-18 14:32:49,701 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.19 vs. limit=15.0 2024-08-18 14:32:53,087 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.26 vs. limit=22.5 2024-08-18 14:33:08,907 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-18 14:33:10,070 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
35 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-18 14:33:14,694 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.50 vs. limit=22.5 2024-08-18 14:33:15,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3941390.0, ans=0.0 2024-08-18 14:33:32,656 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 14:33:34,275 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-18 14:33:47,430 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 7550, loss[loss=0.102, beats_loss=0.008569, ecapa_loss=0.0001528, whisper_loss=0.09195, over 20919.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01051, ecapa_loss=0.0001465, whisper_loss=0.0902, over 3893564.41 frames. ], batch size: 83, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:34:02,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3941790.0, ans=0.07 2024-08-18 14:34:12,786 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-18 14:34:23,965 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2024-08-18 14:34:47,001 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
18 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-18 14:34:52,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3942090.0, ans=0.0 2024-08-18 14:34:53,218 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.776e+01 2.267e+01 2.538e+01 2.826e+01 4.465e+01, threshold=5.075e+01, percent-clipped=0.0 2024-08-18 14:34:55,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3942090.0, ans=0.035 2024-08-18 14:35:05,565 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 7600, loss[loss=0.1045, beats_loss=0.01091, ecapa_loss=0.000161, whisper_loss=0.092, over 21190.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.000146, whisper_loss=0.09047, over 3884417.15 frames. ], batch size: 87, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:35:09,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3942190.0, ans=0.0 2024-08-18 14:35:15,282 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-18 14:35:19,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3942190.0, ans=0.09899494936611666 2024-08-18 14:36:08,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3942490.0, ans=0.1 2024-08-18 14:36:11,220 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.46 vs. limit=12.0 2024-08-18 14:36:15,433 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
21 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-18 14:36:16,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3942590.0, ans=0.125 2024-08-18 14:36:32,725 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 7650, loss[loss=0.09938, beats_loss=0.01152, ecapa_loss=0.0001324, whisper_loss=0.08654, over 22207.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001457, whisper_loss=0.09033, over 3923723.99 frames. ], batch size: 89, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:36:35,780 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 24 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-18 14:36:38,996 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.86 vs. limit=8.0 2024-08-18 14:36:39,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3942690.0, ans=0.05 2024-08-18 14:36:47,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3942790.0, ans=0.125 2024-08-18 14:36:48,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3942790.0, ans=0.125 2024-08-18 14:37:01,397 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-18 14:37:29,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3942990.0, ans=10.0 2024-08-18 14:37:29,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3942990.0, ans=0.125 2024-08-18 14:37:38,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3942990.0, ans=0.0 2024-08-18 14:37:42,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3943090.0, ans=0.035 2024-08-18 14:37:44,652 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.361e+01 2.566e+01 2.916e+01 1.165e+02, threshold=5.131e+01, percent-clipped=2.0 2024-08-18 14:37:49,955 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 21 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 14:37:58,388 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 7700, loss[loss=0.09548, beats_loss=0.01302, ecapa_loss=0.0001662, whisper_loss=0.0808, over 14144.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01051, ecapa_loss=0.0001455, whisper_loss=0.08972, over 3898521.16 frames. 
], batch size: 59, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:38:03,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3943190.0, ans=0.125 2024-08-18 14:38:07,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3943190.0, ans=0.0 2024-08-18 14:38:13,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3943290.0, ans=0.035 2024-08-18 14:38:15,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3943290.0, ans=0.07 2024-08-18 14:38:30,576 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 18 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-18 14:38:34,061 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-18 14:38:38,196 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-18 14:38:44,662 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 26 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-18 14:38:44,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3943490.0, ans=0.0 2024-08-18 14:38:47,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=3943490.0, ans=0.2 2024-08-18 14:38:47,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3943490.0, ans=0.2 2024-08-18 14:38:53,683 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.11 vs. 
limit=15.0 2024-08-18 14:39:03,070 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 7750, loss[loss=0.1138, beats_loss=0.009737, ecapa_loss=0.0001791, whisper_loss=0.1023, over 21833.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001454, whisper_loss=0.09026, over 3878638.43 frames. ], batch size: 94, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:39:03,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3943690.0, ans=0.125 2024-08-18 14:39:35,940 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-18 14:39:37,517 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-18 14:39:56,140 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.246e+01 2.508e+01 2.794e+01 3.157e+02, threshold=5.017e+01, percent-clipped=3.0 2024-08-18 14:39:56,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3944090.0, ans=0.125 2024-08-18 14:39:57,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3944090.0, ans=0.125 2024-08-18 14:40:06,234 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 7800, loss[loss=0.1018, beats_loss=0.01068, ecapa_loss=0.0001513, whisper_loss=0.08963, over 21946.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01044, ecapa_loss=0.0001458, whisper_loss=0.09003, over 3871214.68 frames. 
], batch size: 91, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:40:14,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3944190.0, ans=0.1 2024-08-18 14:40:16,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3944190.0, ans=0.0 2024-08-18 14:40:17,071 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 10 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 14:40:22,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3944290.0, ans=0.0 2024-08-18 14:40:26,536 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-18 14:40:32,600 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-18 14:40:41,607 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 14:40:47,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3944490.0, ans=0.2 2024-08-18 14:41:11,785 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 7850, loss[loss=0.1063, beats_loss=0.01063, ecapa_loss=0.0001407, whisper_loss=0.09431, over 17851.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01046, ecapa_loss=0.0001453, whisper_loss=0.09019, over 3876673.74 frames. ], batch size: 71, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:41:16,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3944690.0, ans=0.2 2024-08-18 14:41:18,465 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
23 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-18 14:41:23,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3944690.0, ans=0.125 2024-08-18 14:41:29,623 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-18 14:41:45,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3944890.0, ans=0.125 2024-08-18 14:41:46,576 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 14:41:48,390 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.59 vs. limit=10.0 2024-08-18 14:41:52,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3944990.0, ans=0.0 2024-08-18 14:42:05,247 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-18 14:42:09,805 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.337e+01 2.593e+01 2.821e+01 2.204e+02, threshold=5.186e+01, percent-clipped=1.0 2024-08-18 14:42:19,891 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 7900, loss[loss=0.1051, beats_loss=0.01238, ecapa_loss=0.000129, whisper_loss=0.09147, over 22477.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01055, ecapa_loss=0.0001449, whisper_loss=0.08995, over 3870286.78 frames. ], batch size: 91, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:42:22,617 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. 
limit=6.0 2024-08-18 14:42:45,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3945390.0, ans=0.125 2024-08-18 14:42:54,004 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 24 from Vox, 44 from AS 2024-08-18 14:43:03,777 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.51 vs. limit=15.0 2024-08-18 14:43:13,198 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 from AS 2024-08-18 14:43:19,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3945590.0, ans=0.1 2024-08-18 14:43:24,346 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 7950, loss[loss=0.08988, beats_loss=0.01187, ecapa_loss=0.0001329, whisper_loss=0.07668, over 15519.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0106, ecapa_loss=0.0001453, whisper_loss=0.08948, over 3886231.15 frames. ], batch size: 63, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:43:30,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3945690.0, ans=0.0 2024-08-18 14:43:33,318 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 21 from Vox, 19 from AS 2024-08-18 14:43:33,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3945690.0, ans=0.125 2024-08-18 14:43:35,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3945790.0, ans=0.125 2024-08-18 14:43:55,561 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.17 vs. 
limit=15.0 2024-08-18 14:44:11,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3945990.0, ans=0.2 2024-08-18 14:44:18,407 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.253e+01 2.458e+01 2.855e+01 4.177e+01, threshold=4.916e+01, percent-clipped=0.0 2024-08-18 14:44:28,685 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 8000, loss[loss=0.1175, beats_loss=0.009508, ecapa_loss=0.0001231, whisper_loss=0.1067, over 24456.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0106, ecapa_loss=0.0001447, whisper_loss=0.09011, over 3867218.72 frames. ], batch size: 94, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:44:39,512 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.95 vs. limit=22.5 2024-08-18 14:44:55,285 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 28 from Vox, 34 from AS 2024-08-18 14:45:04,991 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.07 vs. limit=15.0 2024-08-18 14:45:21,681 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.28 vs. limit=15.0 2024-08-18 14:45:31,475 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 8050, loss[loss=0.09724, beats_loss=0.01227, ecapa_loss=0.0001095, whisper_loss=0.08387, over 23678.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01057, ecapa_loss=0.0001456, whisper_loss=0.09019, over 3865122.28 frames. 
], batch size: 92, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:45:34,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3946690.0, ans=0.0 2024-08-18 14:45:34,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3946690.0, ans=0.0 2024-08-18 14:45:38,847 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.07 vs. limit=10.0 2024-08-18 14:45:48,008 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 11 from LS+wenet, 16 from Vox, 28 from AS 2024-08-18 14:45:48,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3946790.0, ans=0.125 2024-08-18 14:45:49,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3946790.0, ans=0.125 2024-08-18 14:45:55,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3946890.0, ans=0.125 2024-08-18 14:46:07,129 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
26 from LS+wenet, 25 from Vox, 37 from AS 2024-08-18 14:46:08,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3946990.0, ans=0.1 2024-08-18 14:46:12,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3946990.0, ans=0.125 2024-08-18 14:46:24,657 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.326e+01 2.639e+01 3.174e+01 1.521e+02, threshold=5.277e+01, percent-clipped=3.0 2024-08-18 14:46:35,144 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 8100, loss[loss=0.1018, beats_loss=0.01025, ecapa_loss=0.0001107, whisper_loss=0.09045, over 20957.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0106, ecapa_loss=0.0001447, whisper_loss=0.09013, over 3875035.59 frames. ], batch size: 81, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:46:39,635 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 19 from LS+wenet, 15 from Vox, 38 from AS 2024-08-18 14:47:15,026 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 from AS 2024-08-18 14:47:23,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3947490.0, ans=0.1 2024-08-18 14:47:26,662 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 14 from Vox, 37 from AS 2024-08-18 14:47:28,404 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.36 vs. 
limit=22.5 2024-08-18 14:47:33,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3947590.0, ans=0.125 2024-08-18 14:47:41,930 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 8150, loss[loss=0.1002, beats_loss=0.009088, ecapa_loss=0.000166, whisper_loss=0.08941, over 22419.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01059, ecapa_loss=0.0001446, whisper_loss=0.08931, over 3846809.41 frames. ], batch size: 92, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:47:55,535 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 from AS 2024-08-18 14:47:55,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3947790.0, ans=0.0 2024-08-18 14:47:58,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3947790.0, ans=0.0 2024-08-18 14:48:05,646 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 15 from Vox, 46 from AS 2024-08-18 14:48:08,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3947890.0, ans=0.125 2024-08-18 14:48:09,254 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 28 from LS+wenet, 22 from Vox, 29 from AS 2024-08-18 14:48:30,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3947990.0, ans=0.125 2024-08-18 14:48:32,594 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
27 from LS+wenet, 17 from Vox, 45 from AS 2024-08-18 14:48:32,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3948090.0, ans=0.2 2024-08-18 14:48:35,011 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.257e+01 2.558e+01 2.766e+01 4.647e+01, threshold=5.117e+01, percent-clipped=0.0 2024-08-18 14:48:43,067 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 from AS 2024-08-18 14:48:45,283 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 8200, loss[loss=0.1018, beats_loss=0.01028, ecapa_loss=0.0001807, whisper_loss=0.0897, over 18406.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.0001458, whisper_loss=0.09004, over 3877318.67 frames. ], batch size: 79, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:48:46,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3948190.0, ans=0.125 2024-08-18 14:48:59,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3948290.0, ans=0.125 2024-08-18 14:49:11,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3948390.0, ans=0.0 2024-08-18 14:49:43,063 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 19 from Vox, 39 from AS 2024-08-18 14:49:44,547 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 11 from Vox, 30 from AS 2024-08-18 14:49:44,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3948590.0, ans=0.2 2024-08-18 14:49:49,327 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 8250, loss[loss=0.101, beats_loss=0.009313, ecapa_loss=0.0001572, whisper_loss=0.0901, over 21964.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01049, ecapa_loss=0.0001446, whisper_loss=0.08991, over 3871298.02 frames. ], batch size: 91, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:49:55,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3948690.0, ans=0.2 2024-08-18 14:50:15,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3948890.0, ans=0.09899494936611666 2024-08-18 14:50:16,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3948890.0, ans=0.125 2024-08-18 14:50:17,585 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 21 from Vox, 22 from AS 2024-08-18 14:50:18,725 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 17 from Vox, 40 from AS 2024-08-18 14:50:34,227 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 19 from Vox, 23 from AS 2024-08-18 14:50:41,551 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.14 vs. limit=15.0 2024-08-18 14:50:44,686 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.276e+01 2.490e+01 2.765e+01 6.193e+01, threshold=4.979e+01, percent-clipped=1.0 2024-08-18 14:50:52,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3949090.0, ans=0.07 2024-08-18 14:50:53,713 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 22 from Vox, 26 from AS 2024-08-18 14:50:54,760 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 8300, loss[loss=0.1005, beats_loss=0.009841, ecapa_loss=0.0001605, whisper_loss=0.0891, over 17081.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001444, whisper_loss=0.09005, over 3868778.66 frames. 
], batch size: 67, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:51:03,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3949190.0, ans=0.125 2024-08-18 14:51:03,771 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.62 vs. limit=12.0 2024-08-18 14:51:07,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3949290.0, ans=0.0 2024-08-18 14:51:19,622 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 18 from Vox, 32 from AS 2024-08-18 14:51:38,066 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 21 from LS+wenet, 25 from Vox, 45 from AS 2024-08-18 14:51:44,140 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 24 from Vox, 27 from AS 2024-08-18 14:51:46,319 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.77 vs. limit=15.0 2024-08-18 14:51:56,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3949690.0, ans=0.2 2024-08-18 14:51:57,032 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 8350, loss[loss=0.1163, beats_loss=0.01134, ecapa_loss=0.0001035, whisper_loss=0.1039, over 23497.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01056, ecapa_loss=0.0001439, whisper_loss=0.08976, over 3876596.06 frames. 
], batch size: 90, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:52:10,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3949790.0, ans=0.2 2024-08-18 14:52:16,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3949790.0, ans=0.1 2024-08-18 14:52:20,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3949790.0, ans=0.125 2024-08-18 14:52:22,445 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 17 from Vox, 45 from AS 2024-08-18 14:52:36,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3949990.0, ans=0.07 2024-08-18 14:52:45,549 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 28 from Vox, 37 from AS 2024-08-18 14:52:50,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3950090.0, ans=0.125 2024-08-18 14:52:52,732 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.711e+01 2.289e+01 2.579e+01 2.822e+01 1.067e+02, threshold=5.159e+01, percent-clipped=1.0 2024-08-18 14:53:04,044 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 8400, loss[loss=0.09714, beats_loss=0.01013, ecapa_loss=0.0001382, whisper_loss=0.08563, over 18335.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01056, ecapa_loss=0.0001426, whisper_loss=0.08959, over 3899769.67 frames. ], batch size: 70, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:53:11,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3950190.0, ans=0.125 2024-08-18 14:53:12,405 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
16 from LS+wenet, 11 from Vox, 26 from AS 2024-08-18 14:53:14,455 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.65 vs. limit=22.5 2024-08-18 14:53:15,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3950190.0, ans=0.125 2024-08-18 14:53:45,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3950490.0, ans=0.1 2024-08-18 14:53:45,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3950490.0, ans=0.0 2024-08-18 14:53:56,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3950590.0, ans=0.125 2024-08-18 14:54:02,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3950590.0, ans=6.0 2024-08-18 14:54:09,907 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.75 vs. limit=12.0 2024-08-18 14:54:10,528 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 8450, loss[loss=0.09101, beats_loss=0.01004, ecapa_loss=0.0001386, whisper_loss=0.07958, over 13977.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.0001433, whisper_loss=0.08984, over 3877050.31 frames. ], batch size: 57, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:54:11,324 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.06 vs. limit=15.0 2024-08-18 14:54:13,164 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
15 from LS+wenet, 20 from Vox, 31 from AS 2024-08-18 14:54:20,722 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 35 from LS+wenet, 26 from Vox, 27 from AS 2024-08-18 14:54:21,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3950690.0, ans=0.1 2024-08-18 14:54:23,165 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.39 vs. limit=15.0 2024-08-18 14:54:23,469 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09940025955438614, model_norm_threshold=51.58603286743164 2024-08-18 14:54:23,640 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.828e+04, grad_sumsq=5.828e+04, orig_rms_sq=1.000e+00 2024-08-18 14:54:35,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3950890.0, ans=0.125 2024-08-18 14:54:39,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=3950890.0, ans=0.2 2024-08-18 14:54:42,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3950890.0, ans=0.1 2024-08-18 14:54:53,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3950990.0, ans=0.125 2024-08-18 14:54:54,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3950990.0, ans=0.125 2024-08-18 14:55:03,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3951090.0, ans=0.5 2024-08-18 14:55:04,301 INFO [optim.py:476] (2/4) 
Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.244e+01 2.462e+01 2.718e+01 5.190e+02, threshold=4.924e+01, percent-clipped=1.0 2024-08-18 14:55:04,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3951090.0, ans=0.125 2024-08-18 14:55:11,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3951090.0, ans=0.2 2024-08-18 14:55:14,718 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 8500, loss[loss=0.09027, beats_loss=0.01135, ecapa_loss=0.000133, whisper_loss=0.07759, over 21954.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01048, ecapa_loss=0.0001435, whisper_loss=0.08986, over 3910360.18 frames. ], batch size: 87, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:55:24,810 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 from AS 2024-08-18 14:55:26,044 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 39 from LS+wenet, 18 from Vox, 33 from AS 2024-08-18 14:55:26,877 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.99 vs. limit=22.5 2024-08-18 14:55:31,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3951290.0, ans=0.125 2024-08-18 14:55:40,708 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 20 from Vox, 33 from AS 2024-08-18 14:56:10,267 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.74 vs. limit=22.5 2024-08-18 14:56:16,273 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.22 vs. 
limit=15.0 2024-08-18 14:56:16,870 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 8550, loss[loss=0.109, beats_loss=0.0103, ecapa_loss=0.0001297, whisper_loss=0.0974, over 23190.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01043, ecapa_loss=0.0001435, whisper_loss=0.08995, over 3865516.94 frames. ], batch size: 93, lr: 2.26e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:56:19,424 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 12 from Vox, 23 from AS 2024-08-18 14:56:24,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3951690.0, ans=0.05 2024-08-18 14:56:25,011 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.95 vs. limit=15.0 2024-08-18 14:56:32,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3951790.0, ans=10.0 2024-08-18 14:56:33,447 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 25 from Vox, 41 from AS 2024-08-18 14:56:33,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3951790.0, ans=0.09899494936611666 2024-08-18 14:56:38,757 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=15.0 2024-08-18 14:56:45,636 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 20 from LS+wenet, 19 from Vox, 41 from AS 2024-08-18 14:56:47,851 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.51 vs. limit=15.0 2024-08-18 14:56:48,312 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
17 from LS+wenet, 12 from Vox, 32 from AS 2024-08-18 14:56:57,642 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=22.5 2024-08-18 14:57:02,494 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=12.0 2024-08-18 14:57:04,527 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 25 from LS+wenet, 18 from Vox, 39 from AS 2024-08-18 14:57:04,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3951990.0, ans=0.1 2024-08-18 14:57:10,809 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.534e+01 2.714e+01 3.031e+01 4.468e+01, threshold=5.428e+01, percent-clipped=0.0 2024-08-18 14:57:19,461 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 8600, loss[loss=0.1144, beats_loss=0.01027, ecapa_loss=0.0001404, whisper_loss=0.1027, over 22422.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0104, ecapa_loss=0.0001437, whisper_loss=0.09098, over 3895641.53 frames. ], batch size: 90, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:57:33,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3952290.0, ans=6.0 2024-08-18 14:57:36,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3952290.0, ans=0.0 2024-08-18 14:57:39,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3952290.0, ans=0.2 2024-08-18 14:57:48,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3952390.0, ans=0.125 2024-08-18 14:57:53,383 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
25 from LS+wenet, 25 from Vox, 34 from AS 2024-08-18 14:57:54,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3952390.0, ans=0.125 2024-08-18 14:58:21,617 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 8650, loss[loss=0.09784, beats_loss=0.01054, ecapa_loss=0.0001744, whisper_loss=0.08556, over 20511.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01043, ecapa_loss=0.0001434, whisper_loss=0.09042, over 3893157.86 frames. ], batch size: 89, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:58:25,479 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 from AS 2024-08-18 14:58:29,178 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 19 from LS+wenet, 19 from Vox, 38 from AS 2024-08-18 14:58:52,399 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 22 from Vox, 34 from AS 2024-08-18 14:58:55,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3952890.0, ans=0.125 2024-08-18 14:59:12,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3953090.0, ans=0.2 2024-08-18 14:59:14,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3953090.0, ans=0.125 2024-08-18 14:59:15,072 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.252e+01 2.415e+01 2.736e+01 4.412e+01, threshold=4.831e+01, percent-clipped=0.0 2024-08-18 14:59:20,777 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.15 vs. 
limit=15.0 2024-08-18 14:59:21,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3953090.0, ans=0.95 2024-08-18 14:59:23,865 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 8700, loss[loss=0.1093, beats_loss=0.009003, ecapa_loss=0.0001437, whisper_loss=0.09885, over 15379.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01045, ecapa_loss=0.0001441, whisper_loss=0.08999, over 3909752.14 frames. ], batch size: 60, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:59:29,228 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=12.0 2024-08-18 14:59:57,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3953390.0, ans=0.0 2024-08-18 15:00:10,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3953490.0, ans=0.125 2024-08-18 15:00:13,747 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 from AS 2024-08-18 15:00:25,990 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 8750, loss[loss=0.09729, beats_loss=0.009862, ecapa_loss=0.0001544, whisper_loss=0.08588, over 21218.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001445, whisper_loss=0.09037, over 3910123.20 frames. ], batch size: 83, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:00:35,309 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.08 vs. limit=22.5 2024-08-18 15:00:46,174 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
33 from LS+wenet, 19 from Vox, 39 from AS 2024-08-18 15:00:47,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3953790.0, ans=0.125 2024-08-18 15:00:58,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3953890.0, ans=10.0 2024-08-18 15:01:06,682 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.46 vs. limit=15.0 2024-08-18 15:01:19,351 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.312e+01 2.474e+01 2.763e+01 1.199e+02, threshold=4.947e+01, percent-clipped=1.0 2024-08-18 15:01:28,069 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 8800, loss[loss=0.1018, beats_loss=0.0106, ecapa_loss=0.0001476, whisper_loss=0.08973, over 17737.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01046, ecapa_loss=0.0001443, whisper_loss=0.09015, over 3870680.30 frames. ], batch size: 69, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:01:41,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3954290.0, ans=0.2 2024-08-18 15:01:47,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3954290.0, ans=0.025 2024-08-18 15:02:08,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3954490.0, ans=0.0 2024-08-18 15:02:21,879 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
21 from LS+wenet, 22 from Vox, 31 from AS 2024-08-18 15:02:24,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3954590.0, ans=0.125 2024-08-18 15:02:27,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3954590.0, ans=0.0 2024-08-18 15:02:30,610 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 8850, loss[loss=0.1135, beats_loss=0.01108, ecapa_loss=0.000124, whisper_loss=0.1011, over 23212.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01053, ecapa_loss=0.0001434, whisper_loss=0.09019, over 3858034.27 frames. ], batch size: 90, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:02:48,175 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 15 from Vox, 33 from AS 2024-08-18 15:02:48,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3954790.0, ans=0.0 2024-08-18 15:02:51,602 WARNING [optim.py:496] (2/4) Scaling gradients by 0.04036853462457657, model_norm_threshold=49.47042465209961 2024-08-18 15:02:51,769 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.484e+05, grad_sumsq=1.484e+05, orig_rms_sq=1.000e+00 2024-08-18 15:03:13,523 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
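The `WARNING [optim.py:496] Scaling gradients by ..., model_norm_threshold=...` entries above are consistent with norm-based gradient scaling: when the gradient norm exceeds the threshold, gradients are multiplied by `threshold / norm`. This helper is a sketch under that assumption, not icefall's `optim.py` itself; the gradient norm is inferred from the logged scale, not logged directly.

```python
# Hedged sketch of the gradient scaling implied by the "Scaling gradients by
# ..., model_norm_threshold=..." warnings.  Assumed rule: no scaling while the
# norm is within the threshold, otherwise scale by threshold/norm.

def grad_clip_scale(grad_norm: float, threshold: float) -> float:
    """Factor the gradients are multiplied by."""
    if grad_norm <= threshold:
        return 1.0
    return threshold / grad_norm

# Within the threshold: gradients pass through unscaled.
print(grad_clip_scale(25.0, 49.47))  # 1.0

# The warning above (scale 0.04037, threshold 49.47) implies a gradient norm
# of roughly threshold/scale ~ 1.225e+03, which matches the max grad-norm
# quartile (1.225e+03) reported in the next Clipping_scale line.
print(round(49.47042465209961 / 0.04036853462457657))
```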
24 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-18 15:03:13,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3954990.0, ans=0.125 2024-08-18 15:03:17,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3954990.0, ans=0.0 2024-08-18 15:03:24,437 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.312e+01 2.617e+01 3.003e+01 1.225e+03, threshold=5.234e+01, percent-clipped=1.0 2024-08-18 15:03:30,790 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 26 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-18 15:03:33,263 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 8900, loss[loss=0.08438, beats_loss=0.01197, ecapa_loss=0.0001476, whisper_loss=0.07093, over 21118.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01058, ecapa_loss=0.000144, whisper_loss=0.08954, over 3838070.81 frames. ], batch size: 88, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:03:35,858 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 15:03:53,108 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-18 15:03:53,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3955290.0, ans=0.5 2024-08-18 15:03:58,714 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.14 vs. 
limit=6.0 2024-08-18 15:04:04,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3955390.0, ans=0.1 2024-08-18 15:04:17,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3955490.0, ans=0.1 2024-08-18 15:04:19,513 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 39 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-18 15:04:21,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3955490.0, ans=0.125 2024-08-18 15:04:28,412 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 31 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 15:04:30,152 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=15.0 2024-08-18 15:04:35,956 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 8950, loss[loss=0.1008, beats_loss=0.01183, ecapa_loss=0.0001351, whisper_loss=0.08758, over 21578.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01056, ecapa_loss=0.0001436, whisper_loss=0.08966, over 3844812.62 frames. ], batch size: 88, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:04:38,372 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-18 15:04:48,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3955790.0, ans=0.0 2024-08-18 15:04:53,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3955790.0, ans=0.125 2024-08-18 15:04:58,399 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
18 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 15:05:04,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3955890.0, ans=0.125 2024-08-18 15:05:05,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3955890.0, ans=0.125 2024-08-18 15:05:22,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3955990.0, ans=0.0 2024-08-18 15:05:24,901 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2024-08-18 15:05:28,255 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 15:05:28,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3956090.0, ans=0.2 2024-08-18 15:05:29,628 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.620e+01 2.255e+01 2.650e+01 2.885e+01 7.175e+01, threshold=5.300e+01, percent-clipped=2.0 2024-08-18 15:05:30,897 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-18 15:05:31,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3956090.0, ans=0.0 2024-08-18 15:05:38,070 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 9000, loss[loss=0.0883, beats_loss=0.01145, ecapa_loss=0.0001529, whisper_loss=0.07532, over 22121.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01059, ecapa_loss=0.0001431, whisper_loss=0.08947, over 3866625.69 frames. 
], batch size: 91, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:05:38,071 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-18 15:06:15,529 INFO [train_multi_KD3.py:1149] (2/4) Epoch 27, validation on ASR_libri: loss=0.2517, beats_loss=0, ecapa_loss=0.0005205, whisper_loss=0.2465, over 922467.00 frames. 2024-08-18 15:06:34,032 INFO [train_multi_KD3.py:1149] (2/4) Epoch 27, validation on SV_voxceleb1: loss=0.004102, beats_loss=0, ecapa_loss=0.0004102, whisper_loss=0, over 939242.00 frames. 2024-08-18 15:08:24,129 INFO [train_multi_KD3.py:1149] (2/4) Epoch 27, validation on AT_audioset: loss=0.02313, beats_loss=0.02313, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 15:08:24,133 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-18 15:08:24,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3956190.0, ans=0.125 2024-08-18 15:08:26,937 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 27 from Vox, 23 fro AS 2024-08-18 15:08:27,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3956190.0, ans=0.1 2024-08-18 15:08:33,071 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 32 from Vox, 27 fro AS 2024-08-18 15:08:54,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3956390.0, ans=0.0 2024-08-18 15:09:19,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3956590.0, ans=0.125 2024-08-18 15:09:20,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3956590.0, ans=0.2 2024-08-18 15:09:21,688 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
15 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 15:09:23,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3956590.0, ans=0.125 2024-08-18 15:09:26,669 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 9050, loss[loss=0.0811, beats_loss=0.01179, ecapa_loss=0.0001608, whisper_loss=0.0677, over 15396.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01053, ecapa_loss=0.0001442, whisper_loss=0.08932, over 3875379.44 frames. ], batch size: 64, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:09:27,610 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.58 vs. limit=22.5 2024-08-18 15:09:30,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3956690.0, ans=0.1 2024-08-18 15:09:30,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3956690.0, ans=0.125 2024-08-18 15:09:39,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3956790.0, ans=0.1 2024-08-18 15:09:43,085 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 28 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-18 15:09:49,227 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-18 15:09:50,318 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-18 15:09:56,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3956890.0, ans=0.0 2024-08-18 15:10:00,311 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-18 15:10:01,547 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
35 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-18 15:10:19,810 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.267e+01 2.520e+01 2.853e+01 4.367e+01, threshold=5.041e+01, percent-clipped=0.0 2024-08-18 15:10:21,129 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 31 from Vox, 24 fro AS 2024-08-18 15:10:23,873 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-18 15:10:28,985 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 9100, loss[loss=0.1093, beats_loss=0.009076, ecapa_loss=0.000163, whisper_loss=0.09864, over 22130.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01045, ecapa_loss=0.0001461, whisper_loss=0.09046, over 3895356.01 frames. ], batch size: 90, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:10:31,547 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-18 15:10:50,359 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 15:10:51,537 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-18 15:11:16,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3957490.0, ans=0.09899494936611666 2024-08-18 15:11:30,909 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 9150, loss[loss=0.09372, beats_loss=0.01156, ecapa_loss=0.0001388, whisper_loss=0.08077, over 22528.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.0001457, whisper_loss=0.09004, over 3888810.45 frames. 
], batch size: 92, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:11:45,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3957790.0, ans=0.1 2024-08-18 15:12:00,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3957890.0, ans=0.0 2024-08-18 15:12:24,950 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.708e+01 2.284e+01 2.490e+01 2.890e+01 6.008e+01, threshold=4.980e+01, percent-clipped=1.0 2024-08-18 15:12:33,865 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 9200, loss[loss=0.09944, beats_loss=0.01111, ecapa_loss=0.0001643, whisper_loss=0.08668, over 16508.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01056, ecapa_loss=0.0001454, whisper_loss=0.09003, over 3897258.60 frames. ], batch size: 69, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:12:36,392 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 15:13:13,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3958490.0, ans=0.125 2024-08-18 15:13:16,772 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 28 from Vox, 18 fro AS 2024-08-18 15:13:31,001 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.68 vs. limit=10.0 2024-08-18 15:13:34,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3958690.0, ans=0.0 2024-08-18 15:13:35,080 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 9250, loss[loss=0.09593, beats_loss=0.01163, ecapa_loss=0.0001584, whisper_loss=0.08272, over 17338.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.0106, ecapa_loss=0.0001455, whisper_loss=0.08955, over 3901471.17 frames. ], batch size: 72, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:13:37,224 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.77 vs. limit=15.0 2024-08-18 15:13:37,715 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 28 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-18 15:13:39,008 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 37 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-18 15:13:43,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3958690.0, ans=0.0 2024-08-18 15:13:47,635 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-18 15:13:51,092 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.89 vs. limit=22.5 2024-08-18 15:13:54,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3958790.0, ans=0.1 2024-08-18 15:14:01,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3958890.0, ans=0.125 2024-08-18 15:14:05,585 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2024-08-18 15:14:21,899 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.69 vs. limit=15.0 2024-08-18 15:14:23,823 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-18 15:14:25,058 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
32 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-18 15:14:26,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3959090.0, ans=0.2 2024-08-18 15:14:28,623 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.320e+01 2.629e+01 2.972e+01 5.900e+01, threshold=5.258e+01, percent-clipped=2.0 2024-08-18 15:14:28,793 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 32 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-18 15:14:31,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3959090.0, ans=0.1 2024-08-18 15:14:34,279 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.28 vs. limit=12.0 2024-08-18 15:14:37,203 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 9300, loss[loss=0.103, beats_loss=0.01185, ecapa_loss=0.000116, whisper_loss=0.08994, over 21072.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01053, ecapa_loss=0.0001445, whisper_loss=0.09021, over 3947949.31 frames. 
], batch size: 82, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:14:37,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3959190.0, ans=0.125 2024-08-18 15:14:44,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3959190.0, ans=0.1 2024-08-18 15:14:46,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3959190.0, ans=0.0 2024-08-18 15:14:48,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3959190.0, ans=0.0 2024-08-18 15:14:51,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3959290.0, ans=0.125 2024-08-18 15:14:57,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3959290.0, ans=0.1 2024-08-18 15:15:10,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3959390.0, ans=0.0 2024-08-18 15:15:22,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3959490.0, ans=0.1 2024-08-18 15:15:41,440 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 9350, loss[loss=0.1037, beats_loss=0.0103, ecapa_loss=0.0001301, whisper_loss=0.09208, over 22558.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.000145, whisper_loss=0.08992, over 3915800.19 frames. ], batch size: 90, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:15:57,556 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
11 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 15:16:07,750 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 29 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-18 15:16:09,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3959890.0, ans=0.09899494936611666 2024-08-18 15:16:12,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3959890.0, ans=0.2 2024-08-18 15:16:29,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3959990.0, ans=0.125 2024-08-18 15:16:34,059 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 30 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-18 15:16:37,866 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.319e+01 2.506e+01 2.733e+01 5.206e+01, threshold=5.011e+01, percent-clipped=0.0 2024-08-18 15:16:47,469 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 9400, loss[loss=0.09897, beats_loss=0.01257, ecapa_loss=0.0001259, whisper_loss=0.08514, over 21896.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0104, ecapa_loss=0.0001457, whisper_loss=0.09033, over 3911685.77 frames. ], batch size: 88, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:17:03,507 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 18 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 15:17:11,137 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-18 15:17:21,776 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
37 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-18 15:17:23,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3960390.0, ans=0.2 2024-08-18 15:17:49,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3960590.0, ans=0.0 2024-08-18 15:17:53,131 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 9450, loss[loss=0.1033, beats_loss=0.01076, ecapa_loss=0.0001303, whisper_loss=0.09125, over 20739.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01043, ecapa_loss=0.0001446, whisper_loss=0.09017, over 3846842.34 frames. ], batch size: 81, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:18:05,551 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.62 vs. limit=22.5 2024-08-18 15:18:10,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3960790.0, ans=0.125 2024-08-18 15:18:23,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3960890.0, ans=0.2 2024-08-18 15:18:26,939 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-18 15:18:45,567 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
35 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-18 15:18:49,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3961090.0, ans=0.1 2024-08-18 15:18:49,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3961090.0, ans=0.07 2024-08-18 15:18:50,285 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.723e+01 2.322e+01 2.578e+01 2.897e+01 2.615e+02, threshold=5.157e+01, percent-clipped=1.0 2024-08-18 15:18:56,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3961090.0, ans=0.2 2024-08-18 15:18:59,859 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 9500, loss[loss=0.1225, beats_loss=0.009522, ecapa_loss=0.0001435, whisper_loss=0.1115, over 24018.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0104, ecapa_loss=0.0001451, whisper_loss=0.09044, over 3841645.19 frames. ], batch size: 90, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:19:24,442 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 27 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-18 15:19:25,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3961290.0, ans=0.125 2024-08-18 15:19:30,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3961390.0, ans=0.0 2024-08-18 15:19:32,110 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.55 vs. limit=22.5 2024-08-18 15:20:11,228 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 9550, loss[loss=0.06961, beats_loss=0.01152, ecapa_loss=0.0001483, whisper_loss=0.0566, over 15387.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01049, ecapa_loss=0.0001452, whisper_loss=0.0899, over 3860714.76 frames. ], batch size: 67, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:20:40,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3961890.0, ans=0.035 2024-08-18 15:20:48,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3961890.0, ans=0.0 2024-08-18 15:21:00,832 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 15:21:10,242 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.297e+01 2.564e+01 2.916e+01 8.592e+01, threshold=5.127e+01, percent-clipped=1.0 2024-08-18 15:21:10,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3962090.0, ans=0.1 2024-08-18 15:21:20,408 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 9600, loss[loss=0.1086, beats_loss=0.009526, ecapa_loss=0.0001384, whisper_loss=0.09768, over 23786.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001454, whisper_loss=0.09044, over 3841766.98 frames. ], batch size: 91, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:21:41,618 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 15:22:04,753 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.67 vs. limit=15.0 2024-08-18 15:22:05,329 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 15:22:27,670 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
25 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 15:22:28,989 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 9650, loss[loss=0.1097, beats_loss=0.008427, ecapa_loss=0.0001472, whisper_loss=0.09981, over 18853.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.0001459, whisper_loss=0.09047, over 3837140.23 frames. ], batch size: 75, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:22:31,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3962690.0, ans=0.2 2024-08-18 15:22:32,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3962690.0, ans=0.0 2024-08-18 15:22:41,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3962790.0, ans=0.125 2024-08-18 15:22:46,730 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0 2024-08-18 15:23:00,643 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
25 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-18 15:23:02,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3962890.0, ans=0.1 2024-08-18 15:23:07,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3962990.0, ans=0.0 2024-08-18 15:23:24,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3963090.0, ans=0.1 2024-08-18 15:23:26,916 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.385e+01 2.565e+01 2.930e+01 2.026e+02, threshold=5.129e+01, percent-clipped=1.0 2024-08-18 15:23:30,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3963090.0, ans=0.0 2024-08-18 15:23:37,098 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 9700, loss[loss=0.08628, beats_loss=0.01209, ecapa_loss=8.698e-05, whisper_loss=0.07332, over 17868.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01035, ecapa_loss=0.0001454, whisper_loss=0.09038, over 3841045.37 frames. ], batch size: 66, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:23:44,381 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-18 15:23:50,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3963290.0, ans=0.2 2024-08-18 15:23:56,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=3963290.0, ans=12.0 2024-08-18 15:23:57,915 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.23 vs. 
limit=15.0 2024-08-18 15:24:10,391 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.55 vs. limit=15.0 2024-08-18 15:24:14,187 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 11 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-18 15:24:31,100 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-18 15:24:50,579 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 9750, loss[loss=0.07883, beats_loss=0.01138, ecapa_loss=0.0001049, whisper_loss=0.0664, over 16088.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01051, ecapa_loss=0.000144, whisper_loss=0.08899, over 3835256.63 frames. ], batch size: 60, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:24:55,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3963690.0, ans=0.125 2024-08-18 15:24:57,572 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.62 vs. limit=10.0 2024-08-18 15:25:12,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3963790.0, ans=0.125 2024-08-18 15:25:31,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3963890.0, ans=0.125 2024-08-18 15:25:35,233 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
34 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-18 15:25:43,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3963990.0, ans=0.125 2024-08-18 15:25:51,340 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.207e+01 2.474e+01 2.731e+01 4.379e+01, threshold=4.949e+01, percent-clipped=0.0 2024-08-18 15:25:59,645 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 36 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 15:26:00,839 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 9800, loss[loss=0.1304, beats_loss=0.009159, ecapa_loss=0.0001387, whisper_loss=0.1198, over 23307.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01046, ecapa_loss=0.0001441, whisper_loss=0.08996, over 3851436.70 frames. ], batch size: 89, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:26:08,301 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.74 vs. limit=15.0 2024-08-18 15:26:13,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3964290.0, ans=0.1 2024-08-18 15:26:14,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3964290.0, ans=0.1 2024-08-18 15:26:20,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3964290.0, ans=0.09899494936611666 2024-08-18 15:26:23,198 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. 
limit=15.0 2024-08-18 15:26:27,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3964390.0, ans=0.125 2024-08-18 15:26:35,311 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.85 vs. limit=12.0 2024-08-18 15:26:35,554 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.99 vs. limit=15.0 2024-08-18 15:26:38,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3964390.0, ans=0.0 2024-08-18 15:26:44,659 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-18 15:27:07,716 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.42 vs. limit=22.5 2024-08-18 15:27:09,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3964690.0, ans=0.0 2024-08-18 15:27:10,813 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 9850, loss[loss=0.08971, beats_loss=0.01421, ecapa_loss=0.0001651, whisper_loss=0.07385, over 15463.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01048, ecapa_loss=0.0001437, whisper_loss=0.0906, over 3878091.49 frames. ], batch size: 64, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:27:17,113 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.12 vs. 
limit=12.0 2024-08-18 15:27:24,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3964790.0, ans=0.0 2024-08-18 15:27:39,690 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-18 15:27:48,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3964890.0, ans=0.04949747468305833 2024-08-18 15:27:59,463 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-18 15:28:02,683 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0 2024-08-18 15:28:02,841 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2024-08-18 15:28:08,517 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.27 vs. limit=15.0 2024-08-18 15:28:09,194 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.298e+01 2.545e+01 2.787e+01 5.182e+01, threshold=5.091e+01, percent-clipped=2.0 2024-08-18 15:28:16,872 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.35 vs. limit=10.0 2024-08-18 15:28:18,347 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 9900, loss[loss=0.09991, beats_loss=0.01051, ecapa_loss=0.0001479, whisper_loss=0.08793, over 22352.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001432, whisper_loss=0.09065, over 3919069.89 frames. 
], batch size: 88, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:28:41,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3965290.0, ans=0.2 2024-08-18 15:29:02,934 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 15:29:18,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3965590.0, ans=0.125 2024-08-18 15:29:25,005 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 9950, loss[loss=0.07735, beats_loss=0.01248, ecapa_loss=0.0001559, whisper_loss=0.06332, over 21365.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.0001435, whisper_loss=0.09095, over 3927618.20 frames. ], batch size: 90, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:29:52,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3965890.0, ans=0.0 2024-08-18 15:30:00,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3965890.0, ans=0.125 2024-08-18 15:30:21,498 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.257e+01 2.444e+01 2.772e+01 4.436e+01, threshold=4.888e+01, percent-clipped=0.0 2024-08-18 15:30:29,726 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 15:30:30,490 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 10000, loss[loss=0.1038, beats_loss=0.01173, ecapa_loss=0.0001497, whisper_loss=0.09053, over 20239.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.0001442, whisper_loss=0.09057, over 3908043.45 frames. 
], batch size: 84, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:30:34,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3966190.0, ans=0.125 2024-08-18 15:30:36,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3966190.0, ans=0.125 2024-08-18 15:31:01,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3966390.0, ans=0.015 2024-08-18 15:31:02,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3966390.0, ans=0.95 2024-08-18 15:31:04,075 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.13 vs. limit=15.0 2024-08-18 15:31:04,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3966390.0, ans=0.125 2024-08-18 15:31:10,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3966490.0, ans=0.1 2024-08-18 15:31:15,980 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 15:31:31,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3966590.0, ans=0.125 2024-08-18 15:31:36,878 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 10050, loss[loss=0.09421, beats_loss=0.009233, ecapa_loss=0.0001697, whisper_loss=0.08328, over 13585.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.000145, whisper_loss=0.09038, over 3878898.99 frames. 
], batch size: 55, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:31:41,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3966690.0, ans=0.5 2024-08-18 15:31:47,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3966690.0, ans=0.0 2024-08-18 15:32:03,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3966890.0, ans=0.125 2024-08-18 15:32:06,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3966890.0, ans=0.125 2024-08-18 15:32:34,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3967090.0, ans=0.125 2024-08-18 15:32:35,667 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.258e+01 2.497e+01 2.785e+01 5.136e+01, threshold=4.994e+01, percent-clipped=1.0 2024-08-18 15:32:40,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3967090.0, ans=0.125 2024-08-18 15:32:45,157 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 10100, loss[loss=0.09423, beats_loss=0.01046, ecapa_loss=0.0001424, whisper_loss=0.08235, over 19401.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01055, ecapa_loss=0.0001443, whisper_loss=0.09005, over 3865981.33 frames. ], batch size: 77, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:32:47,993 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 34 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-18 15:32:48,489 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.94 vs. 
limit=22.5 2024-08-18 15:32:51,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3967190.0, ans=0.125 2024-08-18 15:32:56,108 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2024-08-18 15:33:00,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3967290.0, ans=0.125 2024-08-18 15:33:12,924 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-18 15:33:50,953 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 10150, loss[loss=0.128, beats_loss=0.006981, ecapa_loss=0.0001658, whisper_loss=0.1193, over 15421.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.0001455, whisper_loss=0.0902, over 3852513.13 frames. ], batch size: 61, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:34:27,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3967890.0, ans=0.125 2024-08-18 15:34:35,371 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-18 15:34:39,421 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-18 15:34:43,483 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 15:34:50,169 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.305e+01 2.555e+01 2.872e+01 4.370e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-18 15:34:51,619 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
22 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 15:34:53,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3968090.0, ans=0.0 2024-08-18 15:34:59,492 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 10200, loss[loss=0.09164, beats_loss=0.01204, ecapa_loss=0.0001286, whisper_loss=0.07831, over 23399.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01039, ecapa_loss=0.0001457, whisper_loss=0.09046, over 3872858.19 frames. ], batch size: 93, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:35:09,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3968190.0, ans=0.125 2024-08-18 15:35:11,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3968290.0, ans=0.0 2024-08-18 15:35:24,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3968390.0, ans=0.125 2024-08-18 15:35:33,889 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 31 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-18 15:35:45,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3968490.0, ans=0.1 2024-08-18 15:36:04,962 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 10250, loss[loss=0.09385, beats_loss=0.01034, ecapa_loss=0.0001414, whisper_loss=0.08209, over 22372.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01037, ecapa_loss=0.0001456, whisper_loss=0.09043, over 3869882.65 frames. ], batch size: 88, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:36:07,892 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
23 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-18 15:36:10,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3968690.0, ans=0.1 2024-08-18 15:36:38,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3968890.0, ans=0.125 2024-08-18 15:36:50,578 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-18 15:37:01,342 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.331e+01 2.519e+01 2.797e+01 4.005e+01, threshold=5.039e+01, percent-clipped=0.0 2024-08-18 15:37:11,004 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 10300, loss[loss=0.08089, beats_loss=0.01294, ecapa_loss=0.0001163, whisper_loss=0.06679, over 20205.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01048, ecapa_loss=0.0001448, whisper_loss=0.08996, over 3897615.28 frames. ], batch size: 80, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:37:11,269 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-18 15:37:14,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3969190.0, ans=0.0 2024-08-18 15:37:18,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3969190.0, ans=0.125 2024-08-18 15:37:19,294 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
20 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-18 15:37:20,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3969190.0, ans=0.0 2024-08-18 15:37:23,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3969290.0, ans=0.0 2024-08-18 15:37:48,464 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.39 vs. limit=22.5 2024-08-18 15:38:04,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3969490.0, ans=0.07 2024-08-18 15:38:12,097 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 30 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 15:38:13,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3969590.0, ans=0.125 2024-08-18 15:38:20,138 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 10350, loss[loss=0.1118, beats_loss=0.01045, ecapa_loss=0.0001277, whisper_loss=0.1001, over 23681.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01054, ecapa_loss=0.0001442, whisper_loss=0.08962, over 3909565.63 frames. ], batch size: 94, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:38:29,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3969690.0, ans=0.09899494936611666 2024-08-18 15:38:39,183 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 28 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-18 15:38:56,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3969890.0, ans=22.5 2024-08-18 15:39:02,287 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.36 vs. 
limit=15.0 2024-08-18 15:39:20,515 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.307e+01 2.571e+01 2.921e+01 7.269e+01, threshold=5.142e+01, percent-clipped=1.0 2024-08-18 15:39:30,278 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 10400, loss[loss=0.07923, beats_loss=0.01193, ecapa_loss=0.0001072, whisper_loss=0.06623, over 17111.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01055, ecapa_loss=0.0001445, whisper_loss=0.08938, over 3903731.66 frames. ], batch size: 67, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:39:43,013 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2024-08-18 15:40:16,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3970490.0, ans=0.2 2024-08-18 15:40:17,060 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-18 15:40:33,089 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-18 15:40:38,019 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 10450, loss[loss=0.09585, beats_loss=0.01202, ecapa_loss=0.000141, whisper_loss=0.08243, over 22528.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0105, ecapa_loss=0.0001446, whisper_loss=0.08898, over 3882109.91 frames. ], batch size: 93, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:40:38,143 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
32 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 15:40:47,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3970690.0, ans=0.2 2024-08-18 15:40:51,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3970790.0, ans=0.125 2024-08-18 15:40:52,154 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.96 vs. limit=15.0 2024-08-18 15:40:52,623 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-18 15:40:52,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3970790.0, ans=0.2 2024-08-18 15:40:53,056 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2024-08-18 15:40:56,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3970790.0, ans=0.125 2024-08-18 15:41:02,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3970890.0, ans=0.125 2024-08-18 15:41:04,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.08 vs. limit=15.0 2024-08-18 15:41:13,644 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-18 15:41:15,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3970890.0, ans=0.2 2024-08-18 15:41:21,566 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
36 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-18 15:41:23,153 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=9.348e+00 2024-08-18 15:41:34,985 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.245e+01 2.442e+01 2.727e+01 4.233e+01, threshold=4.884e+01, percent-clipped=0.0 2024-08-18 15:41:36,478 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-18 15:41:36,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3971090.0, ans=0.1 2024-08-18 15:41:36,983 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.73 vs. limit=15.0 2024-08-18 15:41:44,289 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 10500, loss[loss=0.09701, beats_loss=0.01125, ecapa_loss=0.0001409, whisper_loss=0.08435, over 21315.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01046, ecapa_loss=0.0001457, whisper_loss=0.08952, over 3895878.73 frames. ], batch size: 87, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:41:49,664 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-18 15:41:55,825 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 15:42:29,340 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.738e-01 2024-08-18 15:42:33,424 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 15:42:45,016 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
20 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-18 15:42:50,228 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 10550, loss[loss=0.09897, beats_loss=0.01178, ecapa_loss=0.0001404, whisper_loss=0.08578, over 21890.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0105, ecapa_loss=0.0001448, whisper_loss=0.08941, over 3903241.22 frames. ], batch size: 91, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:42:57,733 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.18 vs. limit=15.0 2024-08-18 15:43:09,637 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-18 15:43:16,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3971890.0, ans=0.125 2024-08-18 15:43:16,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3971890.0, ans=0.125 2024-08-18 15:43:21,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3971890.0, ans=0.125 2024-08-18 15:43:45,876 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.358e+01 2.575e+01 2.904e+01 4.365e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-18 15:43:52,098 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 15:43:55,683 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 10600, loss[loss=0.09797, beats_loss=0.009194, ecapa_loss=0.0001184, whisper_loss=0.0876, over 15538.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01044, ecapa_loss=0.0001457, whisper_loss=0.0893, over 3905156.71 frames. 
], batch size: 59, lr: 2.26e-03, grad_scale: 1.152921504606847e+18 2024-08-18 15:44:01,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3972190.0, ans=0.05 2024-08-18 15:44:03,653 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 37 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 15:44:04,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3972190.0, ans=0.0 2024-08-18 15:44:06,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3972190.0, ans=0.0 2024-08-18 15:44:14,578 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 15:44:26,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3972390.0, ans=0.09899494936611666 2024-08-18 15:45:02,375 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 10650, loss[loss=0.09037, beats_loss=0.01151, ecapa_loss=0.0001437, whisper_loss=0.07742, over 20754.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.0001444, whisper_loss=0.09008, over 3928953.34 frames. ], batch size: 84, lr: 2.26e-03, grad_scale: 1.152921504606847e+18 2024-08-18 15:45:05,010 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 15:45:08,503 WARNING [optim.py:496] (2/4) Scaling gradients by 0.027766374871134758, model_norm_threshold=51.50757598876953 2024-08-18 15:45:08,672 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.457e+05, grad_sumsq=1.339e+05, orig_rms_sq=3.328e+00 2024-08-18 15:45:12,786 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
28 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 15:45:16,273 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.67 vs. limit=15.0 2024-08-18 15:45:19,520 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-18 15:45:19,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3972790.0, ans=0.1 2024-08-18 15:45:23,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3972790.0, ans=0.125 2024-08-18 15:45:25,774 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-18 15:45:26,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3972790.0, ans=0.125 2024-08-18 15:45:33,219 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-18 15:45:44,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3972990.0, ans=0.125 2024-08-18 15:46:01,419 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.310e+01 2.515e+01 2.823e+01 1.855e+03, threshold=5.029e+01, percent-clipped=1.0 2024-08-18 15:46:01,572 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 15:46:10,793 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 10700, loss[loss=0.1015, beats_loss=0.01291, ecapa_loss=0.0001237, whisper_loss=0.08738, over 22564.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01038, ecapa_loss=0.0001435, whisper_loss=0.09111, over 3942784.56 frames. 
], batch size: 92, lr: 2.26e-03, grad_scale: 1.152921504606847e+18 2024-08-18 15:46:14,810 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 21 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-18 15:46:44,760 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 22 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-18 15:46:45,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3973390.0, ans=0.1 2024-08-18 15:46:55,183 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 11 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-18 15:47:01,834 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-18 15:47:21,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3973690.0, ans=0.0 2024-08-18 15:47:22,606 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 10750, loss[loss=0.1163, beats_loss=0.008279, ecapa_loss=0.0001966, whisper_loss=0.1061, over 18209.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01041, ecapa_loss=0.0001434, whisper_loss=0.09099, over 3935458.76 frames. ], batch size: 76, lr: 2.26e-03, grad_scale: 1.152921504606847e+18 2024-08-18 15:47:23,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3973690.0, ans=0.2 2024-08-18 15:47:23,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3973690.0, ans=0.0 2024-08-18 15:47:29,371 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.26 vs. 
limit=15.0 2024-08-18 15:47:30,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3973690.0, ans=0.05 2024-08-18 15:47:50,326 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.34 vs. limit=22.5 2024-08-18 15:47:50,937 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-18 15:47:54,702 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-18 15:48:27,964 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.366e+01 2.579e+01 2.828e+01 3.318e+02, threshold=5.158e+01, percent-clipped=2.0 2024-08-18 15:48:35,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3974090.0, ans=0.125 2024-08-18 15:48:36,890 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 15:48:38,064 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 10800, loss[loss=0.1267, beats_loss=0.009471, ecapa_loss=0.0001385, whisper_loss=0.1158, over 24022.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001425, whisper_loss=0.09061, over 3917217.29 frames. ], batch size: 91, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:48:47,265 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.72 vs. limit=12.0 2024-08-18 15:48:58,942 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 15:49:06,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3974390.0, ans=0.1 2024-08-18 15:49:22,455 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.42 vs. limit=10.0 2024-08-18 15:49:54,539 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 10850, loss[loss=0.1169, beats_loss=0.009711, ecapa_loss=0.0001305, whisper_loss=0.1059, over 22767.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01053, ecapa_loss=0.0001422, whisper_loss=0.09159, over 3934572.90 frames. ], batch size: 88, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:49:55,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3974690.0, ans=0.125 2024-08-18 15:50:03,592 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 19 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-18 15:50:31,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3974890.0, ans=0.0 2024-08-18 15:51:00,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3975090.0, ans=0.125 2024-08-18 15:51:00,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3975090.0, ans=0.07 2024-08-18 15:51:01,346 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.249e+01 2.453e+01 2.667e+01 3.964e+01, threshold=4.906e+01, percent-clipped=0.0 2024-08-18 15:51:10,416 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 10900, loss[loss=0.1009, beats_loss=0.01072, ecapa_loss=0.0001497, whisper_loss=0.0887, over 18872.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.0105, ecapa_loss=0.0001429, whisper_loss=0.09153, over 3931964.97 frames. ], batch size: 73, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:51:10,635 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 15:51:20,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3975190.0, ans=0.125 2024-08-18 15:51:23,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3975190.0, ans=0.2 2024-08-18 15:51:28,199 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 15:51:59,407 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.32 vs. limit=15.0 2024-08-18 15:52:04,621 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 15:52:27,000 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 10950, loss[loss=0.123, beats_loss=0.01039, ecapa_loss=0.0001162, whisper_loss=0.1114, over 23152.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01046, ecapa_loss=0.0001435, whisper_loss=0.09207, over 3928215.53 frames. ], batch size: 89, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:52:28,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3975690.0, ans=0.125 2024-08-18 15:52:30,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3975690.0, ans=0.1 2024-08-18 15:52:50,214 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 15:53:02,344 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 15:53:33,582 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.325e+01 2.540e+01 2.829e+01 5.122e+01, threshold=5.080e+01, percent-clipped=1.0 2024-08-18 15:53:33,794 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 25 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-18 15:53:43,352 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 11000, loss[loss=0.1052, beats_loss=0.01069, ecapa_loss=0.0001445, whisper_loss=0.09303, over 22733.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01046, ecapa_loss=0.0001447, whisper_loss=0.09149, over 3942270.42 frames. ], batch size: 92, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:53:47,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3976190.0, ans=0.125 2024-08-18 15:53:51,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3976190.0, ans=0.125 2024-08-18 15:54:05,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.20 vs. limit=15.0 2024-08-18 15:54:33,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3976490.0, ans=0.125 2024-08-18 15:54:36,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3976490.0, ans=0.125 2024-08-18 15:54:39,552 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 28 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-18 15:55:05,546 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 11050, loss[loss=0.1001, beats_loss=0.01262, ecapa_loss=0.0001315, whisper_loss=0.08618, over 21562.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01037, ecapa_loss=0.000145, whisper_loss=0.09165, over 3900672.58 frames. ], batch size: 86, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:55:05,746 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-18 15:55:06,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3976690.0, ans=0.2 2024-08-18 15:55:07,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3976690.0, ans=0.125 2024-08-18 15:55:11,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3976690.0, ans=0.07 2024-08-18 15:55:45,485 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-18 15:56:02,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3976990.0, ans=0.125 2024-08-18 15:56:12,634 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.281e+01 2.516e+01 2.908e+01 1.267e+02, threshold=5.032e+01, percent-clipped=1.0 2024-08-18 15:56:20,815 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 37 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-18 15:56:21,879 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 11100, loss[loss=0.1308, beats_loss=0.007811, ecapa_loss=0.0001782, whisper_loss=0.1212, over 22994.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01034, ecapa_loss=0.0001462, whisper_loss=0.09145, over 3882677.34 frames. ], batch size: 91, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:56:21,979 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
37 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-18 15:56:33,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3977190.0, ans=0.0 2024-08-18 15:56:42,936 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 28 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-18 15:56:52,762 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-18 15:56:57,612 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 33 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-18 15:57:08,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3977490.0, ans=0.125 2024-08-18 15:57:17,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3977490.0, ans=0.125 2024-08-18 15:57:25,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3977590.0, ans=0.125 2024-08-18 15:57:35,331 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 11150, loss[loss=0.1147, beats_loss=0.008879, ecapa_loss=0.0001534, whisper_loss=0.1043, over 19710.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01037, ecapa_loss=0.0001457, whisper_loss=0.09177, over 3889592.13 frames. ], batch size: 79, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:57:45,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3977690.0, ans=0.125 2024-08-18 15:57:48,510 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
30 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-18 15:57:57,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3977790.0, ans=15.0 2024-08-18 15:58:14,943 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.75 vs. limit=15.0 2024-08-18 15:58:21,412 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 24 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-18 15:58:38,482 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.725e+01 2.330e+01 2.608e+01 2.859e+01 1.941e+02, threshold=5.216e+01, percent-clipped=1.0 2024-08-18 15:58:43,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3978090.0, ans=0.2 2024-08-18 15:58:47,721 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 11200, loss[loss=0.1111, beats_loss=0.01123, ecapa_loss=0.0001069, whisper_loss=0.09877, over 15935.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01039, ecapa_loss=0.0001449, whisper_loss=0.09111, over 3854596.00 frames. ], batch size: 59, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:58:58,685 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
15 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-18 15:59:06,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3978290.0, ans=15.0 2024-08-18 15:59:09,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3978290.0, ans=0.0 2024-08-18 15:59:13,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3978290.0, ans=0.5 2024-08-18 15:59:36,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3978490.0, ans=0.0 2024-08-18 15:59:40,844 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.78 vs. limit=12.0 2024-08-18 15:59:45,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3978490.0, ans=0.0 2024-08-18 15:59:54,497 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0 2024-08-18 16:00:06,965 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 11250, loss[loss=0.08845, beats_loss=0.01032, ecapa_loss=0.0001014, whisper_loss=0.07712, over 14600.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01047, ecapa_loss=0.0001435, whisper_loss=0.09139, over 3846441.34 frames. ], batch size: 54, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:00:08,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3978690.0, ans=0.0 2024-08-18 16:00:34,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3978790.0, ans=0.0 2024-08-18 16:00:35,372 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
20 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-18 16:00:39,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3978890.0, ans=0.125 2024-08-18 16:00:39,362 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.67 vs. limit=15.0 2024-08-18 16:00:43,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3978890.0, ans=0.125 2024-08-18 16:00:46,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3978890.0, ans=0.0 2024-08-18 16:00:52,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3978990.0, ans=0.125 2024-08-18 16:01:01,174 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 28 from LS+wenet, 20 from Vox, 14 fro AS 2024-08-18 16:01:12,820 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.408e+01 2.624e+01 3.093e+01 2.615e+02, threshold=5.248e+01, percent-clipped=2.0 2024-08-18 16:01:20,693 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-18 16:01:22,312 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 11300, loss[loss=0.1038, beats_loss=0.009767, ecapa_loss=0.0001472, whisper_loss=0.0926, over 16076.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01043, ecapa_loss=0.0001445, whisper_loss=0.09119, over 3846351.04 frames. 
], batch size: 61, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:01:42,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3979290.0, ans=0.125 2024-08-18 16:02:03,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3979390.0, ans=0.09899494936611666 2024-08-18 16:02:11,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3979490.0, ans=0.1 2024-08-18 16:02:14,107 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 19 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-18 16:02:38,435 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 11350, loss[loss=0.1239, beats_loss=0.0099, ecapa_loss=0.000195, whisper_loss=0.1121, over 20679.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0104, ecapa_loss=0.0001447, whisper_loss=0.09098, over 3850759.88 frames. ], batch size: 88, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:02:57,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3979790.0, ans=0.125 2024-08-18 16:03:05,402 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 24 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-18 16:03:13,525 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 16 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-18 16:03:20,946 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-18 16:03:24,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3979990.0, ans=0.0 2024-08-18 16:03:25,104 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
26 from LS+wenet, 26 from Vox, 21 fro AS 2024-08-18 16:03:27,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3979990.0, ans=0.125 2024-08-18 16:03:34,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3979990.0, ans=0.125 2024-08-18 16:03:37,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3979990.0, ans=0.0 2024-08-18 16:03:46,649 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.287e+01 2.489e+01 2.829e+01 3.988e+01, threshold=4.979e+01, percent-clipped=0.0 2024-08-18 16:03:55,501 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 11400, loss[loss=0.1094, beats_loss=0.01041, ecapa_loss=0.0001569, whisper_loss=0.09739, over 21913.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01039, ecapa_loss=0.0001448, whisper_loss=0.09111, over 3876023.92 frames. ], batch size: 90, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:04:07,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3980190.0, ans=0.0 2024-08-18 16:04:10,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3980190.0, ans=0.125 2024-08-18 16:04:10,537 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.65 vs. limit=12.0 2024-08-18 16:04:17,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3980290.0, ans=0.1 2024-08-18 16:04:42,363 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-18 16:05:08,390 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.27 vs. limit=15.0 2024-08-18 16:05:13,279 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 11450, loss[loss=0.1034, beats_loss=0.01014, ecapa_loss=0.0001327, whisper_loss=0.09197, over 22936.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01034, ecapa_loss=0.0001448, whisper_loss=0.09163, over 3893921.49 frames. ], batch size: 88, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:05:13,553 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-18 16:05:16,299 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-18 16:05:23,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3980690.0, ans=0.125 2024-08-18 16:05:28,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3980790.0, ans=0.0 2024-08-18 16:05:32,732 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 16:05:39,894 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.430e+05 2024-08-18 16:06:07,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3980990.0, ans=0.0 2024-08-18 16:06:27,329 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.18 vs. 
limit=6.0 2024-08-18 16:06:27,850 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.331e+01 2.551e+01 2.848e+01 4.379e+01, threshold=5.102e+01, percent-clipped=0.0 2024-08-18 16:06:37,183 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 11500, loss[loss=0.09856, beats_loss=0.008965, ecapa_loss=0.0001759, whisper_loss=0.08784, over 19830.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0103, ecapa_loss=0.0001441, whisper_loss=0.0919, over 3891429.54 frames. ], batch size: 81, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:06:48,500 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 16:06:53,094 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.37 vs. limit=15.0 2024-08-18 16:06:58,112 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 16:06:59,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3981290.0, ans=0.0 2024-08-18 16:07:09,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3981290.0, ans=0.0 2024-08-18 16:07:11,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3981390.0, ans=0.0 2024-08-18 16:07:20,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3981390.0, ans=0.1 2024-08-18 16:07:48,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3981490.0, ans=0.125 2024-08-18 16:07:49,847 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
29 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-18 16:07:51,698 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-18 16:07:55,934 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-18 16:08:04,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3981590.0, ans=0.125 2024-08-18 16:08:10,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3981590.0, ans=0.125 2024-08-18 16:08:18,257 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 11550, loss[loss=0.1123, beats_loss=0.01052, ecapa_loss=0.0001452, whisper_loss=0.1004, over 19882.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01022, ecapa_loss=0.0001451, whisper_loss=0.09179, over 3881208.08 frames. ], batch size: 81, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:08:31,437 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-18 16:08:43,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3981790.0, ans=0.0 2024-08-18 16:08:46,788 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-18 16:08:50,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3981790.0, ans=15.0 2024-08-18 16:09:02,650 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
22 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-18 16:09:04,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3981890.0, ans=10.0 2024-08-18 16:09:16,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3981890.0, ans=0.125 2024-08-18 16:09:27,062 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-18 16:09:36,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3981990.0, ans=0.1 2024-08-18 16:09:38,766 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.73 vs. limit=12.0 2024-08-18 16:09:41,533 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-18 16:09:53,869 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.363e+01 2.578e+01 2.890e+01 4.329e+01, threshold=5.155e+01, percent-clipped=0.0 2024-08-18 16:10:08,494 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 11600, loss[loss=0.09597, beats_loss=0.01057, ecapa_loss=0.0001329, whisper_loss=0.08406, over 22061.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01029, ecapa_loss=0.0001445, whisper_loss=0.091, over 3892346.31 frames. ], batch size: 90, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:10:27,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3982190.0, ans=0.0 2024-08-18 16:10:30,456 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 16 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-18 16:10:36,506 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
27 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-18 16:10:41,432 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-18 16:11:18,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3982390.0, ans=0.0 2024-08-18 16:11:18,496 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.15 vs. limit=15.0 2024-08-18 16:11:41,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3982490.0, ans=0.1 2024-08-18 16:11:42,722 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-18 16:12:06,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3982590.0, ans=0.2 2024-08-18 16:12:10,428 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 11650, loss[loss=0.1098, beats_loss=0.008735, ecapa_loss=0.0001331, whisper_loss=0.09974, over 22327.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01028, ecapa_loss=0.0001447, whisper_loss=0.09117, over 3891656.12 frames. ], batch size: 87, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:12:19,457 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.60 vs. limit=12.0 2024-08-18 16:12:20,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3982690.0, ans=0.125 2024-08-18 16:12:53,823 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-18 16:12:59,870 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.490e+01 2024-08-18 16:13:08,775 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-18 16:13:16,587 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.56 vs. limit=12.0 2024-08-18 16:13:44,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3983090.0, ans=0.125 2024-08-18 16:13:50,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3983090.0, ans=0.125 2024-08-18 16:13:51,004 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.304e+01 2.600e+01 2.905e+01 3.001e+02, threshold=5.199e+01, percent-clipped=1.0 2024-08-18 16:14:04,555 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 11700, loss[loss=0.09412, beats_loss=0.01337, ecapa_loss=0.0001204, whisper_loss=0.07955, over 17142.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01039, ecapa_loss=0.0001444, whisper_loss=0.09114, over 3904893.96 frames. ], batch size: 70, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:14:13,854 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 28 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-18 16:14:34,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3983290.0, ans=0.2 2024-08-18 16:14:42,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3983290.0, ans=0.05 2024-08-18 16:14:56,132 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. 
limit=6.0 2024-08-18 16:15:03,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3983490.0, ans=0.125 2024-08-18 16:15:07,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3983490.0, ans=0.125 2024-08-18 16:15:26,139 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.877e+01 2024-08-18 16:15:30,000 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 11750, loss[loss=0.06823, beats_loss=0.01603, ecapa_loss=0.000132, whisper_loss=0.05088, over 14778.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01052, ecapa_loss=0.0001433, whisper_loss=0.09045, over 3916457.98 frames. ], batch size: 64, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:15:30,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3983690.0, ans=0.125 2024-08-18 16:15:34,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3983690.0, ans=0.2 2024-08-18 16:15:54,945 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 33 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-18 16:16:08,972 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-18 16:16:12,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3983890.0, ans=0.0 2024-08-18 16:16:13,746 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 14 from Vox, 52 fro AS 2024-08-18 16:16:16,771 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 16:16:19,851 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.36 vs. 
limit=12.0 2024-08-18 16:16:38,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3984090.0, ans=0.0 2024-08-18 16:16:39,129 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.396e+01 2.657e+01 3.044e+01 4.817e+01, threshold=5.315e+01, percent-clipped=0.0 2024-08-18 16:16:44,163 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 20 from LS+wenet, 26 from Vox, 48 fro AS 2024-08-18 16:16:48,235 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 11800, loss[loss=0.06931, beats_loss=0.0125, ecapa_loss=0.0001576, whisper_loss=0.05524, over 16255.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01054, ecapa_loss=0.0001443, whisper_loss=0.09041, over 3911764.06 frames. ], batch size: 70, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:16:57,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3984190.0, ans=0.125 2024-08-18 16:17:09,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3984290.0, ans=0.2 2024-08-18 16:17:11,705 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.97 vs. 
limit=15.0 2024-08-18 16:17:26,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3984390.0, ans=0.125 2024-08-18 16:17:31,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3984390.0, ans=0.015 2024-08-18 16:17:44,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3984490.0, ans=0.0 2024-08-18 16:17:45,317 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.91 vs. limit=10.0 2024-08-18 16:17:49,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3984490.0, ans=0.125 2024-08-18 16:17:58,250 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 16:18:01,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3984590.0, ans=0.125 2024-08-18 16:18:09,387 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 11850, loss[loss=0.1072, beats_loss=0.01086, ecapa_loss=0.0001481, whisper_loss=0.09482, over 23157.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01062, ecapa_loss=0.0001429, whisper_loss=0.09004, over 3940199.75 frames. ], batch size: 91, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:18:09,502 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 19 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-18 16:18:11,152 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-18 16:18:13,153 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 19 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-18 16:18:17,437 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
32 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 16:18:18,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3984690.0, ans=0.2 2024-08-18 16:18:35,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3984790.0, ans=0.125 2024-08-18 16:18:56,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3984990.0, ans=0.125 2024-08-18 16:19:06,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3984990.0, ans=0.09899494936611666 2024-08-18 16:19:17,967 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.255e+01 2.535e+01 2.793e+01 4.854e+01, threshold=5.071e+01, percent-clipped=0.0 2024-08-18 16:19:18,098 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-18 16:19:26,144 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 11900, loss[loss=0.08645, beats_loss=0.0111, ecapa_loss=0.0001877, whisper_loss=0.07347, over 18074.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01056, ecapa_loss=0.0001444, whisper_loss=0.08992, over 3943014.75 frames. ], batch size: 80, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:19:41,197 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-18 16:19:46,883 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0 2024-08-18 16:19:54,550 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.33 vs. 
limit=15.0 2024-08-18 16:20:29,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3985590.0, ans=0.0 2024-08-18 16:20:29,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3985590.0, ans=0.0 2024-08-18 16:20:38,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3985590.0, ans=0.125 2024-08-18 16:20:42,504 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 11950, loss[loss=0.1075, beats_loss=0.008347, ecapa_loss=0.0001955, whisper_loss=0.09723, over 16784.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01048, ecapa_loss=0.0001446, whisper_loss=0.08987, over 3897574.30 frames. ], batch size: 70, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:20:49,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3985690.0, ans=0.125 2024-08-18 16:20:54,168 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 18 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 16:21:16,270 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 19 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 16:21:31,397 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 27 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 16:21:43,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3985990.0, ans=0.07 2024-08-18 16:21:44,754 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
22 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-18 16:21:50,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3986090.0, ans=0.1 2024-08-18 16:21:54,293 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.273e+01 2.566e+01 2.844e+01 1.117e+02, threshold=5.132e+01, percent-clipped=1.0 2024-08-18 16:21:55,990 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 16:22:03,598 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 12000, loss[loss=0.08396, beats_loss=0.01191, ecapa_loss=0.0001454, whisper_loss=0.0706, over 13877.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01051, ecapa_loss=0.0001444, whisper_loss=0.08953, over 3872002.59 frames. ], batch size: 57, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:22:03,599 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-18 16:22:37,106 INFO [train_multi_KD3.py:1149] (2/4) Epoch 27, validation on ASR_libri: loss=0.2532, beats_loss=0, ecapa_loss=0.0005101, whisper_loss=0.2481, over 922467.00 frames. 2024-08-18 16:22:55,573 INFO [train_multi_KD3.py:1149] (2/4) Epoch 27, validation on SV_voxceleb1: loss=0.004067, beats_loss=0, ecapa_loss=0.0004067, whisper_loss=0, over 939242.00 frames. 2024-08-18 16:24:34,657 INFO [train_multi_KD3.py:1149] (2/4) Epoch 27, validation on AT_audioset: loss=0.02313, beats_loss=0.02313, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-18 16:24:34,661 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-18 16:24:51,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3986290.0, ans=0.125 2024-08-18 16:24:52,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3986290.0, ans=0.1 2024-08-18 16:24:54,710 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.61 vs. limit=12.0 2024-08-18 16:25:08,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3986390.0, ans=0.125 2024-08-18 16:25:11,876 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 28 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 16:25:14,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3986390.0, ans=0.07 2024-08-18 16:25:25,741 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-18 16:25:52,574 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 12050, loss[loss=0.09513, beats_loss=0.01071, ecapa_loss=0.0001275, whisper_loss=0.08314, over 22421.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0105, ecapa_loss=0.0001447, whisper_loss=0.08941, over 3831183.58 frames. 
], batch size: 88, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:25:53,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3986690.0, ans=0.0 2024-08-18 16:26:20,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3986790.0, ans=0.1 2024-08-18 16:26:25,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3986890.0, ans=0.0 2024-08-18 16:26:31,307 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-18 16:26:32,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3986890.0, ans=0.5 2024-08-18 16:26:36,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3986890.0, ans=0.2 2024-08-18 16:26:42,885 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 12 from Vox, 38 fro AS 2024-08-18 16:26:47,868 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.297e+05 2024-08-18 16:26:51,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3986990.0, ans=0.0 2024-08-18 16:27:02,261 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.347e+01 2.564e+01 2.841e+01 2.951e+02, threshold=5.127e+01, percent-clipped=2.0 2024-08-18 16:27:06,674 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-18 16:27:11,247 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 12100, loss[loss=0.1163, beats_loss=0.009496, ecapa_loss=0.0001263, whisper_loss=0.1056, over 16125.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.01054, ecapa_loss=0.0001444, whisper_loss=0.08918, over 3831038.90 frames. ], batch size: 61, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:27:11,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3987190.0, ans=0.125 2024-08-18 16:27:13,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3987190.0, ans=0.2 2024-08-18 16:27:38,594 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-18 16:27:43,711 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0 2024-08-18 16:27:52,308 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-18 16:27:53,726 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.26 vs. limit=15.0 2024-08-18 16:27:57,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3987490.0, ans=0.125 2024-08-18 16:27:58,795 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-18 16:28:11,671 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 27 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-18 16:28:13,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3987590.0, ans=0.125 2024-08-18 16:28:17,760 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-18 16:28:23,729 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
16 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-18 16:28:29,504 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 12150, loss[loss=0.1117, beats_loss=0.0106, ecapa_loss=0.0001477, whisper_loss=0.09963, over 21004.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01055, ecapa_loss=0.0001448, whisper_loss=0.08916, over 3827952.18 frames. ], batch size: 85, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:28:31,582 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 22 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-18 16:28:32,992 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 24 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-18 16:28:34,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3987690.0, ans=0.125 2024-08-18 16:28:40,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3987690.0, ans=0.125 2024-08-18 16:28:42,343 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=22.5 2024-08-18 16:28:43,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3987690.0, ans=0.125 2024-08-18 16:28:54,101 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=15.0 2024-08-18 16:28:54,680 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
23 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 16:29:23,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3987990.0, ans=0.0 2024-08-18 16:29:26,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3987990.0, ans=0.0 2024-08-18 16:29:31,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3987990.0, ans=0.125 2024-08-18 16:29:39,150 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.283e+01 2.537e+01 2.740e+01 4.505e+01, threshold=5.074e+01, percent-clipped=0.0 2024-08-18 16:29:47,834 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 12200, loss[loss=0.1017, beats_loss=0.01405, ecapa_loss=9.91e-05, whisper_loss=0.08661, over 23145.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01051, ecapa_loss=0.0001449, whisper_loss=0.08914, over 3796992.77 frames. ], batch size: 90, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:29:48,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3988190.0, ans=0.125 2024-08-18 16:29:53,002 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.45 vs. 
limit=12.0 2024-08-18 16:30:01,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3988290.0, ans=0.0 2024-08-18 16:30:07,670 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.494e+00 2024-08-18 16:30:20,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3988390.0, ans=0.2 2024-08-18 16:30:23,181 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.380e+05 2024-08-18 16:30:52,211 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 16:31:00,548 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 12250, loss[loss=0.09977, beats_loss=0.008735, ecapa_loss=0.0002087, whisper_loss=0.08895, over 17853.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.0001451, whisper_loss=0.08959, over 3841344.07 frames. ], batch size: 78, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:31:15,453 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-18 16:31:19,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3988790.0, ans=0.04949747468305833 2024-08-18 16:31:48,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3988990.0, ans=0.1 2024-08-18 16:32:03,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3989090.0, ans=0.125 2024-08-18 16:32:04,738 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
16 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-18 16:32:07,793 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.312e+01 2.540e+01 2.795e+01 3.669e+01, threshold=5.080e+01, percent-clipped=0.0 2024-08-18 16:32:08,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3989090.0, ans=0.125 2024-08-18 16:32:16,074 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-18 16:32:17,489 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 12300, loss[loss=0.101, beats_loss=0.01204, ecapa_loss=0.0001645, whisper_loss=0.08735, over 18961.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001446, whisper_loss=0.08957, over 3868000.25 frames. ], batch size: 78, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:32:31,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3989290.0, ans=0.0 2024-08-18 16:32:36,367 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-18 16:32:43,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3989290.0, ans=0.125 2024-08-18 16:32:44,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3989290.0, ans=0.125 2024-08-18 16:33:00,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3989490.0, ans=0.125 2024-08-18 16:33:04,120 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 16:33:09,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=3989490.0, ans=0.1 2024-08-18 16:33:09,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3989490.0, ans=0.0 2024-08-18 16:33:17,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3989590.0, ans=0.0 2024-08-18 16:33:26,243 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-18 16:33:29,895 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 12350, loss[loss=0.1071, beats_loss=0.007496, ecapa_loss=0.000151, whisper_loss=0.09814, over 15143.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01049, ecapa_loss=0.0001445, whisper_loss=0.08996, over 3829767.77 frames. ], batch size: 60, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:33:40,567 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 24 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-18 16:33:57,158 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-18 16:34:00,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3989890.0, ans=0.125 2024-08-18 16:34:10,105 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 19 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-18 16:34:18,142 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
24 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-18 16:34:37,312 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.315e+01 2.645e+01 2.969e+01 4.976e+01, threshold=5.289e+01, percent-clipped=0.0 2024-08-18 16:34:38,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3990090.0, ans=0.0 2024-08-18 16:34:45,902 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 12400, loss[loss=0.113, beats_loss=0.01207, ecapa_loss=0.0001477, whisper_loss=0.09948, over 19621.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0104, ecapa_loss=0.0001444, whisper_loss=0.09054, over 3846566.79 frames. ], batch size: 80, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:34:47,904 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 16:34:54,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3990190.0, ans=0.1 2024-08-18 16:34:58,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3990190.0, ans=0.025 2024-08-18 16:35:02,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3990290.0, ans=0.125 2024-08-18 16:35:02,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3990290.0, ans=0.0 2024-08-18 16:35:14,490 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 26 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-18 16:35:27,937 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 32 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-18 16:35:34,926 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
21 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 16:35:42,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3990590.0, ans=0.125 2024-08-18 16:35:44,126 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.71 vs. limit=22.5 2024-08-18 16:35:53,016 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0 2024-08-18 16:35:56,181 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 12450, loss[loss=0.1173, beats_loss=0.006459, ecapa_loss=0.0001217, whisper_loss=0.1096, over 14521.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01035, ecapa_loss=0.0001455, whisper_loss=0.09064, over 3859900.72 frames. ], batch size: 53, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:36:34,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3990890.0, ans=0.09899494936611666 2024-08-18 16:36:44,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=3990990.0, ans=15.0 2024-08-18 16:37:01,212 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.300e+01 2.585e+01 2.837e+01 4.531e+01, threshold=5.170e+01, percent-clipped=0.0 2024-08-18 16:37:05,811 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 16:37:10,415 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 12500, loss[loss=0.1278, beats_loss=0.009326, ecapa_loss=0.0001136, whisper_loss=0.1173, over 22900.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01031, ecapa_loss=0.0001455, whisper_loss=0.09109, over 3850619.31 frames. 
], batch size: 84, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:37:12,946 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.25 vs. limit=15.0 2024-08-18 16:37:21,104 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 21 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-18 16:37:29,738 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 12 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 16:37:30,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3991290.0, ans=0.0 2024-08-18 16:37:33,657 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.02 vs. limit=12.0 2024-08-18 16:37:34,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3991290.0, ans=0.125 2024-08-18 16:37:47,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3991390.0, ans=0.125 2024-08-18 16:37:57,266 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-18 16:38:23,443 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 12550, loss[loss=0.116, beats_loss=0.008856, ecapa_loss=0.000157, whisper_loss=0.1056, over 15644.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01033, ecapa_loss=0.0001455, whisper_loss=0.09129, over 3874658.46 frames. ], batch size: 66, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:38:26,959 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.24 vs. 
limit=22.5 2024-08-18 16:38:32,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3991690.0, ans=0.0 2024-08-18 16:38:32,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3991690.0, ans=0.0 2024-08-18 16:38:34,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3991690.0, ans=0.125 2024-08-18 16:38:34,996 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-18 16:38:42,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3991790.0, ans=0.0 2024-08-18 16:38:59,422 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 16:39:10,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3991990.0, ans=0.0 2024-08-18 16:39:25,054 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.342e+01 2.582e+01 2.915e+01 4.894e+01, threshold=5.164e+01, percent-clipped=0.0 2024-08-18 16:39:33,191 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 12600, loss[loss=0.1034, beats_loss=0.01109, ecapa_loss=0.0001572, whisper_loss=0.09073, over 20187.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01039, ecapa_loss=0.000145, whisper_loss=0.09127, over 3916656.92 frames. ], batch size: 79, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:39:46,263 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-18 16:39:48,655 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 16:39:50,417 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
20 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-18 16:39:58,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3992290.0, ans=0.1 2024-08-18 16:40:05,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3992390.0, ans=0.2 2024-08-18 16:40:12,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3992390.0, ans=0.0 2024-08-18 16:40:15,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3992490.0, ans=0.0 2024-08-18 16:40:31,271 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-18 16:40:43,884 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 12650, loss[loss=0.1167, beats_loss=0.01069, ecapa_loss=0.0001408, whisper_loss=0.1046, over 23153.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.0001447, whisper_loss=0.09052, over 3908694.89 frames. ], batch size: 91, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:40:44,007 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-18 16:41:10,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3992790.0, ans=0.125 2024-08-18 16:41:18,132 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
26 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-18 16:41:45,652 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.296e+01 2.534e+01 2.850e+01 7.549e+01, threshold=5.068e+01, percent-clipped=1.0 2024-08-18 16:41:51,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3993090.0, ans=0.125 2024-08-18 16:41:54,032 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 12700, loss[loss=0.1087, beats_loss=0.00887, ecapa_loss=0.0001715, whisper_loss=0.09813, over 18333.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01051, ecapa_loss=0.0001447, whisper_loss=0.09075, over 3911106.10 frames. ], batch size: 76, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:42:07,823 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 21 from LS+wenet, 25 from Vox, 50 fro AS 2024-08-18 16:42:13,931 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.16 vs. limit=15.0 2024-08-18 16:42:26,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3993390.0, ans=0.125 2024-08-18 16:42:36,556 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.530e+01 2024-08-18 16:42:42,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3993490.0, ans=0.125 2024-08-18 16:42:53,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3993590.0, ans=0.1 2024-08-18 16:42:59,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3993590.0, ans=0.0 2024-08-18 16:43:02,267 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
29 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-18 16:43:06,273 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 12750, loss[loss=0.08886, beats_loss=0.01311, ecapa_loss=0.0001231, whisper_loss=0.07452, over 19772.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001449, whisper_loss=0.09053, over 3886252.37 frames. ], batch size: 81, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:43:08,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=3993690.0, ans=0.2 2024-08-18 16:43:08,973 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-18 16:43:09,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3993690.0, ans=0.1 2024-08-18 16:43:32,157 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-18 16:43:35,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3993890.0, ans=0.125 2024-08-18 16:43:44,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3993890.0, ans=0.1 2024-08-18 16:43:46,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3993890.0, ans=0.0 2024-08-18 16:43:59,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3993990.0, ans=0.0 2024-08-18 16:44:00,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3993990.0, ans=0.0 2024-08-18 16:44:03,171 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.960e+00 2024-08-18 16:44:05,689 INFO [train_multi_KD3.py:844] 
(2/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 16:44:09,992 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.299e+01 2.591e+01 2.848e+01 5.343e+01, threshold=5.182e+01, percent-clipped=2.0 2024-08-18 16:44:14,049 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 19 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-18 16:44:18,390 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 12800, loss[loss=0.1025, beats_loss=0.009708, ecapa_loss=0.0001498, whisper_loss=0.09127, over 21549.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01063, ecapa_loss=0.0001446, whisper_loss=0.09016, over 3887841.07 frames. ], batch size: 89, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:44:19,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3994190.0, ans=0.125 2024-08-18 16:44:20,023 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 19 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-18 16:44:29,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3994190.0, ans=0.125 2024-08-18 16:44:31,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3994290.0, ans=0.125 2024-08-18 16:44:42,435 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-18 16:45:13,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3994590.0, ans=0.125 2024-08-18 16:45:27,116 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 12850, loss[loss=0.0937, beats_loss=0.01139, ecapa_loss=0.000107, whisper_loss=0.08124, over 14002.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01059, ecapa_loss=0.0001445, whisper_loss=0.09047, over 3910090.41 frames. 
], batch size: 54, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:45:44,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3994790.0, ans=0.125 2024-08-18 16:45:49,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3994790.0, ans=0.125 2024-08-18 16:45:52,639 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-18 16:45:57,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3994890.0, ans=0.1 2024-08-18 16:45:59,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3994890.0, ans=0.125 2024-08-18 16:46:03,627 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-18 16:46:05,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3994890.0, ans=0.125 2024-08-18 16:46:27,629 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.211e+01 2.399e+01 2.704e+01 4.332e+02, threshold=4.798e+01, percent-clipped=1.0 2024-08-18 16:46:36,145 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 12900, loss[loss=0.1138, beats_loss=0.01078, ecapa_loss=0.0001205, whisper_loss=0.1018, over 21559.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01066, ecapa_loss=0.0001443, whisper_loss=0.08963, over 3875975.42 frames. ], batch size: 85, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:46:37,534 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 16:46:57,045 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
29 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-18 16:47:21,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3995490.0, ans=0.125 2024-08-18 16:47:30,619 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 34 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-18 16:47:36,676 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 16:47:47,759 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 12950, loss[loss=0.125, beats_loss=0.009232, ecapa_loss=0.0001558, whisper_loss=0.1142, over 21317.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01054, ecapa_loss=0.0001454, whisper_loss=0.0903, over 3891213.73 frames. ], batch size: 82, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:47:48,909 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. 
limit=6.0 2024-08-18 16:47:54,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3995690.0, ans=0.1 2024-08-18 16:47:58,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3995690.0, ans=0.125 2024-08-18 16:48:11,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3995790.0, ans=0.0 2024-08-18 16:48:13,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3995790.0, ans=0.1 2024-08-18 16:48:18,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3995890.0, ans=0.0 2024-08-18 16:48:34,780 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.247e+00 2024-08-18 16:48:38,479 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2024-08-18 16:48:48,465 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.384e+01 2.658e+01 2.947e+01 5.232e+01, threshold=5.316e+01, percent-clipped=1.0 2024-08-18 16:48:49,901 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-18 16:48:53,590 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3996090.0, ans=0.0 2024-08-18 16:48:57,694 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 13000, loss[loss=0.1051, beats_loss=0.007517, ecapa_loss=0.0001923, whisper_loss=0.09562, over 18584.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01051, ecapa_loss=0.0001448, whisper_loss=0.09103, over 3914064.20 frames. 
], batch size: 78, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:49:02,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3996190.0, ans=0.0 2024-08-18 16:49:08,414 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-18 16:49:12,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3996290.0, ans=0.1 2024-08-18 16:49:14,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3996290.0, ans=0.125 2024-08-18 16:49:14,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3996290.0, ans=10.0 2024-08-18 16:49:18,281 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 16:49:25,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3996390.0, ans=0.125 2024-08-18 16:49:32,620 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-18 16:49:33,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3996390.0, ans=10.0 2024-08-18 16:49:52,702 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 16:49:54,074 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
27 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-18 16:49:59,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3996590.0, ans=0.125 2024-08-18 16:50:05,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3996690.0, ans=0.125 2024-08-18 16:50:06,165 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 13050, loss[loss=0.1012, beats_loss=0.01038, ecapa_loss=0.0001641, whisper_loss=0.08913, over 19469.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01053, ecapa_loss=0.0001434, whisper_loss=0.09056, over 3880707.24 frames. ], batch size: 81, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:50:47,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3996990.0, ans=0.125 2024-08-18 16:50:51,985 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.55 vs. limit=15.0 2024-08-18 16:50:55,547 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
27 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-18 16:50:57,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3996990.0, ans=0.125 2024-08-18 16:51:00,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3997090.0, ans=0.2 2024-08-18 16:51:06,637 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.279e+01 2.532e+01 2.847e+01 6.978e+01, threshold=5.064e+01, percent-clipped=1.0 2024-08-18 16:51:07,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3997090.0, ans=0.125 2024-08-18 16:51:11,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3997090.0, ans=0.1 2024-08-18 16:51:14,956 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 13100, loss[loss=0.0828, beats_loss=0.009368, ecapa_loss=0.0001506, whisper_loss=0.07193, over 16846.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01057, ecapa_loss=0.0001429, whisper_loss=0.08991, over 3882536.21 frames. ], batch size: 68, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:51:15,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3997190.0, ans=0.125 2024-08-18 16:51:15,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3997190.0, ans=0.0 2024-08-18 16:51:46,144 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
21 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-18 16:51:46,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3997390.0, ans=0.5 2024-08-18 16:52:13,651 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.41 vs. limit=22.5 2024-08-18 16:52:18,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3997590.0, ans=0.125 2024-08-18 16:52:21,873 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.910e+00 2024-08-18 16:52:25,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3997690.0, ans=0.0 2024-08-18 16:52:25,746 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 13150, loss[loss=0.09623, beats_loss=0.00821, ecapa_loss=0.0001532, whisper_loss=0.08649, over 18369.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01055, ecapa_loss=0.0001427, whisper_loss=0.09002, over 3885338.38 frames. ], batch size: 72, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:52:41,299 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 32 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-18 16:52:52,319 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
20 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 16:52:54,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3997890.0, ans=0.1 2024-08-18 16:53:25,119 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.700e+01 2.321e+01 2.625e+01 2.876e+01 3.803e+01, threshold=5.250e+01, percent-clipped=0.0 2024-08-18 16:53:26,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3998090.0, ans=0.1 2024-08-18 16:53:33,265 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 13200, loss[loss=0.1243, beats_loss=0.007953, ecapa_loss=0.0001621, whisper_loss=0.1147, over 22581.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01042, ecapa_loss=0.0001436, whisper_loss=0.09148, over 3856850.91 frames. ], batch size: 89, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:53:39,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3998190.0, ans=0.125 2024-08-18 16:53:42,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3998190.0, ans=0.0 2024-08-18 16:53:43,832 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.65 vs. 
limit=22.5 2024-08-18 16:54:09,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3998390.0, ans=0.125 2024-08-18 16:54:21,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3998490.0, ans=0.125 2024-08-18 16:54:38,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3998690.0, ans=0.125 2024-08-18 16:54:39,489 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 13250, loss[loss=0.07864, beats_loss=0.01151, ecapa_loss=0.0001685, whisper_loss=0.06545, over 21142.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01035, ecapa_loss=0.000146, whisper_loss=0.09215, over 3869343.24 frames. ], batch size: 94, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:55:17,474 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.87 vs. limit=15.0 2024-08-18 16:55:19,208 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-18 16:55:23,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3998990.0, ans=0.0 2024-08-18 16:55:27,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3998990.0, ans=0.1 2024-08-18 16:55:40,318 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.403e+01 2.676e+01 3.041e+01 3.478e+02, threshold=5.351e+01, percent-clipped=3.0 2024-08-18 16:55:46,849 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.81 vs. 
limit=15.0 2024-08-18 16:55:48,180 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 13300, loss[loss=0.1088, beats_loss=0.01266, ecapa_loss=0.0001182, whisper_loss=0.095, over 19470.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01031, ecapa_loss=0.000145, whisper_loss=0.09215, over 3879274.54 frames. ], batch size: 77, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:55:58,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3999190.0, ans=0.0 2024-08-18 16:56:17,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3999390.0, ans=0.0 2024-08-18 16:56:52,946 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 13350, loss[loss=0.0735, beats_loss=0.01254, ecapa_loss=0.0001415, whisper_loss=0.05955, over 16627.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01035, ecapa_loss=0.0001444, whisper_loss=0.09217, over 3901816.53 frames. ], batch size: 70, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:56:53,390 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 40 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 16:56:57,268 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.12 vs. limit=6.0 2024-08-18 16:57:03,807 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.31 vs. limit=15.0 2024-08-18 16:57:12,101 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
15 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-18 16:57:42,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3999990.0, ans=0.2 2024-08-18 16:57:42,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3999990.0, ans=0.04949747468305833 2024-08-18 16:57:46,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3999990.0, ans=0.0 2024-08-18 16:57:51,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4000090.0, ans=0.0 2024-08-18 16:57:55,103 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 16:57:56,428 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.330e+01 2.620e+01 2.943e+01 5.095e+01, threshold=5.239e+01, percent-clipped=0.0 2024-08-18 16:58:03,317 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 13400, loss[loss=0.1142, beats_loss=0.008164, ecapa_loss=0.0001444, whisper_loss=0.1046, over 23404.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01038, ecapa_loss=0.0001449, whisper_loss=0.09108, over 3875367.37 frames. ], batch size: 91, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:58:25,821 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 19 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-18 16:58:36,501 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 29 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-18 16:58:38,379 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-18 16:58:42,358 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
19 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-18 16:58:43,337 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.67 vs. limit=8.0 2024-08-18 16:58:55,239 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.61 vs. limit=15.0 2024-08-18 16:59:15,441 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 13450, loss[loss=0.0793, beats_loss=0.009842, ecapa_loss=0.0001821, whisper_loss=0.06764, over 13408.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001442, whisper_loss=0.09065, over 3853181.08 frames. ], batch size: 56, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:59:28,071 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 17 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-18 16:59:28,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4000690.0, ans=0.0 2024-08-18 16:59:45,244 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.24 vs. limit=12.0 2024-08-18 16:59:49,017 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.67 vs. 
limit=15.0 2024-08-18 17:00:16,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4000990.0, ans=0.125 2024-08-18 17:00:29,110 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.614e+01 2.273e+01 2.579e+01 2.870e+01 2.046e+02, threshold=5.158e+01, percent-clipped=2.0 2024-08-18 17:00:32,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4001090.0, ans=0.125 2024-08-18 17:00:36,122 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 13500, loss[loss=0.1194, beats_loss=0.009208, ecapa_loss=0.0001595, whisper_loss=0.1086, over 21986.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01041, ecapa_loss=0.0001445, whisper_loss=0.09092, over 3851102.15 frames. ], batch size: 89, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:00:58,017 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.81 vs. limit=15.0 2024-08-18 17:01:26,212 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.04 vs. limit=22.5 2024-08-18 17:01:38,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4001590.0, ans=0.1 2024-08-18 17:01:54,591 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 13550, loss[loss=0.09985, beats_loss=0.009759, ecapa_loss=0.00012, whisper_loss=0.08889, over 22452.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01054, ecapa_loss=0.0001434, whisper_loss=0.09004, over 3873767.36 frames. 
], batch size: 87, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:02:23,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4001890.0, ans=0.0 2024-08-18 17:02:36,774 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-18 17:02:51,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=4001990.0, ans=15.0 2024-08-18 17:02:55,066 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 32 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-18 17:02:57,241 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 24 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-18 17:03:04,834 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.302e+01 2.500e+01 2.862e+01 4.462e+01, threshold=4.999e+01, percent-clipped=0.0 2024-08-18 17:03:11,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4002190.0, ans=0.0 2024-08-18 17:03:12,558 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 13600, loss[loss=0.1076, beats_loss=0.01131, ecapa_loss=0.0001397, whisper_loss=0.0949, over 22851.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01051, ecapa_loss=0.0001423, whisper_loss=0.09086, over 3867754.09 frames. ], batch size: 89, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:03:21,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=4002190.0, ans=15.0 2024-08-18 17:03:27,444 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 17:03:27,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4002290.0, ans=0.125 2024-08-18 17:03:30,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4002290.0, ans=0.2 2024-08-18 17:03:35,399 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-18 17:03:36,875 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-18 17:03:37,882 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 17:04:18,345 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 13650, loss[loss=0.1207, beats_loss=0.008441, ecapa_loss=0.0001346, whisper_loss=0.1109, over 15829.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01057, ecapa_loss=0.0001425, whisper_loss=0.09071, over 3884411.22 frames. ], batch size: 60, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:04:18,855 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.138e+01 2024-08-18 17:04:36,496 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 17:04:51,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4002890.0, ans=0.125 2024-08-18 17:04:58,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4002990.0, ans=0.1 2024-08-18 17:05:10,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4003090.0, ans=0.125 2024-08-18 17:05:13,243 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.369e+01 2.636e+01 3.031e+01 4.631e+02, threshold=5.273e+01, percent-clipped=3.0 2024-08-18 17:05:13,479 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-18 17:05:19,308 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 13700, loss[loss=0.1016, beats_loss=0.01005, ecapa_loss=0.0001371, whisper_loss=0.09013, over 14704.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01054, ecapa_loss=0.0001434, whisper_loss=0.09098, over 3896238.41 frames. ], batch size: 54, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:05:29,563 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 25 from LS+wenet, 11 from Vox, 20 fro AS 2024-08-18 17:05:42,890 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
33 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-18 17:05:50,254 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08100207895040512, model_norm_threshold=52.72901916503906 2024-08-18 17:05:50,436 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.24, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.003e+05, grad_sumsq=1.003e+05, orig_rms_sq=1.000e+00 2024-08-18 17:06:00,750 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-08-18 17:06:04,034 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-18 17:06:08,165 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-18 17:06:20,858 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 13750, loss[loss=0.09922, beats_loss=0.01054, ecapa_loss=0.0001528, whisper_loss=0.08716, over 21257.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01051, ecapa_loss=0.0001444, whisper_loss=0.09111, over 3861092.80 frames. ], batch size: 89, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:06:33,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4003790.0, ans=0.1 2024-08-18 17:06:38,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4003790.0, ans=0.95 2024-08-18 17:06:39,561 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-18 17:07:02,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4003990.0, ans=0.0 2024-08-18 17:07:03,439 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
13 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-18 17:07:16,831 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.418e+01 2.607e+01 2.994e+01 6.510e+02, threshold=5.215e+01, percent-clipped=3.0 2024-08-18 17:07:18,284 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 17:07:21,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4004090.0, ans=0.0 2024-08-18 17:07:23,410 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 13800, loss[loss=0.111, beats_loss=0.008515, ecapa_loss=0.00016, whisper_loss=0.1009, over 22539.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01043, ecapa_loss=0.0001441, whisper_loss=0.09135, over 3852066.51 frames. ], batch size: 90, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:07:23,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4004190.0, ans=0.95 2024-08-18 17:07:38,942 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-18 17:07:47,911 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 17:07:59,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4004390.0, ans=0.125 2024-08-18 17:08:02,025 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 13 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 17:08:08,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4004490.0, ans=0.125 2024-08-18 17:08:17,439 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.59 vs. 
limit=10.0 2024-08-18 17:08:24,147 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=15.0 2024-08-18 17:08:26,997 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 13850, loss[loss=0.09212, beats_loss=0.01223, ecapa_loss=0.00014, whisper_loss=0.07849, over 15973.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01048, ecapa_loss=0.0001427, whisper_loss=0.09094, over 3853336.08 frames. ], batch size: 64, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:08:31,957 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.36 vs. limit=15.0 2024-08-18 17:08:47,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4004790.0, ans=0.125 2024-08-18 17:08:53,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4004890.0, ans=0.125 2024-08-18 17:09:07,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4004990.0, ans=0.125 2024-08-18 17:09:10,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4004990.0, ans=0.125 2024-08-18 17:09:24,458 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.330e+01 2.560e+01 2.931e+01 5.468e+01, threshold=5.120e+01, percent-clipped=1.0 2024-08-18 17:09:30,738 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 13900, loss[loss=0.09692, beats_loss=0.01237, ecapa_loss=0.0001458, whisper_loss=0.0831, over 17592.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0105, ecapa_loss=0.0001426, whisper_loss=0.09047, over 3853910.63 frames. 
], batch size: 73, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:09:43,624 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-18 17:09:45,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4005290.0, ans=0.125 2024-08-18 17:09:49,809 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-18 17:10:10,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4005490.0, ans=0.1 2024-08-18 17:10:36,322 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 13950, loss[loss=0.1158, beats_loss=0.009661, ecapa_loss=0.0001698, whisper_loss=0.1044, over 22419.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01053, ecapa_loss=0.0001422, whisper_loss=0.0904, over 3886447.83 frames. ], batch size: 90, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:10:41,805 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-18 17:10:47,810 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 39 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-18 17:10:52,805 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 29 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-18 17:10:53,235 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.62 vs. limit=22.5 2024-08-18 17:10:59,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4005790.0, ans=0.125 2024-08-18 17:11:12,299 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. 
limit=6.0 2024-08-18 17:11:33,983 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.308e+01 2.540e+01 2.874e+01 4.374e+01, threshold=5.080e+01, percent-clipped=0.0 2024-08-18 17:11:34,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4006090.0, ans=0.125 2024-08-18 17:11:40,684 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 14000, loss[loss=0.1149, beats_loss=0.006502, ecapa_loss=0.0001534, whisper_loss=0.1068, over 20699.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01046, ecapa_loss=0.0001422, whisper_loss=0.09107, over 3905454.16 frames. ], batch size: 80, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:11:45,729 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 17 from Vox, 51 fro AS 2024-08-18 17:11:48,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4006190.0, ans=0.2 2024-08-18 17:12:08,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4006390.0, ans=0.0 2024-08-18 17:12:10,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4006390.0, ans=0.125 2024-08-18 17:12:14,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4006390.0, ans=0.05 2024-08-18 17:12:26,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4006490.0, ans=0.0 2024-08-18 17:12:29,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4006490.0, ans=0.125 2024-08-18 17:12:44,372 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 14050, loss[loss=0.111, beats_loss=0.008394, ecapa_loss=0.0001445, 
whisper_loss=0.1011, over 15319.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01046, ecapa_loss=0.000142, whisper_loss=0.09122, over 3908195.23 frames. ], batch size: 58, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:12:46,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4006690.0, ans=0.0 2024-08-18 17:12:49,480 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 28 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-18 17:13:05,871 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-18 17:13:31,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4006990.0, ans=0.2 2024-08-18 17:13:34,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4007090.0, ans=0.125 2024-08-18 17:13:39,373 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-18 17:13:40,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=4007090.0, ans=12.0 2024-08-18 17:13:41,652 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.335e+01 2.606e+01 2.930e+01 4.821e+01, threshold=5.212e+01, percent-clipped=0.0 2024-08-18 17:13:47,602 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 14100, loss[loss=0.1123, beats_loss=0.008441, ecapa_loss=0.0002027, whisper_loss=0.1019, over 21652.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01044, ecapa_loss=0.000142, whisper_loss=0.09205, over 3903591.76 frames. ], batch size: 90, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:14:05,326 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.32 vs. 
limit=15.0 2024-08-18 17:14:16,215 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-18 17:14:21,908 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.35 vs. limit=22.5 2024-08-18 17:14:27,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4007490.0, ans=0.125 2024-08-18 17:14:34,755 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.39 vs. limit=22.5 2024-08-18 17:14:43,836 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0 2024-08-18 17:14:48,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4007590.0, ans=0.125 2024-08-18 17:14:50,411 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 14150, loss[loss=0.0876, beats_loss=0.009667, ecapa_loss=0.000133, whisper_loss=0.07661, over 17158.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01049, ecapa_loss=0.0001414, whisper_loss=0.09111, over 3886898.91 frames. ], batch size: 65, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:15:02,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4007790.0, ans=0.125 2024-08-18 17:15:06,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4007790.0, ans=0.0 2024-08-18 17:15:09,910 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-18 17:15:17,709 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
20 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-18 17:15:21,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4007890.0, ans=0.125 2024-08-18 17:15:46,832 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-18 17:15:47,914 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.262e+01 2.524e+01 2.804e+01 4.310e+01, threshold=5.048e+01, percent-clipped=0.0 2024-08-18 17:15:54,606 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 14200, loss[loss=0.07657, beats_loss=0.01132, ecapa_loss=0.0001703, whisper_loss=0.06355, over 19708.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0105, ecapa_loss=0.0001423, whisper_loss=0.0906, over 3890794.44 frames. ], batch size: 82, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:15:54,778 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-18 17:15:56,607 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.55 vs. limit=15.0 2024-08-18 17:15:57,188 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 23 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-18 17:16:03,622 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 17:16:03,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4008190.0, ans=0.2 2024-08-18 17:16:07,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4008290.0, ans=0.1 2024-08-18 17:16:12,118 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 29 from Vox, 27 fro AS 2024-08-18 17:16:16,978 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
24 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-18 17:16:29,756 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-18 17:16:31,173 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 17:16:38,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4008490.0, ans=0.125 2024-08-18 17:16:47,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4008590.0, ans=0.07 2024-08-18 17:16:56,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4008690.0, ans=0.125 2024-08-18 17:16:57,491 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 14250, loss[loss=0.1229, beats_loss=0.00901, ecapa_loss=0.0001294, whisper_loss=0.1126, over 20134.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01046, ecapa_loss=0.0001431, whisper_loss=0.091, over 3905828.08 frames. ], batch size: 76, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:17:00,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4008690.0, ans=0.0 2024-08-18 17:17:06,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4008690.0, ans=0.125 2024-08-18 17:17:30,128 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 13 from Vox, 44 fro AS 2024-08-18 17:17:31,410 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-18 17:17:38,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4008990.0, ans=0.125 2024-08-18 17:17:46,600 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
24 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-18 17:17:53,458 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-18 17:17:55,602 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.243e+01 2.470e+01 2.770e+01 7.680e+01, threshold=4.941e+01, percent-clipped=2.0 2024-08-18 17:18:01,588 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 14300, loss[loss=0.1012, beats_loss=0.01145, ecapa_loss=0.0001329, whisper_loss=0.08843, over 21314.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01054, ecapa_loss=0.0001436, whisper_loss=0.08973, over 3912194.40 frames. ], batch size: 86, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:18:15,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4009290.0, ans=0.125 2024-08-18 17:18:58,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4009590.0, ans=0.125 2024-08-18 17:19:05,876 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 14350, loss[loss=0.09306, beats_loss=0.01192, ecapa_loss=0.0001443, whisper_loss=0.0797, over 18506.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01058, ecapa_loss=0.0001434, whisper_loss=0.08952, over 3906561.61 frames. ], batch size: 75, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:19:18,513 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 14 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-18 17:19:53,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4009990.0, ans=0.0 2024-08-18 17:19:54,409 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
32 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-18 17:20:05,975 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.040e+01 2.366e+01 2.598e+01 2.848e+01 4.928e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-18 17:20:13,173 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 14400, loss[loss=0.09673, beats_loss=0.01323, ecapa_loss=0.0001389, whisper_loss=0.08211, over 20915.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01056, ecapa_loss=0.0001435, whisper_loss=0.08948, over 3873513.62 frames. ], batch size: 89, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:20:29,589 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 31 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-18 17:21:21,362 INFO [train_multi_KD3.py:1116] (2/4) Epoch 27, batch 14450, loss[loss=0.103, beats_loss=0.008451, ecapa_loss=0.0001947, whisper_loss=0.09262, over 17035.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01055, ecapa_loss=0.0001442, whisper_loss=0.08971, over 3900696.92 frames. ], batch size: 71, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:21:52,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4010890.0, ans=0.0 2024-08-18 17:21:57,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=4010890.0, ans=0.025 2024-08-18 17:22:35,125 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 0, loss[loss=0.1082, beats_loss=0.007481, ecapa_loss=0.0001535, whisper_loss=0.09923, over 19371.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.007481, ecapa_loss=0.0001535, whisper_loss=0.09923, over 19371.00 frames. 
], batch size: 72, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:22:35,125 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-18 17:23:13,194 INFO [train_multi_KD3.py:1149] (2/4) Epoch 28, validation on ASR_libri: loss=0.253, beats_loss=0, ecapa_loss=0.000516, whisper_loss=0.2479, over 922467.00 frames. 2024-08-18 17:23:27,347 INFO [train_multi_KD3.py:1149] (2/4) Epoch 28, validation on SV_voxceleb1: loss=0.004085, beats_loss=0, ecapa_loss=0.0004085, whisper_loss=0, over 939242.00 frames. 2024-08-18 17:24:34,880 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([0.0008, 0.0433, 0.0008, 0.0207, 0.0019, 0.0997, 0.0264, 0.0650], device='cuda:2') 2024-08-18 17:25:15,867 INFO [train_multi_KD3.py:1149] (2/4) Epoch 28, validation on AT_audioset: loss=0.02306, beats_loss=0.02306, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 17:25:15,870 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-18 17:25:27,289 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-18 17:25:30,068 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.404e+01 2.660e+01 3.020e+01 3.509e+02, threshold=5.320e+01, percent-clipped=1.0 2024-08-18 17:25:40,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4011180.0, ans=0.125 2024-08-18 17:25:56,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4011180.0, ans=0.125 2024-08-18 17:26:49,037 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-18 17:26:55,216 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
24 from LS+wenet, 15 from Vox, 15 fro AS 2024-08-18 17:27:14,542 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 50, loss[loss=0.0934, beats_loss=0.01111, ecapa_loss=0.0001248, whisper_loss=0.08105, over 20402.00 frames. ], tot_loss[loss=0.09929, beats_loss=0.009465, ecapa_loss=0.0001485, whisper_loss=0.08834, over 889772.44 frames. ], batch size: 83, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:27:32,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4011580.0, ans=0.1 2024-08-18 17:27:54,923 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-18 17:28:07,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4011780.0, ans=0.125 2024-08-18 17:28:12,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4011780.0, ans=0.1 2024-08-18 17:28:25,805 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.72 vs. limit=15.0 2024-08-18 17:28:30,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=4011880.0, ans=12.0 2024-08-18 17:28:49,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4011980.0, ans=0.125 2024-08-18 17:29:03,713 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 100, loss[loss=0.1048, beats_loss=0.007814, ecapa_loss=0.0001555, whisper_loss=0.09545, over 19225.00 frames. ], tot_loss[loss=0.09801, beats_loss=0.009505, ecapa_loss=0.0001447, whisper_loss=0.08706, over 1539747.07 frames. ], batch size: 74, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:29:04,570 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
23 from LS+wenet, 30 from Vox, 23 fro AS 2024-08-18 17:29:15,949 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.274e+01 2.549e+01 2.774e+01 3.166e+01 3.794e+01, threshold=5.547e+01, percent-clipped=0.0 2024-08-18 17:29:23,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4012180.0, ans=0.0 2024-08-18 17:29:47,966 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.13 vs. limit=22.5 2024-08-18 17:29:53,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4012280.0, ans=0.5 2024-08-18 17:29:53,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4012280.0, ans=0.1 2024-08-18 17:30:16,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4012380.0, ans=0.1 2024-08-18 17:30:26,846 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-18 17:30:28,214 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-18 17:30:41,431 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 150, loss[loss=0.105, beats_loss=0.01072, ecapa_loss=0.0001448, whisper_loss=0.0928, over 22828.00 frames. ], tot_loss[loss=0.09854, beats_loss=0.009648, ecapa_loss=0.0001433, whisper_loss=0.08746, over 2022990.63 frames. ], batch size: 90, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:30:56,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4012580.0, ans=0.0 2024-08-18 17:31:16,124 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
10 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-18 17:31:58,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4013080.0, ans=0.125 2024-08-18 17:31:59,199 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 200, loss[loss=0.09771, beats_loss=0.01044, ecapa_loss=0.0001348, whisper_loss=0.08592, over 14680.00 frames. ], tot_loss[loss=0.09961, beats_loss=0.009896, ecapa_loss=0.0001439, whisper_loss=0.08828, over 2396678.28 frames. ], batch size: 60, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:32:08,137 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.367e+01 2.619e+01 2.925e+01 1.442e+02, threshold=5.239e+01, percent-clipped=3.0 2024-08-18 17:32:08,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4013080.0, ans=0.1 2024-08-18 17:32:15,240 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 34 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 17:32:26,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4013280.0, ans=0.125 2024-08-18 17:32:34,333 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 17:33:01,038 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.29 vs. limit=5.0 2024-08-18 17:33:08,136 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 250, loss[loss=0.104, beats_loss=0.01079, ecapa_loss=0.0001167, whisper_loss=0.09209, over 18628.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.00998, ecapa_loss=0.0001441, whisper_loss=0.0892, over 2687924.98 frames. ], batch size: 72, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:33:34,287 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
21 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-18 17:33:37,468 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.11 vs. limit=15.0 2024-08-18 17:33:52,248 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-18 17:33:58,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4013880.0, ans=0.0 2024-08-18 17:34:08,418 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 18 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-18 17:34:09,374 WARNING [optim.py:496] (2/4) Scaling gradients by 0.03464806452393532, model_norm_threshold=52.38976287841797 2024-08-18 17:34:09,544 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.19, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.434e+05, grad_sumsq=4.434e+05, orig_rms_sq=1.000e+00 2024-08-18 17:34:15,892 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 300, loss[loss=0.09913, beats_loss=0.009962, ecapa_loss=0.0001547, whisper_loss=0.08762, over 22438.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01009, ecapa_loss=0.0001458, whisper_loss=0.0889, over 2945132.02 frames. ], batch size: 90, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:34:20,331 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
27 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-18 17:34:21,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4014080.0, ans=0.125 2024-08-18 17:34:23,788 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.275e+01 2.490e+01 2.772e+01 1.512e+03, threshold=4.979e+01, percent-clipped=1.0 2024-08-18 17:34:34,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4014180.0, ans=0.125 2024-08-18 17:34:43,666 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.78 vs. limit=15.0 2024-08-18 17:34:56,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4014380.0, ans=0.0 2024-08-18 17:35:05,919 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-18 17:35:09,554 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-18 17:35:14,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4014480.0, ans=0.125 2024-08-18 17:35:19,500 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 350, loss[loss=0.08195, beats_loss=0.01076, ecapa_loss=0.0001228, whisper_loss=0.06996, over 17142.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01015, ecapa_loss=0.000145, whisper_loss=0.08907, over 3096139.84 frames. 
], batch size: 64, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:35:26,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4014580.0, ans=0.125 2024-08-18 17:35:28,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4014580.0, ans=0.125 2024-08-18 17:35:29,390 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.01 vs. limit=6.0 2024-08-18 17:35:47,968 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-18 17:36:01,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4014880.0, ans=0.125 2024-08-18 17:36:07,594 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-18 17:36:10,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4014980.0, ans=0.125 2024-08-18 17:36:20,947 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 400, loss[loss=0.1054, beats_loss=0.01022, ecapa_loss=0.0001263, whisper_loss=0.09395, over 22042.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01014, ecapa_loss=0.0001444, whisper_loss=0.08937, over 3258596.16 frames. 
], batch size: 84, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:36:24,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4015080.0, ans=0.2 2024-08-18 17:36:28,117 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.258e+01 2.514e+01 2.865e+01 8.622e+01, threshold=5.028e+01, percent-clipped=3.0 2024-08-18 17:37:04,126 WARNING [optim.py:496] (2/4) Scaling gradients by 0.06247842684388161, model_norm_threshold=50.280113220214844 2024-08-18 17:37:04,285 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.0.norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.000e+05, grad_sumsq=1.000e+05, orig_rms_sq=1.000e+00 2024-08-18 17:37:06,968 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 31 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-18 17:37:18,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4015480.0, ans=0.0 2024-08-18 17:37:21,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4015480.0, ans=0.125 2024-08-18 17:37:23,118 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 450, loss[loss=0.08474, beats_loss=0.01257, ecapa_loss=0.0001281, whisper_loss=0.07089, over 18647.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01023, ecapa_loss=0.0001434, whisper_loss=0.08972, over 3385169.51 frames. 
], batch size: 78, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:37:24,826 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4015580.0, ans=0.5 2024-08-18 17:37:26,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4015580.0, ans=0.0 2024-08-18 17:37:42,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4015680.0, ans=0.0 2024-08-18 17:37:43,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4015680.0, ans=0.125 2024-08-18 17:37:49,187 WARNING [optim.py:496] (2/4) Scaling gradients by 0.02706790715456009, model_norm_threshold=50.280113220214844 2024-08-18 17:37:49,358 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.902e+05, grad_sumsq=3.776e+07, orig_rms_sq=1.033e-02 2024-08-18 17:37:57,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4015780.0, ans=0.0 2024-08-18 17:38:13,423 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 17:38:18,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4015980.0, ans=0.125 2024-08-18 17:38:19,303 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
34 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-18 17:38:22,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4015980.0, ans=0.125 2024-08-18 17:38:25,134 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 500, loss[loss=0.09788, beats_loss=0.01038, ecapa_loss=0.0001713, whisper_loss=0.08578, over 21903.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01027, ecapa_loss=0.0001432, whisper_loss=0.08949, over 3497225.39 frames. ], batch size: 93, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:38:32,627 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.374e+01 2.639e+01 2.877e+01 1.858e+03, threshold=5.278e+01, percent-clipped=3.0 2024-08-18 17:38:44,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4016180.0, ans=0.0 2024-08-18 17:38:55,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4016280.0, ans=10.0 2024-08-18 17:39:02,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4016380.0, ans=0.2 2024-08-18 17:39:03,024 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.81 vs. limit=15.0 2024-08-18 17:39:05,363 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0 2024-08-18 17:39:17,028 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.09 vs. 
limit=15.0 2024-08-18 17:39:20,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4016480.0, ans=0.0 2024-08-18 17:39:27,548 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 550, loss[loss=0.09587, beats_loss=0.01266, ecapa_loss=0.0001506, whisper_loss=0.0817, over 19364.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01027, ecapa_loss=0.000143, whisper_loss=0.08991, over 3608018.54 frames. ], batch size: 81, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:39:28,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=4016580.0, ans=15.0 2024-08-18 17:39:30,906 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.97 vs. limit=15.0 2024-08-18 17:39:37,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4016580.0, ans=0.125 2024-08-18 17:39:38,483 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.39 vs. 
limit=15.0 2024-08-18 17:39:42,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4016680.0, ans=0.0 2024-08-18 17:39:55,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4016780.0, ans=0.1 2024-08-18 17:40:10,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4016880.0, ans=0.1 2024-08-18 17:40:16,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4016980.0, ans=0.09899494936611666 2024-08-18 17:40:16,473 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.96 vs. limit=12.0 2024-08-18 17:40:28,763 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-18 17:40:29,767 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 600, loss[loss=0.1052, beats_loss=0.0122, ecapa_loss=0.0001683, whisper_loss=0.09134, over 22019.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01031, ecapa_loss=0.0001423, whisper_loss=0.08956, over 3668734.62 frames. ], batch size: 91, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:40:36,911 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.365e+01 2.589e+01 2.843e+01 3.555e+01, threshold=5.178e+01, percent-clipped=0.0 2024-08-18 17:40:43,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4017180.0, ans=0.09899494936611666 2024-08-18 17:40:55,805 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
31 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 17:41:04,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4017280.0, ans=0.2 2024-08-18 17:41:05,344 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=15.0 2024-08-18 17:41:15,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4017380.0, ans=0.1 2024-08-18 17:41:16,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4017380.0, ans=0.125 2024-08-18 17:41:31,746 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 650, loss[loss=0.106, beats_loss=0.009167, ecapa_loss=0.0001566, whisper_loss=0.09522, over 16139.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01037, ecapa_loss=0.0001418, whisper_loss=0.08939, over 3693809.20 frames. ], batch size: 64, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:41:32,704 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.83 vs. limit=15.0 2024-08-18 17:41:40,224 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 25 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-18 17:41:42,748 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.91 vs. 
limit=15.0 2024-08-18 17:41:47,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4017680.0, ans=0.1 2024-08-18 17:41:48,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4017680.0, ans=0.125 2024-08-18 17:41:57,849 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 from AS 2024-08-18 17:42:02,972 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 15 from Vox, 33 from AS 2024-08-18 17:42:12,179 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.609e-03 2024-08-18 17:42:14,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4017880.0, ans=0.125 2024-08-18 17:42:15,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4017880.0, ans=10.0 2024-08-18 17:42:17,422 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.66 vs. limit=22.5 2024-08-18 17:42:34,219 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 700, loss[loss=0.0677, beats_loss=0.01327, ecapa_loss=0.0001068, whisper_loss=0.05336, over 15160.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01034, ecapa_loss=0.0001418, whisper_loss=0.08957, over 3703650.37 frames.
], batch size: 61, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:42:35,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4018080.0, ans=0.0 2024-08-18 17:42:41,477 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.261e+01 2.566e+01 2.914e+01 5.332e+01, threshold=5.131e+01, percent-clipped=1.0 2024-08-18 17:43:09,940 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.948e-01 2024-08-18 17:43:14,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4018380.0, ans=0.1 2024-08-18 17:43:16,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4018380.0, ans=0.09899494936611666 2024-08-18 17:43:19,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4018380.0, ans=0.2 2024-08-18 17:43:20,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4018380.0, ans=0.125 2024-08-18 17:43:25,287 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 21 from Vox, 28 from AS 2024-08-18 17:43:33,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4018480.0, ans=0.0 2024-08-18 17:43:35,010 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 14 from Vox, 35 from AS 2024-08-18 17:43:36,046 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 750, loss[loss=0.08952, beats_loss=0.01359, ecapa_loss=0.0001219, whisper_loss=0.07471, over 16540.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01038, ecapa_loss=0.0001423, whisper_loss=0.08943, over 3740398.86 frames.
], batch size: 66, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:43:37,333 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 21 from LS+wenet, 28 from Vox, 43 from AS 2024-08-18 17:43:42,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4018580.0, ans=0.0 2024-08-18 17:43:47,475 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 23 from Vox, 44 from AS 2024-08-18 17:44:02,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4018780.0, ans=0.125 2024-08-18 17:44:08,534 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 from AS 2024-08-18 17:44:09,420 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-08-18 17:44:13,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4018880.0, ans=0.0 2024-08-18 17:44:29,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4018980.0, ans=0.2 2024-08-18 17:44:38,376 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 800, loss[loss=0.1005, beats_loss=0.01034, ecapa_loss=0.000166, whisper_loss=0.08849, over 21505.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01038, ecapa_loss=0.0001418, whisper_loss=0.0892, over 3781605.00 frames. ], batch size: 92, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:44:39,701 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 22 from Vox, 32 from AS 2024-08-18 17:44:40,359 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.24 vs.
limit=15.0 2024-08-18 17:44:45,672 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.732e+01 2.204e+01 2.473e+01 2.754e+01 3.605e+01, threshold=4.946e+01, percent-clipped=0.0 2024-08-18 17:44:45,897 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 8 from Vox, 32 from AS 2024-08-18 17:44:52,602 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.23 vs. limit=6.0 2024-08-18 17:45:11,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4019280.0, ans=0.125 2024-08-18 17:45:19,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4019380.0, ans=0.0 2024-08-18 17:45:22,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=4019380.0, ans=0.2 2024-08-18 17:45:32,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=4019480.0, ans=10.0 2024-08-18 17:45:35,950 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 23 from LS+wenet, 14 from Vox, 16 from AS 2024-08-18 17:45:40,594 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 850, loss[loss=0.0887, beats_loss=0.01009, ecapa_loss=0.0001243, whisper_loss=0.07737, over 16188.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01027, ecapa_loss=0.0001421, whisper_loss=0.08942, over 3787681.10 frames. ], batch size: 63, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:45:48,313 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts.
24 from LS+wenet, 18 from Vox, 26 from AS 2024-08-18 17:45:59,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4019680.0, ans=0.0 2024-08-18 17:46:00,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4019680.0, ans=0.2 2024-08-18 17:46:05,328 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.08 vs. limit=15.0 2024-08-18 17:46:08,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4019780.0, ans=0.125 2024-08-18 17:46:14,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4019780.0, ans=0.125 2024-08-18 17:46:21,858 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 14 from Vox, 26 from AS 2024-08-18 17:46:25,959 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.990e+00 2024-08-18 17:46:29,475 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 26 from Vox, 26 from AS 2024-08-18 17:46:36,823 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 14 from Vox, 23 from AS 2024-08-18 17:46:40,494 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 17 from LS+wenet, 25 from Vox, 31 from AS 2024-08-18 17:46:42,978 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 900, loss[loss=0.108, beats_loss=0.009654, ecapa_loss=0.000164, whisper_loss=0.09666, over 22160.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01037, ecapa_loss=0.0001408, whisper_loss=0.08915, over 3809151.84 frames.
], batch size: 90, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:46:44,592 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.029e+01 2024-08-18 17:46:47,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4020080.0, ans=0.2 2024-08-18 17:46:48,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4020080.0, ans=0.0 2024-08-18 17:46:50,213 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.258e+01 2.407e+01 2.605e+01 4.279e+01, threshold=4.815e+01, percent-clipped=0.0 2024-08-18 17:46:51,577 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 15 from Vox, 22 from AS 2024-08-18 17:46:52,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4020080.0, ans=0.1 2024-08-18 17:46:52,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4020080.0, ans=0.05 2024-08-18 17:46:58,770 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.89 vs. limit=15.0 2024-08-18 17:46:59,155 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 16 from Vox, 36 from AS 2024-08-18 17:47:03,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4020180.0, ans=0.1 2024-08-18 17:47:15,134 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.43 vs.
limit=15.0 2024-08-18 17:47:15,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4020280.0, ans=0.125 2024-08-18 17:47:36,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4020480.0, ans=0.125 2024-08-18 17:47:45,253 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 950, loss[loss=0.1037, beats_loss=0.01006, ecapa_loss=0.0001286, whisper_loss=0.09233, over 23399.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01041, ecapa_loss=0.0001396, whisper_loss=0.08881, over 3822488.20 frames. ], batch size: 90, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:47:45,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4020580.0, ans=0.0 2024-08-18 17:47:48,413 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.03 vs. limit=15.0 2024-08-18 17:47:50,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4020580.0, ans=0.2 2024-08-18 17:47:57,521 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.36 vs. limit=22.5 2024-08-18 17:47:59,131 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
33 from LS+wenet, 17 from Vox, 37 from AS 2024-08-18 17:48:00,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4020680.0, ans=0.125 2024-08-18 17:48:05,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4020680.0, ans=0.125 2024-08-18 17:48:05,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4020680.0, ans=0.04949747468305833 2024-08-18 17:48:14,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4020780.0, ans=0.5 2024-08-18 17:48:28,665 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 12 from Vox, 28 from AS 2024-08-18 17:48:41,297 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 23 from LS+wenet, 22 from Vox, 13 from AS 2024-08-18 17:48:46,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4021080.0, ans=0.125 2024-08-18 17:48:46,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4021080.0, ans=0.07 2024-08-18 17:48:47,227 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 1000, loss[loss=0.1123, beats_loss=0.009382, ecapa_loss=0.000113, whisper_loss=0.1018, over 17980.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01046, ecapa_loss=0.0001398, whisper_loss=0.0887, over 3815311.62 frames. ], batch size: 68, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:48:47,403 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 30 from LS+wenet, 17 from Vox, 30 from AS 2024-08-18 17:48:49,906 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts.
12 from LS+wenet, 19 from Vox, 28 from AS 2024-08-18 17:48:53,087 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.36 vs. limit=15.0 2024-08-18 17:48:54,812 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.179e+01 2.463e+01 2.751e+01 3.706e+01, threshold=4.926e+01, percent-clipped=0.0 2024-08-18 17:48:58,115 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.02 vs. limit=15.0 2024-08-18 17:49:00,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4021180.0, ans=0.2 2024-08-18 17:49:30,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4021380.0, ans=0.125 2024-08-18 17:49:38,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4021480.0, ans=0.0 2024-08-18 17:49:42,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4021480.0, ans=0.125 2024-08-18 17:49:47,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4021480.0, ans=0.125 2024-08-18 17:49:47,889 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 from AS 2024-08-18 17:49:50,271 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 1050, loss[loss=0.1081, beats_loss=0.008046, ecapa_loss=0.0001612, whisper_loss=0.09846, over 23533.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01046, ecapa_loss=0.0001397, whisper_loss=0.08862, over 3821399.28 frames.
], batch size: 91, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:49:53,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4021580.0, ans=0.0 2024-08-18 17:50:02,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4021680.0, ans=0.1 2024-08-18 17:50:04,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4021680.0, ans=0.125 2024-08-18 17:50:05,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4021680.0, ans=0.125 2024-08-18 17:50:17,922 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 26 from LS+wenet, 13 from Vox, 30 from AS 2024-08-18 17:50:19,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4021780.0, ans=0.2 2024-08-18 17:50:45,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4021980.0, ans=0.125 2024-08-18 17:50:54,000 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 1100, loss[loss=0.07828, beats_loss=0.01254, ecapa_loss=0.000144, whisper_loss=0.0643, over 18958.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01047, ecapa_loss=0.0001392, whisper_loss=0.08904, over 3819839.67 frames. ], batch size: 77, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:50:58,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4022080.0, ans=0.125 2024-08-18 17:51:01,043 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts.
31 from LS+wenet, 23 from Vox, 23 from AS 2024-08-18 17:51:01,992 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.358e+01 2.562e+01 2.926e+01 4.573e+02, threshold=5.124e+01, percent-clipped=2.0 2024-08-18 17:51:03,303 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 12 from Vox, 30 from AS 2024-08-18 17:51:06,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4022180.0, ans=0.125 2024-08-18 17:51:25,516 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 24 from Vox, 41 from AS 2024-08-18 17:51:30,838 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 26 from Vox, 32 from AS 2024-08-18 17:51:38,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4022380.0, ans=0.0 2024-08-18 17:51:57,521 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 from AS 2024-08-18 17:51:57,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4022580.0, ans=0.0 2024-08-18 17:51:57,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4022580.0, ans=0.1 2024-08-18 17:51:58,588 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 1150, loss[loss=0.1001, beats_loss=0.01087, ecapa_loss=0.0001481, whisper_loss=0.0877, over 21578.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01052, ecapa_loss=0.0001386, whisper_loss=0.08864, over 3804933.21 frames. ], batch size: 90, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:52:00,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=4022580.0, ans=15.0 2024-08-18 17:52:01,235 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts.
13 from LS+wenet, 20 from Vox, 26 from AS 2024-08-18 17:52:22,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4022680.0, ans=0.125 2024-08-18 17:52:41,972 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 31 from Vox, 38 from AS 2024-08-18 17:52:47,309 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 28 from LS+wenet, 13 from Vox, 28 from AS 2024-08-18 17:52:48,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4022880.0, ans=0.125 2024-08-18 17:52:51,360 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 from AS 2024-08-18 17:52:53,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4022980.0, ans=0.0 2024-08-18 17:52:55,774 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 from AS 2024-08-18 17:53:05,531 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 1200, loss[loss=0.0987, beats_loss=0.01156, ecapa_loss=0.0001304, whisper_loss=0.08584, over 18358.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01053, ecapa_loss=0.0001392, whisper_loss=0.08874, over 3803809.20 frames. ], batch size: 75, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:53:13,805 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.256e+01 2.483e+01 2.794e+01 3.745e+01, threshold=4.967e+01, percent-clipped=0.0 2024-08-18 17:53:17,008 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts.
33 from LS+wenet, 15 from Vox, 42 from AS 2024-08-18 17:53:36,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=4023280.0, ans=0.5 2024-08-18 17:53:39,788 WARNING [optim.py:496] (2/4) Scaling gradients by 0.03793781250715256, model_norm_threshold=49.66889572143555 2024-08-18 17:53:39,954 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.273e+05, grad_sumsq=2.197e+07, orig_rms_sq=1.035e-02 2024-08-18 17:53:43,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4023280.0, ans=0.0 2024-08-18 17:54:14,831 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 1250, loss[loss=0.1052, beats_loss=0.01023, ecapa_loss=0.0001382, whisper_loss=0.09359, over 16246.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01062, ecapa_loss=0.0001392, whisper_loss=0.0888, over 3802606.48 frames. ], batch size: 63, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:54:31,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4023680.0, ans=0.95 2024-08-18 17:54:36,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4023680.0, ans=0.1 2024-08-18 17:54:52,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4023780.0, ans=0.125 2024-08-18 17:55:04,804 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.74 vs. limit=15.0 2024-08-18 17:55:09,157 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts.
21 from LS+wenet, 28 from Vox, 39 from AS 2024-08-18 17:55:09,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4023880.0, ans=0.125 2024-08-18 17:55:29,674 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 1300, loss[loss=0.08226, beats_loss=0.01169, ecapa_loss=0.0001622, whisper_loss=0.06895, over 18778.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01059, ecapa_loss=0.0001403, whisper_loss=0.08842, over 3805223.36 frames. ], batch size: 75, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:55:36,051 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 15 from Vox, 29 from AS 2024-08-18 17:55:36,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4024080.0, ans=0.125 2024-08-18 17:55:38,805 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.331e+01 2.596e+01 3.040e+01 1.309e+03, threshold=5.193e+01, percent-clipped=2.0 2024-08-18 17:56:06,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=4024280.0, ans=0.05 2024-08-18 17:56:11,899 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 22 from Vox, 30 from AS 2024-08-18 17:56:13,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4024380.0, ans=0.125 2024-08-18 17:56:34,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=4024480.0, ans=0.05 2024-08-18 17:56:42,453 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 1350, loss[loss=0.1161, beats_loss=0.0113, ecapa_loss=0.0001226, whisper_loss=0.1036, over 23418.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0106, ecapa_loss=0.0001392, whisper_loss=0.08909, over 3845458.78 frames.
], batch size: 92, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:56:56,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4024680.0, ans=0.125 2024-08-18 17:56:57,652 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.51 vs. limit=10.0 2024-08-18 17:57:00,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4024680.0, ans=0.0 2024-08-18 17:57:10,492 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 12 from Vox, 32 from AS 2024-08-18 17:57:13,089 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 from AS 2024-08-18 17:57:17,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4024780.0, ans=0.0 2024-08-18 17:57:46,935 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 28 from LS+wenet, 15 from Vox, 28 from AS 2024-08-18 17:57:54,317 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 1400, loss[loss=0.08679, beats_loss=0.01337, ecapa_loss=0.0001269, whisper_loss=0.07215, over 21958.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01055, ecapa_loss=0.0001391, whisper_loss=0.089, over 3843712.83 frames. ], batch size: 88, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:57:59,525 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.15 vs.
limit=15.0 2024-08-18 17:58:02,911 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.705e+01 2.174e+01 2.386e+01 2.635e+01 4.112e+01, threshold=4.772e+01, percent-clipped=0.0 2024-08-18 17:58:08,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4025180.0, ans=0.1 2024-08-18 17:58:18,657 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 24 from LS+wenet, 18 from Vox, 18 from AS 2024-08-18 17:58:24,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=4025280.0, ans=15.0 2024-08-18 17:58:27,667 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 24 from Vox, 24 from AS 2024-08-18 17:58:31,693 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 20 from Vox, 38 from AS 2024-08-18 17:58:34,029 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.785e+05 2024-08-18 17:58:35,226 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 12 from LS+wenet, 14 from Vox, 29 from AS 2024-08-18 17:58:49,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4025380.0, ans=0.125 2024-08-18 17:59:06,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4025580.0, ans=0.125 2024-08-18 17:59:06,907 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 1450, loss[loss=0.1239, beats_loss=0.009655, ecapa_loss=0.000136, whisper_loss=0.1129, over 23917.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01046, ecapa_loss=0.0001399, whisper_loss=0.08932, over 3837916.21 frames.
], batch size: 91, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:59:47,451 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.71 vs. limit=15.0 2024-08-18 18:00:01,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4025680.0, ans=0.125 2024-08-18 18:00:11,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4025780.0, ans=0.125 2024-08-18 18:00:11,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4025780.0, ans=0.2 2024-08-18 18:00:46,946 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 1500, loss[loss=0.1189, beats_loss=0.01031, ecapa_loss=0.0001269, whisper_loss=0.1073, over 24539.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01051, ecapa_loss=0.0001396, whisper_loss=0.08887, over 3829447.19 frames. ], batch size: 92, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 18:00:53,095 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 15 from Vox, 39 from AS 2024-08-18 18:00:57,455 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.252e+01 2.528e+01 2.901e+01 4.004e+01, threshold=5.056e+01, percent-clipped=0.0 2024-08-18 18:01:10,649 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts.
17 from LS+wenet, 24 from Vox, 33 from AS 2024-08-18 18:01:10,894 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.386e+01 2024-08-18 18:01:18,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4026280.0, ans=0.2 2024-08-18 18:01:19,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4026280.0, ans=0.125 2024-08-18 18:01:25,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4026280.0, ans=0.0 2024-08-18 18:01:42,383 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 18 from Vox, 34 from AS 2024-08-18 18:01:54,969 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 29 from LS+wenet, 19 from Vox, 32 from AS 2024-08-18 18:01:59,061 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 1550, loss[loss=0.09825, beats_loss=0.01189, ecapa_loss=0.0001632, whisper_loss=0.08473, over 21312.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01049, ecapa_loss=0.0001393, whisper_loss=0.08948, over 3853893.03 frames. ], batch size: 90, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 18:02:09,733 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 15 from Vox, 24 from AS 2024-08-18 18:02:28,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=4026780.0, ans=10.0 2024-08-18 18:02:35,133 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=26.59 vs. limit=22.5 2024-08-18 18:02:45,684 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 22 from Vox, 25 from AS 2024-08-18 18:03:05,343 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts.
30 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-18 18:03:09,372 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 1600, loss[loss=0.1137, beats_loss=0.007954, ecapa_loss=0.0001381, whisper_loss=0.1043, over 16550.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01049, ecapa_loss=0.0001393, whisper_loss=0.08943, over 3869639.96 frames. ], batch size: 62, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 18:03:19,041 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.241e+01 2.454e+01 2.842e+01 4.448e+01, threshold=4.908e+01, percent-clipped=0.0 2024-08-18 18:03:20,380 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 27 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-18 18:03:24,305 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-18 18:03:24,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4027180.0, ans=0.125 2024-08-18 18:03:32,560 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-18 18:03:46,165 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 18:03:49,701 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.94 vs. 
limit=15.0 2024-08-18 18:03:53,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4027380.0, ans=0.125 2024-08-18 18:03:58,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4027380.0, ans=0.125 2024-08-18 18:03:59,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4027380.0, ans=0.2 2024-08-18 18:04:08,243 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.64 vs. limit=15.0 2024-08-18 18:04:09,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4027480.0, ans=0.0 2024-08-18 18:04:17,795 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-18 18:04:18,838 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 1650, loss[loss=0.1135, beats_loss=0.008975, ecapa_loss=0.0001393, whisper_loss=0.1031, over 20233.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01048, ecapa_loss=0.0001391, whisper_loss=0.08981, over 3882852.69 frames. 
], batch size: 77, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 18:04:26,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4027580.0, ans=0.125 2024-08-18 18:04:51,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4027780.0, ans=0.2 2024-08-18 18:05:01,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4027880.0, ans=0.04949747468305833 2024-08-18 18:05:26,419 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 1700, loss[loss=0.11, beats_loss=0.009494, ecapa_loss=0.0001687, whisper_loss=0.09885, over 20855.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001394, whisper_loss=0.09034, over 3879109.24 frames. ], batch size: 84, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:05:37,772 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.383e+01 2.589e+01 2.926e+01 5.501e+01, threshold=5.178e+01, percent-clipped=1.0 2024-08-18 18:05:41,216 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.56 vs. limit=12.0 2024-08-18 18:05:42,804 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.15 vs. 
limit=10.0 2024-08-18 18:05:54,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4028280.0, ans=0.2 2024-08-18 18:06:18,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4028480.0, ans=0.125 2024-08-18 18:06:23,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4028480.0, ans=0.0 2024-08-18 18:06:29,886 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 35 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-18 18:06:32,297 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 1750, loss[loss=0.07732, beats_loss=0.01155, ecapa_loss=0.000147, whisper_loss=0.0643, over 18229.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0104, ecapa_loss=0.0001397, whisper_loss=0.08962, over 3874724.12 frames. ], batch size: 72, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:06:42,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4028580.0, ans=0.1 2024-08-18 18:06:55,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4028680.0, ans=0.125 2024-08-18 18:07:06,821 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-18 18:07:12,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4028780.0, ans=0.1 2024-08-18 18:07:51,187 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 1800, loss[loss=0.09974, beats_loss=0.01055, ecapa_loss=0.0001258, whisper_loss=0.08793, over 21047.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01033, ecapa_loss=0.0001409, whisper_loss=0.08961, over 3871330.75 frames. 
], batch size: 83, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:07:51,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4029080.0, ans=0.125 2024-08-18 18:07:57,401 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-18 18:08:03,006 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.191e+01 2.428e+01 2.696e+01 4.164e+01, threshold=4.856e+01, percent-clipped=0.0 2024-08-18 18:08:03,629 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-18 18:08:06,275 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-18 18:08:19,345 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 12 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-18 18:08:33,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4029280.0, ans=0.1 2024-08-18 18:08:59,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=4029480.0, ans=15.0 2024-08-18 18:09:04,174 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 1850, loss[loss=0.09925, beats_loss=0.0118, ecapa_loss=0.0001398, whisper_loss=0.08605, over 20957.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01034, ecapa_loss=0.0001402, whisper_loss=0.08978, over 3861702.80 frames. ], batch size: 86, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:09:06,361 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.36 vs. 
limit=15.0 2024-08-18 18:09:07,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4029580.0, ans=0.0 2024-08-18 18:09:11,614 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-08-18 18:09:21,524 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.91 vs. limit=10.0 2024-08-18 18:09:22,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4029680.0, ans=0.0 2024-08-18 18:09:26,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4029680.0, ans=0.1 2024-08-18 18:09:36,347 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-18 18:09:43,322 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-18 18:09:45,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4029780.0, ans=0.04949747468305833 2024-08-18 18:09:52,444 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-18 18:09:59,363 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 23 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-18 18:10:03,011 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.85 vs. 
limit=15.0 2024-08-18 18:10:13,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4029980.0, ans=0.0 2024-08-18 18:10:15,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4030080.0, ans=0.125 2024-08-18 18:10:15,861 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 1900, loss[loss=0.07608, beats_loss=0.01228, ecapa_loss=0.000145, whisper_loss=0.06235, over 21338.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01038, ecapa_loss=0.0001392, whisper_loss=0.08923, over 3850776.24 frames. ], batch size: 91, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:10:27,312 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.264e+01 2.517e+01 2.852e+01 3.741e+01, threshold=5.034e+01, percent-clipped=0.0 2024-08-18 18:10:34,182 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.66 vs. limit=15.0 2024-08-18 18:10:34,880 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-18 18:10:39,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4030180.0, ans=0.1 2024-08-18 18:10:41,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4030180.0, ans=0.125 2024-08-18 18:10:41,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4030180.0, ans=0.1 2024-08-18 18:10:55,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4030280.0, ans=10.0 2024-08-18 18:10:59,484 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
17 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-18 18:10:59,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4030380.0, ans=0.0 2024-08-18 18:11:01,895 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-18 18:11:15,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4030480.0, ans=0.2 2024-08-18 18:11:21,014 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.313e+05 2024-08-18 18:11:21,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4030480.0, ans=0.125 2024-08-18 18:11:27,280 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 1950, loss[loss=0.11, beats_loss=0.009509, ecapa_loss=0.000155, whisper_loss=0.09895, over 18459.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01041, ecapa_loss=0.0001392, whisper_loss=0.08935, over 3832939.20 frames. ], batch size: 71, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:11:37,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4030580.0, ans=0.125 2024-08-18 18:11:59,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4030780.0, ans=0.125 2024-08-18 18:12:03,359 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-18 18:12:08,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4030880.0, ans=0.95 2024-08-18 18:12:28,590 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.92 vs. 
limit=10.0 2024-08-18 18:12:37,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4031080.0, ans=0.1 2024-08-18 18:12:38,243 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 2000, loss[loss=0.08372, beats_loss=0.009992, ecapa_loss=0.0001248, whisper_loss=0.07248, over 19959.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01038, ecapa_loss=0.0001385, whisper_loss=0.08971, over 3814471.73 frames. ], batch size: 77, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:12:40,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4031080.0, ans=0.0 2024-08-18 18:12:47,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4031080.0, ans=0.0 2024-08-18 18:12:49,417 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.230e+01 2.583e+01 2.898e+01 3.757e+01, threshold=5.165e+01, percent-clipped=0.0 2024-08-18 18:12:54,966 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-18 18:12:55,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4031180.0, ans=0.1 2024-08-18 18:13:11,193 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-18 18:13:20,782 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 18:13:50,374 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 2050, loss[loss=0.1047, beats_loss=0.01188, ecapa_loss=0.0001222, whisper_loss=0.09159, over 19992.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01038, ecapa_loss=0.0001385, whisper_loss=0.08964, over 3789863.36 frames. 
], batch size: 80, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:13:58,681 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 22 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-18 18:14:02,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4031680.0, ans=0.2 2024-08-18 18:14:21,283 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 13 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-18 18:14:22,372 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 18:14:38,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4031880.0, ans=0.0 2024-08-18 18:14:39,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4031880.0, ans=0.2 2024-08-18 18:14:46,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4031980.0, ans=0.125 2024-08-18 18:14:48,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4031980.0, ans=0.125 2024-08-18 18:14:56,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4031980.0, ans=0.1 2024-08-18 18:14:56,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4031980.0, ans=0.07 2024-08-18 18:14:59,859 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 2100, loss[loss=0.06548, beats_loss=0.01101, ecapa_loss=0.0001618, whisper_loss=0.05286, over 13529.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0103, ecapa_loss=0.0001394, whisper_loss=0.08978, over 3766325.38 frames. 
], batch size: 56, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:15:03,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4032080.0, ans=0.1 2024-08-18 18:15:03,959 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.65 vs. limit=15.0 2024-08-18 18:15:11,530 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.315e+01 2.589e+01 2.844e+01 4.091e+01, threshold=5.179e+01, percent-clipped=0.0 2024-08-18 18:15:19,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4032180.0, ans=0.1 2024-08-18 18:15:26,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4032180.0, ans=0.125 2024-08-18 18:15:33,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4032280.0, ans=0.125 2024-08-18 18:15:56,923 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-18 18:16:05,605 WARNING [optim.py:496] (2/4) Scaling gradients by 0.018764860928058624, model_norm_threshold=51.787418365478516 2024-08-18 18:16:05,773 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.26, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.003e+06, grad_sumsq=1.941e+08, orig_rms_sq=1.032e-02 2024-08-18 18:16:12,686 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 2150, loss[loss=0.1436, beats_loss=0.006426, ecapa_loss=0.0001577, whisper_loss=0.1356, over 17181.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01036, ecapa_loss=0.0001396, whisper_loss=0.08996, over 3778013.17 frames. 
], batch size: 63, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:16:14,857 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 30 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 18:16:15,132 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 18:16:16,069 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 18:16:37,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4032680.0, ans=0.015 2024-08-18 18:16:37,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4032680.0, ans=0.2 2024-08-18 18:16:41,025 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-18 18:17:16,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4032980.0, ans=0.125 2024-08-18 18:17:18,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4032980.0, ans=0.125 2024-08-18 18:17:21,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4032980.0, ans=0.2 2024-08-18 18:17:21,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4032980.0, ans=0.125 2024-08-18 18:17:23,049 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 2200, loss[loss=0.08273, beats_loss=0.01089, ecapa_loss=0.0001341, whisper_loss=0.0705, over 13092.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01037, ecapa_loss=0.0001398, whisper_loss=0.09017, over 3805275.31 frames. 
], batch size: 53, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:17:34,268 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.249e+01 2.473e+01 2.891e+01 2.760e+03, threshold=4.945e+01, percent-clipped=3.0 2024-08-18 18:17:36,541 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-08-18 18:17:47,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4033180.0, ans=0.025 2024-08-18 18:18:04,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4033280.0, ans=0.0 2024-08-18 18:18:05,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4033380.0, ans=0.125 2024-08-18 18:18:26,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4033480.0, ans=0.125 2024-08-18 18:18:34,896 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 2250, loss[loss=0.09683, beats_loss=0.01087, ecapa_loss=0.000142, whisper_loss=0.08453, over 20849.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0104, ecapa_loss=0.0001406, whisper_loss=0.09029, over 3843875.94 frames. ], batch size: 84, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:19:01,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4033780.0, ans=0.0 2024-08-18 18:19:06,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4033780.0, ans=0.0 2024-08-18 18:19:07,670 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 18:19:11,957 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-18 18:19:13,779 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.47 vs. limit=10.0 2024-08-18 18:19:17,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4033880.0, ans=0.0 2024-08-18 18:19:21,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4033880.0, ans=0.0 2024-08-18 18:19:44,271 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 2300, loss[loss=0.09894, beats_loss=0.0119, ecapa_loss=0.000118, whisper_loss=0.08586, over 18110.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01041, ecapa_loss=0.0001408, whisper_loss=0.09089, over 3850072.87 frames. ], batch size: 71, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:19:55,754 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.302e+01 2.462e+01 2.661e+01 7.808e+01, threshold=4.924e+01, percent-clipped=1.0 2024-08-18 18:20:14,807 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 18:20:34,136 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-18 18:20:40,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4034480.0, ans=0.0 2024-08-18 18:20:52,186 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 2350, loss[loss=0.1049, beats_loss=0.01047, ecapa_loss=0.0001642, whisper_loss=0.0928, over 15487.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01044, ecapa_loss=0.0001418, whisper_loss=0.09002, over 3816542.68 frames. 
], batch size: 64, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:21:03,307 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 18:21:08,046 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-18 18:21:08,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4034680.0, ans=0.125 2024-08-18 18:21:08,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4034680.0, ans=0.1 2024-08-18 18:21:27,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4034780.0, ans=0.125 2024-08-18 18:21:27,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4034780.0, ans=0.07 2024-08-18 18:21:28,427 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-18 18:21:28,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4034780.0, ans=0.1 2024-08-18 18:22:01,333 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 2400, loss[loss=0.1017, beats_loss=0.009408, ecapa_loss=0.0001303, whisper_loss=0.09096, over 21219.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.0001413, whisper_loss=0.09019, over 3842201.58 frames. 
], batch size: 82, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:22:03,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4035080.0, ans=0.0 2024-08-18 18:22:11,484 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.715e+01 2.282e+01 2.511e+01 2.769e+01 4.268e+01, threshold=5.023e+01, percent-clipped=0.0 2024-08-18 18:22:16,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4035180.0, ans=0.2 2024-08-18 18:22:20,348 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-18 18:22:28,645 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 17 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-18 18:22:40,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4035280.0, ans=0.125 2024-08-18 18:23:09,685 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 2450, loss[loss=0.1175, beats_loss=0.009848, ecapa_loss=0.0001448, whisper_loss=0.1062, over 18681.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01044, ecapa_loss=0.0001421, whisper_loss=0.08994, over 3842967.02 frames. ], batch size: 72, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:23:31,504 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-18 18:23:59,331 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 18:24:08,799 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.03 vs. limit=22.5 2024-08-18 18:24:30,520 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 2500, loss[loss=0.1202, beats_loss=0.009326, ecapa_loss=0.0001451, whisper_loss=0.1095, over 22187.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01045, ecapa_loss=0.0001417, whisper_loss=0.08999, over 3811325.44 frames. ], batch size: 87, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:24:30,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4036080.0, ans=0.0 2024-08-18 18:24:43,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4036080.0, ans=0.0 2024-08-18 18:24:43,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4036080.0, ans=0.0 2024-08-18 18:24:44,636 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.285e+01 2.484e+01 2.880e+01 1.174e+02, threshold=4.969e+01, percent-clipped=1.0 2024-08-18 18:24:53,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4036180.0, ans=0.0 2024-08-18 18:24:57,561 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 30 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 18:25:06,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4036280.0, ans=0.0 2024-08-18 18:25:13,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4036280.0, ans=0.125 2024-08-18 18:25:20,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4036280.0, ans=0.125 2024-08-18 18:25:31,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4036380.0, ans=0.125 2024-08-18 18:25:41,111 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 17 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-18 18:25:46,790 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 18:25:59,114 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.33 vs. limit=22.5 2024-08-18 18:26:03,566 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 2550, loss[loss=0.09348, beats_loss=0.01126, ecapa_loss=0.000125, whisper_loss=0.08097, over 18213.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01045, ecapa_loss=0.0001412, whisper_loss=0.09028, over 3843423.27 frames. ], batch size: 70, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:26:06,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4036580.0, ans=0.1 2024-08-18 18:26:22,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4036680.0, ans=0.125 2024-08-18 18:26:39,407 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-18 18:26:49,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4036780.0, ans=0.125 2024-08-18 18:27:07,576 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0 2024-08-18 18:27:08,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4036880.0, ans=0.125 2024-08-18 18:27:23,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4036980.0, ans=0.125 2024-08-18 18:27:31,654 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 2600, loss[loss=0.1188, beats_loss=0.009787, ecapa_loss=0.0001461, whisper_loss=0.1075, over 23162.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01046, ecapa_loss=0.0001419, whisper_loss=0.09002, over 3847210.82 frames. ], batch size: 91, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:27:44,000 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.385e+01 2.553e+01 2.816e+01 4.584e+01, threshold=5.105e+01, percent-clipped=0.0 2024-08-18 18:27:44,985 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2024-08-18 18:27:49,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4037180.0, ans=0.125 2024-08-18 18:28:01,327 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.96 vs. limit=22.5 2024-08-18 18:28:04,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4037280.0, ans=0.1 2024-08-18 18:28:16,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4037280.0, ans=0.125 2024-08-18 18:28:26,101 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.49 vs. limit=15.0 2024-08-18 18:28:36,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4037380.0, ans=0.05 2024-08-18 18:28:59,088 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 2650, loss[loss=0.08686, beats_loss=0.0128, ecapa_loss=0.0001049, whisper_loss=0.07302, over 15513.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01044, ecapa_loss=0.0001427, whisper_loss=0.08959, over 3858639.80 frames. 
], batch size: 60, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:29:04,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4037580.0, ans=0.125 2024-08-18 18:29:58,031 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 26 from Vox, 31 from AS 2024-08-18 18:30:00,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4037880.0, ans=0.2 2024-08-18 18:30:26,729 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.69 vs. limit=12.0 2024-08-18 18:30:35,771 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 2700, loss[loss=0.09053, beats_loss=0.01077, ecapa_loss=0.0001407, whisper_loss=0.07836, over 20920.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01048, ecapa_loss=0.0001425, whisper_loss=0.08934, over 3871462.57 frames. ], batch size: 84, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:30:45,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4038080.0, ans=0.0 2024-08-18 18:30:48,668 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.289e+01 2.510e+01 2.864e+01 4.358e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-18 18:30:57,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4038180.0, ans=0.125 2024-08-18 18:31:00,347 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 18 from Vox, 24 from AS 2024-08-18 18:31:16,655 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
38 from LS+wenet, 24 from Vox, 30 from AS 2024-08-18 18:31:17,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4038280.0, ans=0.0 2024-08-18 18:31:20,942 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 20 from Vox, 35 from AS 2024-08-18 18:31:26,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4038380.0, ans=0.125 2024-08-18 18:31:43,276 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.18 vs. limit=12.0 2024-08-18 18:31:49,087 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 2750, loss[loss=0.107, beats_loss=0.01014, ecapa_loss=0.0001599, whisper_loss=0.09528, over 17902.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01051, ecapa_loss=0.0001418, whisper_loss=0.08987, over 3869046.64 frames. ], batch size: 72, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:31:55,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4038580.0, ans=0.125 2024-08-18 18:32:02,441 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2024-08-18 18:32:04,670 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 from AS 2024-08-18 18:32:20,397 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 19 from Vox, 41 from AS 2024-08-18 18:33:01,163 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 2800, loss[loss=0.1112, beats_loss=0.00915, ecapa_loss=0.0001408, whisper_loss=0.1007, over 21669.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.0001405, whisper_loss=0.09021, over 3889559.78 frames. 
], batch size: 84, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:33:14,159 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.362e+01 2.601e+01 2.838e+01 4.412e+01, threshold=5.203e+01, percent-clipped=0.0 2024-08-18 18:33:16,742 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.27 vs. limit=15.0 2024-08-18 18:33:17,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=4039180.0, ans=0.5 2024-08-18 18:33:48,390 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 18 from Vox, 45 from AS 2024-08-18 18:33:51,671 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 13 from LS+wenet, 16 from Vox, 31 from AS 2024-08-18 18:33:59,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4039380.0, ans=0.1 2024-08-18 18:34:02,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4039380.0, ans=0.5 2024-08-18 18:34:28,190 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 2850, loss[loss=0.06177, beats_loss=0.01105, ecapa_loss=0.0001414, whisper_loss=0.04931, over 16512.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01054, ecapa_loss=0.0001404, whisper_loss=0.08993, over 3860799.37 frames. ], batch size: 67, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:34:29,218 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.74 vs. limit=5.0 2024-08-18 18:34:42,067 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.54 vs. 
limit=22.5 2024-08-18 18:34:45,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4039680.0, ans=0.125 2024-08-18 18:34:53,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4039680.0, ans=0.0 2024-08-18 18:35:11,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=4039780.0, ans=15.0 2024-08-18 18:35:30,345 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 26 from Vox, 32 from AS 2024-08-18 18:35:30,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4039880.0, ans=0.04949747468305833 2024-08-18 18:36:05,231 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 2900, loss[loss=0.07977, beats_loss=0.01219, ecapa_loss=0.0001561, whisper_loss=0.06603, over 15906.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01051, ecapa_loss=0.0001422, whisper_loss=0.09006, over 3866698.43 frames. ], batch size: 67, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:36:16,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4040080.0, ans=0.125 2024-08-18 18:36:21,856 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.256e+01 2.587e+01 2.877e+01 4.773e+01, threshold=5.175e+01, percent-clipped=0.0 2024-08-18 18:36:26,831 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 from AS 2024-08-18 18:36:40,197 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.86 vs. 
limit=15.0 2024-08-18 18:36:44,318 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.47 vs. limit=15.0 2024-08-18 18:36:48,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4040280.0, ans=0.2 2024-08-18 18:37:21,510 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.35 vs. limit=12.0 2024-08-18 18:37:27,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4040480.0, ans=0.125 2024-08-18 18:37:32,890 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 2950, loss[loss=0.1206, beats_loss=0.007482, ecapa_loss=0.0001728, whisper_loss=0.1114, over 13604.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01053, ecapa_loss=0.0001426, whisper_loss=0.08982, over 3887507.92 frames. ], batch size: 53, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:37:44,869 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.23 vs. limit=15.0 2024-08-18 18:37:52,587 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.357e+00 2024-08-18 18:38:04,301 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 from AS 2024-08-18 18:38:24,152 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=15.0 2024-08-18 18:38:25,507 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
22 from LS+wenet, 22 from Vox, 22 from AS 2024-08-18 18:38:41,659 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 18:38:54,616 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 15 from Vox, 39 from AS 2024-08-18 18:39:10,829 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.38 vs. limit=15.0 2024-08-18 18:39:11,005 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.42 vs. limit=10.0 2024-08-18 18:39:11,648 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 3000, loss[loss=0.08852, beats_loss=0.01152, ecapa_loss=0.0001136, whisper_loss=0.07586, over 16681.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.000143, whisper_loss=0.08967, over 3878036.50 frames. ], batch size: 65, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:39:11,648 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-18 18:39:56,867 INFO [train_multi_KD3.py:1149] (2/4) Epoch 28, validation on ASR_libri: loss=0.2533, beats_loss=0, ecapa_loss=0.0005204, whisper_loss=0.2481, over 922467.00 frames. 2024-08-18 18:40:14,605 INFO [train_multi_KD3.py:1149] (2/4) Epoch 28, validation on SV_voxceleb1: loss=0.004036, beats_loss=0, ecapa_loss=0.0004036, whisper_loss=0, over 939242.00 frames. 2024-08-18 18:41:48,180 INFO [train_multi_KD3.py:1149] (2/4) Epoch 28, validation on AT_audioset: loss=0.02306, beats_loss=0.02306, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 18:41:48,184 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-18 18:41:48,357 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
25 from LS+wenet, 20 from Vox, 45 from AS 2024-08-18 18:41:52,308 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.09 vs. limit=22.5 2024-08-18 18:41:58,247 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.290e+01 2.609e+01 2.861e+01 5.437e+01, threshold=5.217e+01, percent-clipped=0.0 2024-08-18 18:42:34,941 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 from AS 2024-08-18 18:42:35,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4041280.0, ans=0.1 2024-08-18 18:43:06,645 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 19 from Vox, 38 from AS 2024-08-18 18:43:14,055 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 26 from Vox, 28 from AS 2024-08-18 18:43:28,685 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 22 from Vox, 45 from AS 2024-08-18 18:43:42,158 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.57 vs. limit=22.5 2024-08-18 18:43:42,579 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 3050, loss[loss=0.09535, beats_loss=0.01165, ecapa_loss=0.0001422, whisper_loss=0.08228, over 22682.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01048, ecapa_loss=0.0001433, whisper_loss=0.09032, over 3910118.52 frames. ], batch size: 90, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:44:36,317 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-08-18 18:44:58,464 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.02 vs. 
limit=6.0 2024-08-18 18:45:13,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4041880.0, ans=0.07 2024-08-18 18:45:15,514 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.09 vs. limit=15.0 2024-08-18 18:45:27,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4041980.0, ans=0.1 2024-08-18 18:45:27,557 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.26 vs. limit=15.0 2024-08-18 18:45:42,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4041980.0, ans=0.09899494936611666 2024-08-18 18:45:43,088 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.56 vs. limit=12.0 2024-08-18 18:45:49,931 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 3100, loss[loss=0.1221, beats_loss=0.009607, ecapa_loss=0.0001229, whisper_loss=0.1113, over 17456.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01056, ecapa_loss=0.000143, whisper_loss=0.09024, over 3884376.12 frames. ], batch size: 64, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:45:56,151 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 15 from Vox, 40 from AS 2024-08-18 18:46:10,559 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.308e+01 2.550e+01 2.809e+01 3.973e+01, threshold=5.100e+01, percent-clipped=1.0 2024-08-18 18:46:15,191 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
27 from LS+wenet, 22 from Vox, 35 from AS 2024-08-18 18:46:26,228 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-08-18 18:46:48,539 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 25 from Vox, 31 from AS 2024-08-18 18:47:05,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4042380.0, ans=0.2 2024-08-18 18:47:10,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4042380.0, ans=0.0 2024-08-18 18:47:13,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4042380.0, ans=0.1 2024-08-18 18:47:18,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4042480.0, ans=0.125 2024-08-18 18:47:40,846 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 3150, loss[loss=0.1063, beats_loss=0.00905, ecapa_loss=0.0001526, whisper_loss=0.09573, over 16911.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01056, ecapa_loss=0.0001421, whisper_loss=0.09044, over 3866159.03 frames. ], batch size: 67, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:47:41,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4042580.0, ans=0.125 2024-08-18 18:48:08,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4042680.0, ans=0.125 2024-08-18 18:48:13,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4042680.0, ans=0.125 2024-08-18 18:48:14,675 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
28 from LS+wenet, 15 from Vox, 29 from AS 2024-08-18 18:48:20,414 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 23 from Vox, 31 from AS 2024-08-18 18:48:34,116 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 20 from Vox, 34 from AS 2024-08-18 18:48:49,723 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 19 from Vox, 27 from AS 2024-08-18 18:48:51,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4042880.0, ans=0.0 2024-08-18 18:48:56,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4042880.0, ans=0.0 2024-08-18 18:49:03,160 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.82 vs. limit=12.0 2024-08-18 18:49:05,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4042980.0, ans=0.1 2024-08-18 18:49:05,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4042980.0, ans=0.0 2024-08-18 18:49:09,780 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 14 from Vox, 28 from AS 2024-08-18 18:49:13,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4042980.0, ans=0.5 2024-08-18 18:49:16,949 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 3200, loss[loss=0.1036, beats_loss=0.01084, ecapa_loss=0.0001736, whisper_loss=0.09103, over 18773.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001423, whisper_loss=0.09082, over 3870129.67 frames. 
], batch size: 76, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:49:29,054 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.339e+01 2.552e+01 3.080e+01 4.481e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-18 18:49:40,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4043180.0, ans=0.0 2024-08-18 18:49:43,222 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 16 from Vox, 24 from AS 2024-08-18 18:49:44,555 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 35 from LS+wenet, 27 from Vox, 33 from AS 2024-08-18 18:49:51,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4043280.0, ans=0.0 2024-08-18 18:49:52,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4043280.0, ans=0.0 2024-08-18 18:50:22,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4043480.0, ans=0.125 2024-08-18 18:50:34,873 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 3250, loss[loss=0.06519, beats_loss=0.01178, ecapa_loss=0.000152, whisper_loss=0.05189, over 13429.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01059, ecapa_loss=0.0001434, whisper_loss=0.09051, over 3857390.94 frames. ], batch size: 57, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:50:42,378 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.70 vs. 
limit=15.0 2024-08-18 18:50:57,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4043680.0, ans=0.125 2024-08-18 18:51:09,361 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.90 vs. limit=15.0 2024-08-18 18:51:20,104 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 26 from LS+wenet, 19 from Vox, 27 from AS 2024-08-18 18:51:39,549 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.46 vs. limit=15.0 2024-08-18 18:51:42,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4043980.0, ans=0.2 2024-08-18 18:51:48,317 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 21 from Vox, 46 from AS 2024-08-18 18:51:50,245 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 3300, loss[loss=0.08902, beats_loss=0.01339, ecapa_loss=0.0001176, whisper_loss=0.07446, over 22244.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01058, ecapa_loss=0.0001422, whisper_loss=0.09071, over 3880158.78 frames. ], batch size: 89, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:51:56,983 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
21 from LS+wenet, 27 from Vox, 29 from AS 2024-08-18 18:52:03,004 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.380e+01 2.621e+01 2.872e+01 4.395e+01, threshold=5.242e+01, percent-clipped=0.0 2024-08-18 18:52:03,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4044080.0, ans=0.09899494936611666 2024-08-18 18:52:13,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4044180.0, ans=0.0 2024-08-18 18:52:39,879 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 12 from LS+wenet, 21 from Vox, 27 from AS 2024-08-18 18:52:40,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4044380.0, ans=0.125 2024-08-18 18:52:42,774 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 from AS 2024-08-18 18:53:11,115 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 3350, loss[loss=0.09614, beats_loss=0.01113, ecapa_loss=0.0001115, whisper_loss=0.0839, over 17047.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01058, ecapa_loss=0.0001425, whisper_loss=0.09, over 3859629.48 frames. ], batch size: 66, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:53:13,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4044580.0, ans=0.125 2024-08-18 18:53:13,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4044580.0, ans=10.0 2024-08-18 18:53:35,111 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 30 from Vox, 33 from AS 2024-08-18 18:53:51,223 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 12 from Vox, 29 from AS 2024-08-18 18:54:05,539 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
24 from LS+wenet, 18 from Vox, 31 from AS 2024-08-18 18:54:16,251 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 24 from Vox, 26 from AS 2024-08-18 18:54:28,106 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 3400, loss[loss=0.1176, beats_loss=0.009756, ecapa_loss=0.0001346, whisper_loss=0.1065, over 14194.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001426, whisper_loss=0.09009, over 3872868.39 frames. ], batch size: 55, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:54:40,790 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.177e+01 2.415e+01 2.723e+01 4.499e+01, threshold=4.829e+01, percent-clipped=0.0 2024-08-18 18:54:46,134 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 16 from LS+wenet, 26 from Vox, 41 from AS 2024-08-18 18:54:56,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4045180.0, ans=0.0 2024-08-18 18:55:01,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4045280.0, ans=0.125 2024-08-18 18:55:02,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4045280.0, ans=0.125 2024-08-18 18:55:23,437 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
33 from LS+wenet, 22 from Vox, 36 from AS 2024-08-18 18:55:29,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4045380.0, ans=0.0 2024-08-18 18:55:34,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4045480.0, ans=0.0 2024-08-18 18:55:37,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4045480.0, ans=0.0 2024-08-18 18:55:45,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4045480.0, ans=0.1 2024-08-18 18:55:51,603 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 3450, loss[loss=0.07666, beats_loss=0.01262, ecapa_loss=0.000115, whisper_loss=0.0629, over 19176.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01059, ecapa_loss=0.0001425, whisper_loss=0.08963, over 3886188.18 frames. ], batch size: 76, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:56:03,529 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.48 vs. limit=15.0 2024-08-18 18:56:09,361 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 26 from Vox, 43 from AS 2024-08-18 18:56:27,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4045780.0, ans=0.125 2024-08-18 18:56:37,972 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 from AS 2024-08-18 18:56:39,844 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 17 from Vox, 45 from AS 2024-08-18 18:56:52,167 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
24 from LS+wenet, 31 from Vox, 37 from AS 2024-08-18 18:57:02,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4045980.0, ans=0.125 2024-08-18 18:57:10,935 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0 2024-08-18 18:57:11,328 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 3500, loss[loss=0.09345, beats_loss=0.00864, ecapa_loss=0.0001445, whisper_loss=0.08337, over 14918.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01054, ecapa_loss=0.0001435, whisper_loss=0.08989, over 3888086.00 frames. ], batch size: 60, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:57:23,075 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.329e+01 2.522e+01 2.820e+01 3.952e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-18 18:57:23,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4046080.0, ans=0.1 2024-08-18 18:57:26,165 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 from AS 2024-08-18 18:57:32,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4046180.0, ans=0.125 2024-08-18 18:57:46,649 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
30 from LS+wenet, 22 from Vox, 37 from AS 2024-08-18 18:58:11,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4046380.0, ans=0.0 2024-08-18 18:58:12,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4046380.0, ans=0.125 2024-08-18 18:58:32,230 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 3550, loss[loss=0.1169, beats_loss=0.009363, ecapa_loss=0.0001664, whisper_loss=0.1058, over 23584.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01065, ecapa_loss=0.0001434, whisper_loss=0.08855, over 3883167.13 frames. ], batch size: 93, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:58:33,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4046580.0, ans=0.125 2024-08-18 18:58:46,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4046580.0, ans=0.125 2024-08-18 18:58:58,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4046680.0, ans=0.2 2024-08-18 18:58:58,430 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.26 vs. limit=15.0 2024-08-18 18:59:30,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4046880.0, ans=0.0 2024-08-18 18:59:48,511 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 28 from Vox, 27 from AS 2024-08-18 18:59:57,116 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 3600, loss[loss=0.09918, beats_loss=0.01059, ecapa_loss=0.0001513, whisper_loss=0.08708, over 18225.00 frames. 
], tot_loss[loss=0.101, beats_loss=0.0106, ecapa_loss=0.0001432, whisper_loss=0.08892, over 3863051.75 frames. ], batch size: 74, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 19:00:06,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4047080.0, ans=0.125 2024-08-18 19:00:08,870 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.393e+01 2.591e+01 2.982e+01 4.231e+01, threshold=5.182e+01, percent-clipped=0.0 2024-08-18 19:00:23,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4047180.0, ans=0.2 2024-08-18 19:00:28,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4047280.0, ans=0.125 2024-08-18 19:00:34,164 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 19:00:36,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4047280.0, ans=0.125 2024-08-18 19:00:41,929 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
20 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 19:00:43,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4047380.0, ans=0.2 2024-08-18 19:00:53,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4047380.0, ans=0.125 2024-08-18 19:00:53,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4047380.0, ans=0.0 2024-08-18 19:00:57,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4047480.0, ans=0.125 2024-08-18 19:01:02,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4047480.0, ans=0.0 2024-08-18 19:01:05,097 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.21 vs. limit=15.0 2024-08-18 19:01:09,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4047480.0, ans=0.125 2024-08-18 19:01:11,819 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 3650, loss[loss=0.1134, beats_loss=0.01031, ecapa_loss=0.0001668, whisper_loss=0.1014, over 21852.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0106, ecapa_loss=0.0001432, whisper_loss=0.08901, over 3861003.42 frames. 
], batch size: 90, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 19:01:15,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4047580.0, ans=0.1 2024-08-18 19:01:45,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4047780.0, ans=0.1 2024-08-18 19:01:55,754 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-18 19:01:59,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4047780.0, ans=0.0 2024-08-18 19:02:15,898 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.40 vs. limit=22.5 2024-08-18 19:02:29,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4047980.0, ans=0.04949747468305833 2024-08-18 19:02:35,148 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 3700, loss[loss=0.1133, beats_loss=0.01093, ecapa_loss=0.0001192, whisper_loss=0.1012, over 21392.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01052, ecapa_loss=0.0001433, whisper_loss=0.08983, over 3866907.34 frames. ], batch size: 83, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:02:47,995 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.250e+01 2.400e+01 2.703e+01 3.510e+01, threshold=4.800e+01, percent-clipped=0.0 2024-08-18 19:02:53,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4048180.0, ans=0.125 2024-08-18 19:03:32,395 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 19:03:49,234 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.99 vs. limit=10.0 2024-08-18 19:03:53,226 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 3750, loss[loss=0.1544, beats_loss=0.007129, ecapa_loss=0.0001242, whisper_loss=0.1461, over 25447.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001429, whisper_loss=0.08999, over 3869532.52 frames. ], batch size: 91, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:04:00,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4048580.0, ans=0.0 2024-08-18 19:04:27,680 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.28 vs. limit=15.0 2024-08-18 19:04:34,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4048780.0, ans=0.1 2024-08-18 19:04:51,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4048880.0, ans=0.125 2024-08-18 19:05:00,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4048980.0, ans=0.2 2024-08-18 19:05:05,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4048980.0, ans=0.1 2024-08-18 19:05:18,252 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 3800, loss[loss=0.08806, beats_loss=0.01185, ecapa_loss=0.0001531, whisper_loss=0.07468, over 21440.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01059, ecapa_loss=0.0001432, whisper_loss=0.08924, over 3841312.35 frames. 
], batch size: 94, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:05:25,483 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 15 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 19:05:31,267 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.387e+01 2.639e+01 2.992e+01 4.413e+01, threshold=5.277e+01, percent-clipped=0.0 2024-08-18 19:05:37,194 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 19:05:38,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4049180.0, ans=0.0 2024-08-18 19:05:45,885 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.34 vs. limit=15.0 2024-08-18 19:05:48,333 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 36 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-18 19:06:01,483 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.52 vs. limit=22.5 2024-08-18 19:06:11,186 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 19:06:23,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4049380.0, ans=0.0 2024-08-18 19:06:25,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4049480.0, ans=0.125 2024-08-18 19:06:32,399 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.21 vs. limit=15.0 2024-08-18 19:06:39,859 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 3850, loss[loss=0.1034, beats_loss=0.01214, ecapa_loss=0.0001397, whisper_loss=0.0899, over 22824.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01061, ecapa_loss=0.0001422, whisper_loss=0.08953, over 3852011.39 frames. ], batch size: 93, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:06:48,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4049580.0, ans=0.125 2024-08-18 19:07:09,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4049780.0, ans=0.125 2024-08-18 19:07:18,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4049880.0, ans=0.0 2024-08-18 19:07:19,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4049880.0, ans=0.125 2024-08-18 19:07:44,686 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-18 19:07:45,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4050080.0, ans=0.0 2024-08-18 19:07:45,743 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 3900, loss[loss=0.0998, beats_loss=0.01094, ecapa_loss=0.0001474, whisper_loss=0.08738, over 13804.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01057, ecapa_loss=0.0001426, whisper_loss=0.09001, over 3881891.55 frames. ], batch size: 56, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:07:56,324 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.489e+01 2.786e+01 3.014e+01 3.884e+02, threshold=5.572e+01, percent-clipped=4.0 2024-08-18 19:08:01,950 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-18 19:08:02,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4050180.0, ans=0.125 2024-08-18 19:08:04,788 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-18 19:08:10,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4050180.0, ans=0.2 2024-08-18 19:08:11,751 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-18 19:08:23,143 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.41 vs. limit=22.5 2024-08-18 19:08:31,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4050380.0, ans=0.0 2024-08-18 19:08:37,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4050480.0, ans=0.0 2024-08-18 19:08:46,749 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-18 19:08:50,759 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 19:08:51,194 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.56 vs. limit=22.5 2024-08-18 19:08:51,723 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 3950, loss[loss=0.1131, beats_loss=0.009446, ecapa_loss=0.0001559, whisper_loss=0.1021, over 21859.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01051, ecapa_loss=0.0001431, whisper_loss=0.09029, over 3888627.25 frames. 
], batch size: 89, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:08:56,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4050580.0, ans=0.0 2024-08-18 19:09:02,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=4050580.0, ans=15.0 2024-08-18 19:09:11,077 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-18 19:09:20,278 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 30 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-18 19:09:38,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4050880.0, ans=0.125 2024-08-18 19:09:47,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4050980.0, ans=0.0 2024-08-18 19:09:56,259 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 4000, loss[loss=0.1018, beats_loss=0.01013, ecapa_loss=0.000139, whisper_loss=0.09025, over 22682.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01053, ecapa_loss=0.0001424, whisper_loss=0.09054, over 3920840.84 frames. ], batch size: 89, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:09:59,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4051080.0, ans=0.04949747468305833 2024-08-18 19:10:06,024 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
20 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-18 19:10:06,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4051080.0, ans=0.0 2024-08-18 19:10:06,941 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.265e+01 2.552e+01 2.868e+01 4.279e+01, threshold=5.105e+01, percent-clipped=0.0 2024-08-18 19:10:07,165 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 24 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-18 19:10:07,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4051080.0, ans=0.125 2024-08-18 19:10:08,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4051180.0, ans=0.0 2024-08-18 19:10:21,396 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 19:10:28,629 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0 2024-08-18 19:10:57,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4051480.0, ans=0.1 2024-08-18 19:11:00,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4051480.0, ans=0.125 2024-08-18 19:11:02,673 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 4050, loss[loss=0.07755, beats_loss=0.01375, ecapa_loss=0.0001233, whisper_loss=0.06257, over 17645.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0105, ecapa_loss=0.0001429, whisper_loss=0.09048, over 3895697.23 frames. ], batch size: 73, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:11:05,618 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 19:11:07,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4051580.0, ans=0.04949747468305833 2024-08-18 19:11:33,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4051780.0, ans=0.2 2024-08-18 19:11:42,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4051880.0, ans=0.125 2024-08-18 19:11:44,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4051880.0, ans=0.125 2024-08-18 19:12:04,599 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.56 vs. limit=10.0 2024-08-18 19:12:08,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4052080.0, ans=0.0 2024-08-18 19:12:09,234 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 4100, loss[loss=0.1084, beats_loss=0.009968, ecapa_loss=0.0001416, whisper_loss=0.09697, over 18888.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01048, ecapa_loss=0.0001432, whisper_loss=0.0904, over 3895667.82 frames. 
], batch size: 75, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:12:11,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4052080.0, ans=0.1 2024-08-18 19:12:19,688 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.351e+01 2.549e+01 2.874e+01 5.187e+01, threshold=5.098e+01, percent-clipped=1.0 2024-08-18 19:12:22,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4052180.0, ans=0.125 2024-08-18 19:12:34,356 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-18 19:12:37,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4052280.0, ans=0.1 2024-08-18 19:12:39,850 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.06 vs. limit=22.5 2024-08-18 19:12:53,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4052380.0, ans=0.0 2024-08-18 19:13:11,626 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-18 19:13:15,115 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 4150, loss[loss=0.1027, beats_loss=0.01184, ecapa_loss=0.0001719, whisper_loss=0.08916, over 21233.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01055, ecapa_loss=0.0001432, whisper_loss=0.09003, over 3904403.43 frames. 
], batch size: 90, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:13:23,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4052580.0, ans=0.0 2024-08-18 19:13:31,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4052680.0, ans=0.0 2024-08-18 19:13:42,031 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-18 19:13:43,417 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 28 from Vox, 24 fro AS 2024-08-18 19:13:49,408 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.98 vs. limit=15.0 2024-08-18 19:13:54,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4052880.0, ans=0.2 2024-08-18 19:14:04,463 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 19:14:10,288 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-18 19:14:11,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4052980.0, ans=0.2 2024-08-18 19:14:14,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4052980.0, ans=0.0 2024-08-18 19:14:19,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4052980.0, ans=0.2 2024-08-18 19:14:21,285 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 4200, loss[loss=0.1268, beats_loss=0.008495, ecapa_loss=0.0001201, whisper_loss=0.1171, over 24736.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01058, ecapa_loss=0.0001426, whisper_loss=0.08989, over 3904124.55 frames. 
], batch size: 92, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:14:32,301 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.283e+01 2.553e+01 2.911e+01 4.394e+01, threshold=5.106e+01, percent-clipped=0.0 2024-08-18 19:14:47,970 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-18 19:15:09,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4053380.0, ans=0.0 2024-08-18 19:15:13,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4053480.0, ans=0.2 2024-08-18 19:15:16,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4053480.0, ans=0.2 2024-08-18 19:15:23,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4053480.0, ans=0.125 2024-08-18 19:15:27,043 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 4250, loss[loss=0.07286, beats_loss=0.0117, ecapa_loss=0.0001162, whisper_loss=0.06, over 19356.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01056, ecapa_loss=0.0001427, whisper_loss=0.08981, over 3890643.08 frames. ], batch size: 78, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:15:34,151 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
25 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-18 19:15:51,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4053680.0, ans=0.125 2024-08-18 19:16:01,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4053780.0, ans=0.0 2024-08-18 19:16:06,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4053880.0, ans=0.2 2024-08-18 19:16:11,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4053880.0, ans=0.125 2024-08-18 19:16:18,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4053880.0, ans=0.125 2024-08-18 19:16:27,781 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 39 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-18 19:16:33,927 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 4300, loss[loss=0.08802, beats_loss=0.009841, ecapa_loss=0.0001656, whisper_loss=0.07652, over 21273.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001428, whisper_loss=0.09068, over 3904278.93 frames. 
], batch size: 91, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:16:37,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4054080.0, ans=0.0 2024-08-18 19:16:44,515 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.278e+01 2.525e+01 2.871e+01 4.782e+01, threshold=5.050e+01, percent-clipped=0.0 2024-08-18 19:17:10,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4054280.0, ans=0.0 2024-08-18 19:17:23,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4054380.0, ans=0.05 2024-08-18 19:17:28,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4054480.0, ans=0.1 2024-08-18 19:17:40,321 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 4350, loss[loss=0.1288, beats_loss=0.008951, ecapa_loss=0.000179, whisper_loss=0.1181, over 21821.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01048, ecapa_loss=0.0001432, whisper_loss=0.09014, over 3898382.65 frames. 
], batch size: 91, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:17:40,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4054580.0, ans=0.125 2024-08-18 19:17:47,470 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.360e+05 2024-08-18 19:17:57,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4054680.0, ans=0.125 2024-08-18 19:18:00,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4054680.0, ans=0.125 2024-08-18 19:18:10,560 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.77 vs. limit=22.5 2024-08-18 19:18:11,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4054780.0, ans=0.0 2024-08-18 19:18:23,989 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 19:18:26,043 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.02 vs. limit=10.0 2024-08-18 19:18:26,559 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-18 19:18:45,578 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 4400, loss[loss=0.09029, beats_loss=0.01281, ecapa_loss=0.0001311, whisper_loss=0.07617, over 22533.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01052, ecapa_loss=0.0001422, whisper_loss=0.09018, over 3920688.99 frames. 
], batch size: 92, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:18:51,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4055080.0, ans=0.0 2024-08-18 19:18:55,623 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.96 vs. limit=15.0 2024-08-18 19:18:55,955 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.724e+01 2.281e+01 2.472e+01 2.660e+01 4.951e+01, threshold=4.945e+01, percent-clipped=0.0 2024-08-18 19:18:56,166 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-18 19:19:00,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4055180.0, ans=0.125 2024-08-18 19:19:03,215 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2024-08-18 19:19:23,010 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.21 vs. limit=15.0 2024-08-18 19:19:25,904 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.59 vs. limit=15.0 2024-08-18 19:19:26,869 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 29 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-18 19:19:39,957 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 19:19:41,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4055480.0, ans=0.1 2024-08-18 19:19:42,586 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
28 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 19:19:52,149 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 4450, loss[loss=0.1125, beats_loss=0.008284, ecapa_loss=0.000153, whisper_loss=0.1027, over 22510.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01039, ecapa_loss=0.0001427, whisper_loss=0.09093, over 3910821.56 frames. ], batch size: 90, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:19:54,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4055580.0, ans=0.2 2024-08-18 19:19:55,014 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-18 19:20:10,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=4055680.0, ans=0.02 2024-08-18 19:20:14,891 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.22 vs. limit=6.0 2024-08-18 19:20:43,212 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.16 vs. limit=15.0 2024-08-18 19:20:47,440 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 19:20:59,952 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 4500, loss[loss=0.08713, beats_loss=0.01113, ecapa_loss=0.0001714, whisper_loss=0.07428, over 20388.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01046, ecapa_loss=0.000142, whisper_loss=0.09048, over 3917292.45 frames. ], batch size: 85, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:21:10,759 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.274e+01 2.537e+01 2.836e+01 4.716e+01, threshold=5.074e+01, percent-clipped=0.0 2024-08-18 19:21:26,973 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
31 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-18 19:21:33,841 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 19:21:35,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4056280.0, ans=0.125 2024-08-18 19:21:35,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4056280.0, ans=0.125 2024-08-18 19:21:40,600 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 20 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-18 19:21:50,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4056380.0, ans=0.0 2024-08-18 19:21:52,617 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 13 from Vox, 49 fro AS 2024-08-18 19:21:54,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4056480.0, ans=0.2 2024-08-18 19:21:59,590 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4056480.0, ans=0.0 2024-08-18 19:22:07,182 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 4550, loss[loss=0.09433, beats_loss=0.009362, ecapa_loss=0.0001524, whisper_loss=0.08344, over 22054.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01053, ecapa_loss=0.0001417, whisper_loss=0.09011, over 3925163.34 frames. ], batch size: 90, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:22:07,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4056580.0, ans=0.04949747468305833 2024-08-18 19:22:08,693 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-18 19:22:11,230 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
18 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-18 19:22:12,586 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 19:22:26,945 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0 2024-08-18 19:22:31,175 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-18 19:22:34,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4056780.0, ans=0.125 2024-08-18 19:22:48,610 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.76 vs. limit=15.0 2024-08-18 19:22:53,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4056880.0, ans=0.125 2024-08-18 19:22:55,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4056880.0, ans=0.1 2024-08-18 19:23:14,102 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 4600, loss[loss=0.0988, beats_loss=0.009828, ecapa_loss=0.0001603, whisper_loss=0.08737, over 22223.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01043, ecapa_loss=0.0001433, whisper_loss=0.09009, over 3894154.56 frames. ], batch size: 91, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:23:25,097 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.308e+01 2.504e+01 2.960e+01 4.674e+01, threshold=5.007e+01, percent-clipped=0.0 2024-08-18 19:23:29,175 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
17 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-18 19:23:34,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4057180.0, ans=0.1 2024-08-18 19:23:37,091 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-18 19:23:37,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4057180.0, ans=0.0 2024-08-18 19:23:41,727 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.12 vs. limit=22.5 2024-08-18 19:24:06,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4057480.0, ans=0.1 2024-08-18 19:24:20,248 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 4650, loss[loss=0.1106, beats_loss=0.01019, ecapa_loss=0.0001831, whisper_loss=0.09856, over 22063.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01051, ecapa_loss=0.0001434, whisper_loss=0.0896, over 3902787.86 frames. ], batch size: 92, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:24:37,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4057680.0, ans=0.125 2024-08-18 19:24:48,270 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 30 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-18 19:24:51,726 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.01 vs. 
limit=22.5 2024-08-18 19:24:52,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=4057780.0, ans=15.0 2024-08-18 19:24:59,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4057880.0, ans=0.1 2024-08-18 19:25:05,200 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.22 vs. limit=22.5 2024-08-18 19:25:07,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4057880.0, ans=0.0 2024-08-18 19:25:09,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4057880.0, ans=0.0 2024-08-18 19:25:26,290 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 4700, loss[loss=0.1019, beats_loss=0.008455, ecapa_loss=0.0001329, whisper_loss=0.09216, over 19091.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001433, whisper_loss=0.08996, over 3882987.48 frames. ], batch size: 73, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:25:28,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4058080.0, ans=0.125 2024-08-18 19:25:35,764 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-18 19:25:36,746 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.335e+01 2.621e+01 2.898e+01 4.887e+01, threshold=5.242e+01, percent-clipped=0.0 2024-08-18 19:25:36,997 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
24 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 19:25:41,789 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.43 vs. limit=15.0 2024-08-18 19:25:57,513 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=15.0 2024-08-18 19:25:58,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4058280.0, ans=0.125 2024-08-18 19:26:11,194 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 19:26:15,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4058380.0, ans=0.1 2024-08-18 19:26:17,850 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 24 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-18 19:26:32,103 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 4750, loss[loss=0.1108, beats_loss=0.009257, ecapa_loss=0.000148, whisper_loss=0.1, over 23327.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01055, ecapa_loss=0.0001423, whisper_loss=0.09021, over 3887320.71 frames. ], batch size: 94, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:26:34,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4058580.0, ans=0.0 2024-08-18 19:26:40,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4058580.0, ans=0.125 2024-08-18 19:26:41,916 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 34 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-18 19:26:52,332 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
21 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 19:27:03,981 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.33 vs. limit=22.5 2024-08-18 19:27:38,350 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 4800, loss[loss=0.1024, beats_loss=0.01173, ecapa_loss=0.0001576, whisper_loss=0.08908, over 19414.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01058, ecapa_loss=0.0001428, whisper_loss=0.09, over 3913167.35 frames. ], batch size: 78, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:27:40,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4059080.0, ans=0.125 2024-08-18 19:27:49,124 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.300e+01 2.541e+01 2.799e+01 4.808e+02, threshold=5.082e+01, percent-clipped=2.0 2024-08-18 19:27:49,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4059080.0, ans=0.125 2024-08-18 19:27:50,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4059180.0, ans=0.125 2024-08-18 19:28:36,021 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 19:28:45,694 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 4850, loss[loss=0.124, beats_loss=0.008424, ecapa_loss=0.0001187, whisper_loss=0.1144, over 25163.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01055, ecapa_loss=0.0001435, whisper_loss=0.08996, over 3894514.40 frames. 
], batch size: 91, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:28:47,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4059580.0, ans=0.1 2024-08-18 19:28:56,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4059580.0, ans=0.0 2024-08-18 19:29:00,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4059680.0, ans=0.0 2024-08-18 19:29:05,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4059680.0, ans=0.025 2024-08-18 19:29:29,121 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-18 19:29:30,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4059880.0, ans=0.2 2024-08-18 19:29:31,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4059880.0, ans=0.125 2024-08-18 19:29:38,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4059980.0, ans=0.0 2024-08-18 19:29:44,729 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-18 19:29:46,071 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-18 19:29:48,621 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-18 19:29:50,972 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 4900, loss[loss=0.09609, beats_loss=0.01058, ecapa_loss=9.933e-05, whisper_loss=0.08451, over 15791.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01054, ecapa_loss=0.000144, whisper_loss=0.08999, over 3875005.00 frames. ], batch size: 58, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:29:55,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4060080.0, ans=0.0 2024-08-18 19:30:00,354 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-18 19:30:01,412 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.212e+01 2.467e+01 2.752e+01 9.926e+01, threshold=4.934e+01, percent-clipped=3.0 2024-08-18 19:30:03,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4060180.0, ans=0.1 2024-08-18 19:30:07,106 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 39 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-18 19:30:15,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4060180.0, ans=0.125 2024-08-18 19:30:25,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4060280.0, ans=0.125 2024-08-18 19:30:30,284 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.12 vs. limit=22.5 2024-08-18 19:30:47,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4060480.0, ans=0.125 2024-08-18 19:30:57,401 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 4950, loss[loss=0.08257, beats_loss=0.01232, ecapa_loss=0.0001366, whisper_loss=0.06888, over 20878.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0106, ecapa_loss=0.0001442, whisper_loss=0.08885, over 3874718.36 frames. 
], batch size: 85, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:31:08,164 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-18 19:31:18,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4060680.0, ans=0.0 2024-08-18 19:31:31,428 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.57 vs. limit=22.5 2024-08-18 19:31:40,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4060880.0, ans=0.1 2024-08-18 19:31:58,812 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 37 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-18 19:32:03,831 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 5000, loss[loss=0.1102, beats_loss=0.009049, ecapa_loss=0.0001633, whisper_loss=0.09949, over 22958.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01057, ecapa_loss=0.0001444, whisper_loss=0.08971, over 3870328.58 frames. ], batch size: 95, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:32:14,149 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.344e+01 2.610e+01 2.936e+01 3.838e+01, threshold=5.220e+01, percent-clipped=0.0 2024-08-18 19:32:14,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4061080.0, ans=0.125 2024-08-18 19:32:21,033 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 22 from LS+wenet, 29 from Vox, 43 fro AS 2024-08-18 19:32:24,925 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
19 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-18 19:32:25,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4061180.0, ans=0.1 2024-08-18 19:32:32,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4061280.0, ans=0.1 2024-08-18 19:32:34,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4061280.0, ans=0.125 2024-08-18 19:32:49,685 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.82 vs. limit=15.0 2024-08-18 19:32:55,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4061480.0, ans=0.125 2024-08-18 19:32:58,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4061480.0, ans=0.125 2024-08-18 19:33:02,886 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-18 19:33:09,205 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 5050, loss[loss=0.1053, beats_loss=0.01114, ecapa_loss=0.0001068, whisper_loss=0.09311, over 23332.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01064, ecapa_loss=0.0001429, whisper_loss=0.09004, over 3879658.98 frames. 
], batch size: 90, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:33:10,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4061580.0, ans=0.0 2024-08-18 19:33:13,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4061580.0, ans=0.125 2024-08-18 19:33:13,863 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.69 vs. limit=22.5 2024-08-18 19:33:14,459 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-18 19:33:27,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4061680.0, ans=0.0 2024-08-18 19:33:37,602 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 26 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 19:33:45,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4061780.0, ans=0.125 2024-08-18 19:33:47,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4061880.0, ans=0.0 2024-08-18 19:33:51,759 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.65 vs. 
limit=22.5 2024-08-18 19:33:58,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4061880.0, ans=0.125 2024-08-18 19:34:04,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4061980.0, ans=0.125 2024-08-18 19:34:04,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4061980.0, ans=0.1 2024-08-18 19:34:08,553 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.27 vs. limit=15.0 2024-08-18 19:34:14,212 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 5100, loss[loss=0.1042, beats_loss=0.01043, ecapa_loss=0.000161, whisper_loss=0.09217, over 22586.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01064, ecapa_loss=0.0001423, whisper_loss=0.09007, over 3883153.70 frames. ], batch size: 91, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:34:24,692 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.303e+01 2.611e+01 2.910e+01 2.012e+02, threshold=5.222e+01, percent-clipped=3.0 2024-08-18 19:34:27,532 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 31 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-18 19:34:51,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4062280.0, ans=0.1 2024-08-18 19:34:52,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4062380.0, ans=0.0 2024-08-18 19:35:03,996 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 19:35:19,339 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 5150, loss[loss=0.1053, beats_loss=0.01124, ecapa_loss=0.000161, whisper_loss=0.09244, over 21762.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01072, ecapa_loss=0.0001418, whisper_loss=0.0897, over 3895541.64 frames. ], batch size: 90, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:35:47,234 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.043e+00 2024-08-18 19:35:49,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4062780.0, ans=0.1 2024-08-18 19:36:03,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4062880.0, ans=0.0 2024-08-18 19:36:08,190 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.24 vs. limit=22.5 2024-08-18 19:36:24,444 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 5200, loss[loss=0.1138, beats_loss=0.01029, ecapa_loss=0.0001298, whisper_loss=0.1023, over 22833.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.0001432, whisper_loss=0.09051, over 3905015.59 frames. ], batch size: 89, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:36:30,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4063080.0, ans=0.0 2024-08-18 19:36:32,707 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-18 19:36:34,933 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.237e+01 2.499e+01 2.869e+01 3.918e+01, threshold=4.998e+01, percent-clipped=0.0 2024-08-18 19:36:38,870 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
19 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-18 19:36:43,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4063180.0, ans=0.125 2024-08-18 19:36:48,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4063180.0, ans=0.125 2024-08-18 19:36:50,754 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-18 19:36:58,003 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 10 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-18 19:37:00,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4063280.0, ans=0.1 2024-08-18 19:37:00,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4063280.0, ans=0.0 2024-08-18 19:37:29,312 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 5250, loss[loss=0.09417, beats_loss=0.009295, ecapa_loss=0.0001472, whisper_loss=0.0834, over 22301.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01052, ecapa_loss=0.0001426, whisper_loss=0.09092, over 3903016.33 frames. ], batch size: 88, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:37:35,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4063580.0, ans=0.1 2024-08-18 19:37:36,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=4063580.0, ans=0.025 2024-08-18 19:37:38,560 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-18 19:37:41,392 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-18 19:38:06,391 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 19:38:33,980 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 19:38:34,949 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 5300, loss[loss=0.1023, beats_loss=0.01036, ecapa_loss=0.0001602, whisper_loss=0.09031, over 20322.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01051, ecapa_loss=0.000143, whisper_loss=0.09062, over 3899722.44 frames. ], batch size: 88, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:38:38,735 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.54 vs. limit=15.0 2024-08-18 19:38:40,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4064080.0, ans=0.125 2024-08-18 19:38:45,248 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.271e+01 2.459e+01 2.862e+01 3.681e+01, threshold=4.918e+01, percent-clipped=0.0 2024-08-18 19:38:46,807 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-18 19:39:14,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4064380.0, ans=0.1 2024-08-18 19:39:34,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4064480.0, ans=0.07 2024-08-18 19:39:39,099 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-18 19:39:40,413 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 5350, loss[loss=0.09306, beats_loss=0.009662, ecapa_loss=0.0001472, whisper_loss=0.08192, over 20285.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001424, whisper_loss=0.09069, over 3880944.78 frames. 
], batch size: 81, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:39:40,600 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-18 19:39:41,108 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=15.0 2024-08-18 19:39:41,902 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-18 19:39:54,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4064680.0, ans=0.0 2024-08-18 19:39:56,847 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.06 vs. limit=22.5 2024-08-18 19:40:04,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4064680.0, ans=0.0 2024-08-18 19:40:23,556 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-18 19:40:34,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4064980.0, ans=0.0 2024-08-18 19:40:44,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4065080.0, ans=0.125 2024-08-18 19:40:45,350 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 5400, loss[loss=0.111, beats_loss=0.009018, ecapa_loss=0.000146, whisper_loss=0.1006, over 21700.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0104, ecapa_loss=0.000142, whisper_loss=0.09143, over 3908068.60 frames. ], batch size: 83, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:40:46,714 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
18 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 19:40:48,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4065080.0, ans=0.2 2024-08-18 19:40:55,813 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.323e+01 2.486e+01 2.757e+01 7.615e+01, threshold=4.971e+01, percent-clipped=1.0 2024-08-18 19:41:00,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4065180.0, ans=0.1 2024-08-18 19:41:04,802 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 17 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-18 19:41:05,368 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.57 vs. limit=22.5 2024-08-18 19:41:23,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4065380.0, ans=0.0 2024-08-18 19:41:24,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4065380.0, ans=0.125 2024-08-18 19:41:37,810 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=15.0 2024-08-18 19:41:45,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4065480.0, ans=0.125 2024-08-18 19:41:50,147 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 5450, loss[loss=0.1043, beats_loss=0.01034, ecapa_loss=0.0001458, whisper_loss=0.09247, over 17494.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01042, ecapa_loss=0.0001426, whisper_loss=0.09107, over 3915242.66 frames. 
], batch size: 68, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:42:03,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4065680.0, ans=0.125 2024-08-18 19:42:13,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4065680.0, ans=0.0 2024-08-18 19:42:13,952 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.17 vs. limit=15.0 2024-08-18 19:42:27,716 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 12 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-18 19:42:49,050 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.70 vs. limit=15.0 2024-08-18 19:42:54,889 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 5500, loss[loss=0.09661, beats_loss=0.01174, ecapa_loss=0.0001057, whisper_loss=0.08381, over 18992.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01044, ecapa_loss=0.0001426, whisper_loss=0.09043, over 3916042.81 frames. ], batch size: 76, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:42:57,762 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
31 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 19:42:57,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4066080.0, ans=0.125 2024-08-18 19:43:00,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4066080.0, ans=0.125 2024-08-18 19:43:05,262 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.744e+01 2.282e+01 2.535e+01 2.838e+01 1.372e+02, threshold=5.070e+01, percent-clipped=2.0 2024-08-18 19:43:05,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4066080.0, ans=0.0 2024-08-18 19:43:15,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4066180.0, ans=0.0 2024-08-18 19:43:28,421 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.62 vs. 
limit=22.5 2024-08-18 19:43:29,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4066280.0, ans=0.125 2024-08-18 19:43:30,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4066280.0, ans=0.125 2024-08-18 19:43:40,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4066380.0, ans=0.125 2024-08-18 19:43:46,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4066380.0, ans=0.95 2024-08-18 19:43:56,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4066480.0, ans=0.1 2024-08-18 19:44:02,115 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 5550, loss[loss=0.09876, beats_loss=0.009719, ecapa_loss=0.0001425, whisper_loss=0.08762, over 18765.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01051, ecapa_loss=0.0001417, whisper_loss=0.09053, over 3927206.93 frames. ], batch size: 73, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:44:09,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4066580.0, ans=0.0 2024-08-18 19:44:13,855 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-18 19:44:21,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4066680.0, ans=0.0 2024-08-18 19:44:28,133 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
14 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-18 19:44:41,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4066780.0, ans=0.1 2024-08-18 19:44:45,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4066880.0, ans=0.0 2024-08-18 19:44:59,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4066980.0, ans=0.125 2024-08-18 19:45:04,949 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-18 19:45:10,974 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0 2024-08-18 19:45:14,742 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 5600, loss[loss=0.1109, beats_loss=0.01033, ecapa_loss=0.0001641, whisper_loss=0.09889, over 23066.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0106, ecapa_loss=0.0001416, whisper_loss=0.09024, over 3916200.63 frames. ], batch size: 92, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:45:25,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4067080.0, ans=0.125 2024-08-18 19:45:25,979 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.465e+01 2.692e+01 2.981e+01 3.503e+02, threshold=5.385e+01, percent-clipped=2.0 2024-08-18 19:45:30,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4067180.0, ans=0.125 2024-08-18 19:45:38,319 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
18 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-18 19:45:54,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4067280.0, ans=0.0 2024-08-18 19:46:01,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4067380.0, ans=0.04949747468305833 2024-08-18 19:46:09,158 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2024-08-18 19:46:10,407 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.563e-03 2024-08-18 19:46:10,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4067380.0, ans=0.05 2024-08-18 19:46:28,867 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 5650, loss[loss=0.0736, beats_loss=0.01046, ecapa_loss=0.0001663, whisper_loss=0.06148, over 17588.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01065, ecapa_loss=0.0001415, whisper_loss=0.08965, over 3932730.54 frames. ], batch size: 75, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:46:30,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4067580.0, ans=0.07 2024-08-18 19:46:40,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4067580.0, ans=0.0 2024-08-18 19:46:42,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4067680.0, ans=0.2 2024-08-18 19:47:12,655 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
25 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 19:47:21,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4067880.0, ans=0.125 2024-08-18 19:47:24,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4067880.0, ans=0.125 2024-08-18 19:47:29,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4067980.0, ans=0.0 2024-08-18 19:47:29,631 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2024-08-18 19:47:38,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4067980.0, ans=0.1 2024-08-18 19:47:45,262 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 5700, loss[loss=0.1416, beats_loss=0.009273, ecapa_loss=0.0001295, whisper_loss=0.1311, over 18253.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01064, ecapa_loss=0.000142, whisper_loss=0.08971, over 3922162.19 frames. ], batch size: 69, lr: 2.19e-03, grad_scale: 1.152921504606847e+18 2024-08-18 19:47:54,258 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-18 19:47:58,013 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.343e+01 2.551e+01 2.885e+01 3.907e+01, threshold=5.102e+01, percent-clipped=0.0 2024-08-18 19:48:16,701 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-18 19:48:21,463 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0 2024-08-18 19:48:33,829 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-18 19:48:35,303 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 37 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 19:48:42,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4068380.0, ans=0.0 2024-08-18 19:48:50,532 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.37 vs. limit=22.5 2024-08-18 19:48:53,575 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 23 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-18 19:48:56,368 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-18 19:49:00,668 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 5750, loss[loss=0.09456, beats_loss=0.01225, ecapa_loss=0.0001226, whisper_loss=0.08109, over 18061.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01057, ecapa_loss=0.0001434, whisper_loss=0.08953, over 3890229.90 frames. ], batch size: 72, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:49:16,471 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.35 vs. limit=15.0 2024-08-18 19:50:05,321 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 16 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-18 19:50:10,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4068980.0, ans=0.125 2024-08-18 19:50:14,098 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-18 19:50:15,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4068980.0, ans=0.125 2024-08-18 19:50:17,181 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
22 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 19:50:22,526 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 5800, loss[loss=0.09855, beats_loss=0.01433, ecapa_loss=0.0001326, whisper_loss=0.08289, over 21376.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01061, ecapa_loss=0.0001432, whisper_loss=0.08917, over 3878161.88 frames. ], batch size: 89, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:50:27,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4069080.0, ans=0.0 2024-08-18 19:50:35,679 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.304e+01 2.520e+01 2.839e+01 4.509e+01, threshold=5.039e+01, percent-clipped=0.0 2024-08-18 19:50:43,296 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.85 vs. limit=15.0 2024-08-18 19:50:43,891 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 34 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 19:50:48,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4069180.0, ans=0.125 2024-08-18 19:51:18,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4069380.0, ans=0.0 2024-08-18 19:51:24,076 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.26 vs. limit=22.5 2024-08-18 19:51:37,334 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 5850, loss[loss=0.1113, beats_loss=0.00747, ecapa_loss=0.0001508, whisper_loss=0.1023, over 21967.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01057, ecapa_loss=0.0001428, whisper_loss=0.0899, over 3857311.54 frames. 
], batch size: 85, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:51:39,103 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-18 19:51:45,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4069580.0, ans=0.2 2024-08-18 19:51:51,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4069680.0, ans=0.0 2024-08-18 19:51:53,232 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 19:51:57,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4069680.0, ans=0.1 2024-08-18 19:52:12,643 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 11 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-18 19:52:21,400 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.02 vs. limit=15.0 2024-08-18 19:52:27,710 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-18 19:52:34,468 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 19:52:34,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4069880.0, ans=0.1 2024-08-18 19:52:39,575 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.91 vs. limit=10.0 2024-08-18 19:52:45,829 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 19:52:51,747 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 5900, loss[loss=0.09914, beats_loss=0.00972, ecapa_loss=0.0001215, whisper_loss=0.08821, over 15896.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.0001423, whisper_loss=0.08964, over 3865223.23 frames. ], batch size: 59, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:53:00,847 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 19:53:03,416 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.303e+01 2.495e+01 2.775e+01 3.811e+01, threshold=4.989e+01, percent-clipped=0.0 2024-08-18 19:53:13,061 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 19:53:26,718 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-18 19:53:37,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4070380.0, ans=0.125 2024-08-18 19:53:38,344 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-18 19:53:56,520 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-18 19:53:57,669 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-18 19:53:58,793 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 5950, loss[loss=0.09068, beats_loss=0.01087, ecapa_loss=0.0001314, whisper_loss=0.0785, over 22501.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01055, ecapa_loss=0.000143, whisper_loss=0.08929, over 3882002.80 frames. 
], batch size: 92, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:54:14,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4070680.0, ans=0.0 2024-08-18 19:54:49,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4070880.0, ans=0.125 2024-08-18 19:54:50,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4070980.0, ans=0.0 2024-08-18 19:54:55,791 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 19:55:01,638 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 19:55:04,512 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 6000, loss[loss=0.1139, beats_loss=0.01139, ecapa_loss=0.0001649, whisper_loss=0.1009, over 22899.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01056, ecapa_loss=0.0001431, whisper_loss=0.08935, over 3852776.95 frames. ], batch size: 93, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:55:04,512 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-18 19:55:42,581 INFO [train_multi_KD3.py:1149] (2/4) Epoch 28, validation on ASR_libri: loss=0.2546, beats_loss=0, ecapa_loss=0.0005279, whisper_loss=0.2493, over 922467.00 frames. 2024-08-18 19:55:59,704 INFO [train_multi_KD3.py:1149] (2/4) Epoch 28, validation on SV_voxceleb1: loss=0.003985, beats_loss=0, ecapa_loss=0.0003985, whisper_loss=0, over 939242.00 frames. 2024-08-18 19:57:44,525 INFO [train_multi_KD3.py:1149] (2/4) Epoch 28, validation on AT_audioset: loss=0.02306, beats_loss=0.02306, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-18 19:57:44,529 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-18 19:57:47,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4071080.0, ans=0.125 2024-08-18 19:57:49,635 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 26 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-18 19:57:56,137 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.336e+01 2.599e+01 2.933e+01 4.741e+01, threshold=5.199e+01, percent-clipped=0.0 2024-08-18 19:58:01,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4071180.0, ans=0.125 2024-08-18 19:58:16,045 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2024-08-18 19:58:20,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4071280.0, ans=0.125 2024-08-18 19:58:36,371 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 19 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-18 19:58:46,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4071480.0, ans=0.09899494936611666 2024-08-18 19:58:51,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4071580.0, ans=0.125 2024-08-18 19:58:52,589 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 6050, loss[loss=0.08853, beats_loss=0.008541, ecapa_loss=0.0001528, whisper_loss=0.07846, over 15186.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01055, ecapa_loss=0.0001432, whisper_loss=0.08956, over 3831908.60 frames. 
], batch size: 59, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:58:55,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4071580.0, ans=0.125 2024-08-18 19:59:01,023 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 19:59:02,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4071580.0, ans=0.0 2024-08-18 19:59:22,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4071780.0, ans=0.0 2024-08-18 19:59:29,096 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 14 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-18 19:59:34,656 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.53 vs. limit=15.0 2024-08-18 19:59:38,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4071880.0, ans=0.125 2024-08-18 19:59:44,974 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 19:59:52,427 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-18 19:59:53,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4071980.0, ans=0.125 2024-08-18 19:59:59,692 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 6100, loss[loss=0.1131, beats_loss=0.008457, ecapa_loss=0.0001621, whisper_loss=0.1031, over 18024.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0105, ecapa_loss=0.0001442, whisper_loss=0.08968, over 3845899.86 frames. 
], batch size: 70, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:00:02,535 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-18 20:00:12,351 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.299e+01 2.594e+01 2.933e+01 3.314e+02, threshold=5.188e+01, percent-clipped=1.0 2024-08-18 20:00:14,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4072180.0, ans=0.125 2024-08-18 20:00:16,691 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-18 20:00:20,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4072180.0, ans=0.07 2024-08-18 20:00:25,859 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 20:00:46,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4072380.0, ans=0.0 2024-08-18 20:00:59,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4072480.0, ans=0.1 2024-08-18 20:00:59,765 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.63 vs. limit=15.0 2024-08-18 20:01:01,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4072480.0, ans=0.1 2024-08-18 20:01:03,720 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-18 20:01:06,114 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 6150, loss[loss=0.1018, beats_loss=0.009073, ecapa_loss=0.0001781, whisper_loss=0.09091, over 16657.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.0001455, whisper_loss=0.09019, over 3863391.46 frames. ], batch size: 69, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:01:11,512 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 20:01:13,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4072580.0, ans=0.2 2024-08-18 20:01:34,573 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-18 20:01:38,404 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 20:01:38,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4072780.0, ans=0.1 2024-08-18 20:01:48,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4072880.0, ans=0.0 2024-08-18 20:01:53,476 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-18 20:02:08,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4072980.0, ans=0.125 2024-08-18 20:02:13,426 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 6200, loss[loss=0.1192, beats_loss=0.01079, ecapa_loss=0.0001148, whisper_loss=0.1072, over 16376.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.0001444, whisper_loss=0.09055, over 3881011.93 frames. 
], batch size: 62, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:02:26,334 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.303e+01 2.554e+01 2.870e+01 1.661e+02, threshold=5.109e+01, percent-clipped=2.0 2024-08-18 20:02:35,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4073180.0, ans=0.125 2024-08-18 20:02:49,949 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.02 vs. limit=15.0 2024-08-18 20:02:54,574 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-18 20:02:58,954 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-18 20:03:15,198 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 13 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-18 20:03:19,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4073480.0, ans=0.125 2024-08-18 20:03:24,500 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 6250, loss[loss=0.09705, beats_loss=0.008143, ecapa_loss=0.0001904, whisper_loss=0.087, over 15490.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01045, ecapa_loss=0.0001447, whisper_loss=0.09041, over 3885241.10 frames. ], batch size: 64, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:03:24,955 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
23 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-18 20:03:25,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4073580.0, ans=0.125 2024-08-18 20:03:45,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4073680.0, ans=0.125 2024-08-18 20:03:59,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4073780.0, ans=0.125 2024-08-18 20:04:21,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4073980.0, ans=0.5 2024-08-18 20:04:36,679 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 6300, loss[loss=0.1015, beats_loss=0.009676, ecapa_loss=0.000128, whisper_loss=0.09057, over 16534.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01043, ecapa_loss=0.0001441, whisper_loss=0.09015, over 3832182.54 frames. ], batch size: 64, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:04:49,050 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.796e+01 2.375e+01 2.574e+01 2.890e+01 4.000e+02, threshold=5.149e+01, percent-clipped=1.0 2024-08-18 20:04:51,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4074180.0, ans=0.2 2024-08-18 20:05:09,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4074280.0, ans=0.07 2024-08-18 20:05:11,547 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
23 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-18 20:05:19,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4074380.0, ans=0.0 2024-08-18 20:05:24,571 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0 2024-08-18 20:05:37,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4074480.0, ans=0.0 2024-08-18 20:05:40,938 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-18 20:05:42,229 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 25 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-18 20:05:45,067 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 6350, loss[loss=0.103, beats_loss=0.01126, ecapa_loss=0.0001278, whisper_loss=0.09044, over 22287.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01053, ecapa_loss=0.0001442, whisper_loss=0.08967, over 3832235.95 frames. ], batch size: 88, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:05:45,197 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 30 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-18 20:05:54,555 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-18 20:05:56,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4074580.0, ans=0.125 2024-08-18 20:06:10,771 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-18 20:06:22,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=4074780.0, ans=22.5 2024-08-18 20:06:29,981 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
16 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-18 20:06:50,354 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 6400, loss[loss=0.1032, beats_loss=0.009176, ecapa_loss=0.0001412, whisper_loss=0.09259, over 20308.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01052, ecapa_loss=0.0001433, whisper_loss=0.09017, over 3846563.39 frames. ], batch size: 77, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:06:55,797 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-18 20:06:57,007 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-18 20:07:02,019 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.365e+01 2.555e+01 2.895e+01 7.791e+01, threshold=5.110e+01, percent-clipped=1.0 2024-08-18 20:07:14,190 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.85 vs. limit=12.0 2024-08-18 20:07:21,484 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 33 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 20:07:47,859 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 20:07:54,042 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 6450, loss[loss=0.1035, beats_loss=0.01088, ecapa_loss=0.0001538, whisper_loss=0.09108, over 17277.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01054, ecapa_loss=0.0001434, whisper_loss=0.09023, over 3850215.38 frames. ], batch size: 70, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:08:03,345 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
30 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-18 20:08:04,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4075580.0, ans=0.0 2024-08-18 20:08:04,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4075580.0, ans=0.0 2024-08-18 20:08:21,050 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-18 20:08:30,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4075780.0, ans=0.2 2024-08-18 20:08:36,508 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-18 20:08:38,450 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 20:08:41,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4075880.0, ans=0.125 2024-08-18 20:08:57,336 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 6500, loss[loss=0.1133, beats_loss=0.01053, ecapa_loss=0.0001449, whisper_loss=0.1013, over 16697.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01054, ecapa_loss=0.0001428, whisper_loss=0.09077, over 3870526.88 frames. ], batch size: 66, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:09:08,802 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.274e+01 2.478e+01 2.663e+01 4.004e+01, threshold=4.956e+01, percent-clipped=0.0 2024-08-18 20:09:10,678 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 35 from Vox, 32 fro AS 2024-08-18 20:09:15,605 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
23 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 20:09:25,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4076280.0, ans=0.04949747468305833 2024-08-18 20:09:57,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4076480.0, ans=0.1 2024-08-18 20:10:01,924 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 6550, loss[loss=0.0884, beats_loss=0.01135, ecapa_loss=0.000126, whisper_loss=0.0758, over 14646.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01047, ecapa_loss=0.0001431, whisper_loss=0.09164, over 3893724.02 frames. ], batch size: 59, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:10:04,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4076580.0, ans=0.125 2024-08-18 20:10:08,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4076580.0, ans=0.125 2024-08-18 20:10:11,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4076580.0, ans=0.125 2024-08-18 20:10:15,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4076680.0, ans=0.125 2024-08-18 20:10:26,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4076780.0, ans=0.1 2024-08-18 20:10:46,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4076880.0, ans=0.1 2024-08-18 20:11:03,064 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.30 vs. 
limit=15.0 2024-08-18 20:11:06,265 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 6600, loss[loss=0.12, beats_loss=0.01083, ecapa_loss=0.0001167, whisper_loss=0.108, over 19035.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01055, ecapa_loss=0.0001432, whisper_loss=0.09092, over 3910311.99 frames. ], batch size: 73, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:11:07,190 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.34 vs. limit=10.0 2024-08-18 20:11:10,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4077080.0, ans=0.125 2024-08-18 20:11:11,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4077080.0, ans=0.125 2024-08-18 20:11:13,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4077080.0, ans=0.2 2024-08-18 20:11:14,049 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 15 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-18 20:11:17,597 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.459e+01 2.687e+01 3.202e+01 5.546e+01, threshold=5.373e+01, percent-clipped=1.0 2024-08-18 20:11:22,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4077180.0, ans=0.1 2024-08-18 20:11:26,579 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
24 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-18 20:11:39,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4077280.0, ans=0.2 2024-08-18 20:12:01,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4077480.0, ans=0.125 2024-08-18 20:12:06,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4077480.0, ans=0.0 2024-08-18 20:12:10,161 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 6650, loss[loss=0.1324, beats_loss=0.009796, ecapa_loss=0.0001224, whisper_loss=0.1214, over 23682.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01053, ecapa_loss=0.0001441, whisper_loss=0.09068, over 3909987.79 frames. ], batch size: 89, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:12:18,331 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.09 vs. limit=15.0 2024-08-18 20:12:24,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4077680.0, ans=0.2 2024-08-18 20:12:29,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4077680.0, ans=0.0 2024-08-18 20:12:32,016 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-18 20:12:33,466 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
31 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-18 20:12:33,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4077680.0, ans=0.1 2024-08-18 20:12:37,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4077780.0, ans=0.125 2024-08-18 20:12:43,739 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-18 20:12:52,788 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 16 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 20:12:58,981 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-18 20:13:14,512 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 6700, loss[loss=0.1085, beats_loss=0.0103, ecapa_loss=0.0001568, whisper_loss=0.09667, over 20334.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01045, ecapa_loss=0.0001443, whisper_loss=0.09134, over 3891645.96 frames. ], batch size: 84, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:13:21,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4078080.0, ans=0.125 2024-08-18 20:13:26,210 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.353e+01 2.592e+01 3.066e+01 1.135e+02, threshold=5.185e+01, percent-clipped=5.0 2024-08-18 20:13:33,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4078180.0, ans=0.125 2024-08-18 20:13:49,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4078280.0, ans=0.0 2024-08-18 20:14:11,618 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
23 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-18 20:14:11,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4078480.0, ans=10.0 2024-08-18 20:14:17,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4078480.0, ans=0.125 2024-08-18 20:14:17,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4078480.0, ans=0.0 2024-08-18 20:14:19,050 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 6750, loss[loss=0.0997, beats_loss=0.01047, ecapa_loss=0.0001193, whisper_loss=0.08803, over 19240.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01042, ecapa_loss=0.0001447, whisper_loss=0.0912, over 3888402.49 frames. ], batch size: 75, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:14:24,191 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.09 vs. limit=15.0 2024-08-18 20:14:36,453 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 16 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-18 20:14:47,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4078780.0, ans=0.125 2024-08-18 20:14:47,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4078780.0, ans=0.0 2024-08-18 20:15:16,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4078980.0, ans=0.1 2024-08-18 20:15:22,753 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 20:15:24,196 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 6800, loss[loss=0.1009, beats_loss=0.01124, ecapa_loss=0.0001617, whisper_loss=0.088, over 21821.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01043, ecapa_loss=0.0001458, whisper_loss=0.09079, over 3902346.51 frames. ], batch size: 91, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:15:33,440 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.92 vs. limit=22.5 2024-08-18 20:15:35,429 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.259e+01 2.465e+01 2.807e+01 3.943e+01, threshold=4.931e+01, percent-clipped=0.0 2024-08-18 20:15:38,704 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2024-08-18 20:16:09,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4079380.0, ans=0.0 2024-08-18 20:16:13,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4079380.0, ans=0.125 2024-08-18 20:16:14,172 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.18 vs. limit=12.0 2024-08-18 20:16:28,601 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 6850, loss[loss=0.1245, beats_loss=0.007491, ecapa_loss=0.0001841, whisper_loss=0.1152, over 22133.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0104, ecapa_loss=0.0001453, whisper_loss=0.09126, over 3899775.49 frames. ], batch size: 88, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:16:37,788 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
11 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-18 20:16:47,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4079680.0, ans=0.2 2024-08-18 20:16:57,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4079780.0, ans=0.0 2024-08-18 20:17:04,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4079780.0, ans=0.1 2024-08-18 20:17:07,093 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.17 vs. limit=15.0 2024-08-18 20:17:07,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4079880.0, ans=0.2 2024-08-18 20:17:19,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4079980.0, ans=0.0 2024-08-18 20:17:26,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4079980.0, ans=0.125 2024-08-18 20:17:34,999 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 6900, loss[loss=0.09495, beats_loss=0.01109, ecapa_loss=0.0001481, whisper_loss=0.08238, over 15229.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01045, ecapa_loss=0.0001449, whisper_loss=0.09136, over 3932331.24 frames. ], batch size: 62, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:17:35,169 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
36 from LS+wenet, 13 from Vox, 45 fro AS 2024-08-18 20:17:36,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4080080.0, ans=0.1 2024-08-18 20:17:39,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4080080.0, ans=0.0 2024-08-18 20:17:41,836 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-18 20:17:46,852 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.343e+01 2.661e+01 3.031e+01 5.071e+01, threshold=5.321e+01, percent-clipped=1.0 2024-08-18 20:18:13,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4080380.0, ans=0.125 2024-08-18 20:18:17,731 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 30 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-18 20:18:19,615 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.07 vs. limit=15.0 2024-08-18 20:18:24,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4080380.0, ans=0.0 2024-08-18 20:18:38,915 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 6950, loss[loss=0.08788, beats_loss=0.01217, ecapa_loss=0.0001214, whisper_loss=0.0745, over 17267.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01043, ecapa_loss=0.0001446, whisper_loss=0.09203, over 3907572.63 frames. ], batch size: 66, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:18:42,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4080580.0, ans=0.1 2024-08-18 20:18:49,402 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
25 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 20:18:53,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4080680.0, ans=0.1 2024-08-18 20:19:01,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4080680.0, ans=0.125 2024-08-18 20:19:03,625 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-18 20:19:33,942 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 20 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-18 20:19:40,319 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-18 20:19:43,051 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 7000, loss[loss=0.1067, beats_loss=0.01042, ecapa_loss=0.0001561, whisper_loss=0.09474, over 21870.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01045, ecapa_loss=0.0001455, whisper_loss=0.0915, over 3890297.87 frames. ], batch size: 88, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:19:49,150 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.61 vs. limit=15.0 2024-08-18 20:19:52,332 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-18 20:19:54,718 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.235e+01 2.482e+01 2.791e+01 3.681e+01, threshold=4.964e+01, percent-clipped=0.0 2024-08-18 20:20:04,248 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 21 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-18 20:20:14,945 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.27 vs. limit=15.0 2024-08-18 20:20:15,658 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
19 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 20:20:20,623 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-18 20:20:25,448 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.10 vs. limit=15.0 2024-08-18 20:20:25,865 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-18 20:20:43,910 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-18 20:20:47,846 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 7050, loss[loss=0.08033, beats_loss=0.01332, ecapa_loss=0.0001125, whisper_loss=0.06588, over 22511.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01053, ecapa_loss=0.0001444, whisper_loss=0.09037, over 3848595.43 frames. ], batch size: 90, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:21:00,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4081680.0, ans=0.125 2024-08-18 20:21:14,720 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 12 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-18 20:21:24,076 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 20:21:28,014 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 20:21:39,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4081980.0, ans=0.2 2024-08-18 20:21:44,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4081980.0, ans=0.0 2024-08-18 20:21:51,177 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
26 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-18 20:21:52,369 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 7100, loss[loss=0.09855, beats_loss=0.01215, ecapa_loss=0.0001361, whisper_loss=0.08504, over 20514.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01056, ecapa_loss=0.0001433, whisper_loss=0.0897, over 3832117.23 frames. ], batch size: 85, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:21:55,142 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 35 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 20:22:04,118 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.686e+01 2.313e+01 2.533e+01 2.792e+01 3.997e+01, threshold=5.065e+01, percent-clipped=0.0 2024-08-18 20:22:10,899 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 38 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 20:22:18,850 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.134e-01 2024-08-18 20:22:20,226 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.83 vs. limit=22.5 2024-08-18 20:22:22,846 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 20:22:22,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4082280.0, ans=0.025 2024-08-18 20:22:40,956 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-18 20:22:42,520 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 14 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-18 20:22:47,181 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.59 vs. 
limit=12.0 2024-08-18 20:22:57,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4082580.0, ans=0.125 2024-08-18 20:22:57,952 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 7150, loss[loss=0.107, beats_loss=0.01009, ecapa_loss=0.0001359, whisper_loss=0.09558, over 21960.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01052, ecapa_loss=0.0001428, whisper_loss=0.08985, over 3846765.74 frames. ], batch size: 88, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:23:13,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4082680.0, ans=0.125 2024-08-18 20:23:20,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4082680.0, ans=0.125 2024-08-18 20:23:26,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4082780.0, ans=0.025 2024-08-18 20:23:38,675 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 20:23:41,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4082880.0, ans=0.125 2024-08-18 20:23:46,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4082880.0, ans=0.125 2024-08-18 20:23:52,116 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.67 vs. limit=22.5 2024-08-18 20:24:03,106 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. 
limit=6.0 2024-08-18 20:24:03,594 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 7200, loss[loss=0.08197, beats_loss=0.0117, ecapa_loss=0.0001456, whisper_loss=0.06881, over 17691.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01056, ecapa_loss=0.000143, whisper_loss=0.08934, over 3855896.06 frames. ], batch size: 74, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:24:11,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4083080.0, ans=0.1 2024-08-18 20:24:11,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4083080.0, ans=0.0 2024-08-18 20:24:12,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4083080.0, ans=0.0 2024-08-18 20:24:12,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4083080.0, ans=0.2 2024-08-18 20:24:14,863 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.258e+01 2.559e+01 2.767e+01 6.438e+01, threshold=5.118e+01, percent-clipped=2.0 2024-08-18 20:24:15,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4083180.0, ans=0.1 2024-08-18 20:24:15,736 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.47 vs. limit=15.0 2024-08-18 20:24:22,095 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 20:24:34,600 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-18 20:24:36,002 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
14 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-18 20:24:45,828 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.77 vs. limit=22.5 2024-08-18 20:24:54,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4083480.0, ans=0.0 2024-08-18 20:25:08,626 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 7250, loss[loss=0.103, beats_loss=0.008941, ecapa_loss=0.0001414, whisper_loss=0.09269, over 20018.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.0001431, whisper_loss=0.08963, over 3868966.57 frames. ], batch size: 78, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:25:12,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4083580.0, ans=0.125 2024-08-18 20:25:31,080 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 20:25:32,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4083680.0, ans=0.125 2024-08-18 20:25:40,211 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
17 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-18 20:25:55,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4083880.0, ans=0.125 2024-08-18 20:26:01,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4083980.0, ans=0.0 2024-08-18 20:26:03,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4083980.0, ans=0.5 2024-08-18 20:26:14,866 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.60 vs. limit=15.0 2024-08-18 20:26:16,588 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 7300, loss[loss=0.08255, beats_loss=0.01229, ecapa_loss=0.0001208, whisper_loss=0.06905, over 16118.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.0001427, whisper_loss=0.09033, over 3872985.85 frames. 
], batch size: 65, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:26:23,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4084080.0, ans=0.1 2024-08-18 20:26:32,303 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.408e+01 2.606e+01 2.923e+01 5.019e+01, threshold=5.211e+01, percent-clipped=0.0 2024-08-18 20:26:56,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4084280.0, ans=0.0 2024-08-18 20:27:03,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4084380.0, ans=0.125 2024-08-18 20:27:26,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4084480.0, ans=0.125 2024-08-18 20:27:29,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4084480.0, ans=0.0 2024-08-18 20:27:33,130 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 7350, loss[loss=0.1453, beats_loss=0.005871, ecapa_loss=0.0001471, whisper_loss=0.138, over 15489.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01052, ecapa_loss=0.0001423, whisper_loss=0.09045, over 3877292.62 frames. ], batch size: 55, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:27:35,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4084580.0, ans=0.05 2024-08-18 20:27:41,773 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-18 20:27:55,604 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 12 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 20:28:21,918 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 20:28:26,170 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 20:28:28,222 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 20:29:02,050 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-18 20:29:03,676 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.69 vs. limit=15.0 2024-08-18 20:29:06,963 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 7400, loss[loss=0.09325, beats_loss=0.01117, ecapa_loss=0.0001504, whisper_loss=0.08057, over 18222.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01052, ecapa_loss=0.0001424, whisper_loss=0.08996, over 3843360.68 frames. ], batch size: 75, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:29:07,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4085080.0, ans=0.125 2024-08-18 20:29:08,293 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 20:29:25,006 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.347e+01 2.572e+01 2.832e+01 4.744e+01, threshold=5.145e+01, percent-clipped=0.0 2024-08-18 20:29:37,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4085180.0, ans=0.125 2024-08-18 20:30:24,617 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 20:30:33,688 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
16 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-18 20:30:37,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4085580.0, ans=0.125 2024-08-18 20:30:38,994 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 7450, loss[loss=0.1207, beats_loss=0.00952, ecapa_loss=0.0001534, whisper_loss=0.1096, over 22441.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0105, ecapa_loss=0.0001443, whisper_loss=0.08978, over 3831500.35 frames. ], batch size: 89, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:30:49,549 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 20:31:13,436 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 8 from Vox, 29 fro AS 2024-08-18 20:31:15,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4085680.0, ans=0.125 2024-08-18 20:31:33,200 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 9 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-18 20:31:49,329 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-18 20:31:57,672 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 20:32:15,768 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=12.0 2024-08-18 20:32:27,627 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 7500, loss[loss=0.1105, beats_loss=0.008134, ecapa_loss=0.0001339, whisper_loss=0.101, over 19916.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01037, ecapa_loss=0.0001444, whisper_loss=0.09022, over 3844761.31 frames. ], batch size: 78, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:32:29,497 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
20 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-18 20:32:34,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4086080.0, ans=0.125 2024-08-18 20:32:47,086 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.294e+01 2.519e+01 2.774e+01 4.079e+01, threshold=5.039e+01, percent-clipped=0.0 2024-08-18 20:33:16,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4086280.0, ans=0.125 2024-08-18 20:33:16,735 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.14 vs. limit=22.5 2024-08-18 20:33:59,421 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 20:34:05,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4086480.0, ans=0.0 2024-08-18 20:34:20,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4086480.0, ans=0.1 2024-08-18 20:34:23,647 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-18 20:34:25,849 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 7550, loss[loss=0.08223, beats_loss=0.01275, ecapa_loss=0.0001263, whisper_loss=0.06822, over 15075.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01034, ecapa_loss=0.000145, whisper_loss=0.09078, over 3827625.78 frames. ], batch size: 62, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:34:31,253 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 37 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-18 20:35:06,451 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
18 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 20:35:07,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4086780.0, ans=0.125 2024-08-18 20:35:09,388 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 20:35:10,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4086780.0, ans=0.2 2024-08-18 20:35:10,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4086780.0, ans=0.5 2024-08-18 20:35:29,322 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-18 20:35:45,202 WARNING [optim.py:496] (2/4) Scaling gradients by 0.029811669141054153, model_norm_threshold=50.385860443115234 2024-08-18 20:35:45,368 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.709e+05, grad_sumsq=5.709e+05, orig_rms_sq=1.000e+00 2024-08-18 20:35:47,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4086980.0, ans=0.0 2024-08-18 20:35:51,139 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 7600, loss[loss=0.1011, beats_loss=0.01119, ecapa_loss=0.0001634, whisper_loss=0.08824, over 21704.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01041, ecapa_loss=0.0001457, whisper_loss=0.09082, over 3838094.31 frames. 
], batch size: 87, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:36:04,040 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.298e+01 2.604e+01 3.012e+01 1.690e+03, threshold=5.209e+01, percent-clipped=1.0 2024-08-18 20:36:05,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4087180.0, ans=0.2 2024-08-18 20:36:07,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4087180.0, ans=0.0 2024-08-18 20:36:09,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4087180.0, ans=0.1 2024-08-18 20:36:18,659 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.93 vs. limit=15.0 2024-08-18 20:36:30,409 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 20:36:56,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4087480.0, ans=0.0 2024-08-18 20:37:04,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4087580.0, ans=0.125 2024-08-18 20:37:05,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4087580.0, ans=0.125 2024-08-18 20:37:05,866 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 7650, loss[loss=0.1049, beats_loss=0.01106, ecapa_loss=0.0001456, whisper_loss=0.09234, over 22310.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0104, ecapa_loss=0.0001452, whisper_loss=0.09063, over 3851208.71 frames. ], batch size: 89, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:37:07,485 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
24 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 20:37:30,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4087680.0, ans=0.0 2024-08-18 20:37:31,418 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-18 20:37:33,005 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-18 20:37:43,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4087780.0, ans=0.125 2024-08-18 20:37:56,433 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.264e-03 2024-08-18 20:38:02,345 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 17 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-18 20:38:10,308 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-18 20:38:10,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4087980.0, ans=0.0 2024-08-18 20:38:21,641 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 7700, loss[loss=0.09118, beats_loss=0.0106, ecapa_loss=0.0001399, whisper_loss=0.07918, over 23129.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01041, ecapa_loss=0.0001443, whisper_loss=0.09003, over 3876362.13 frames. ], batch size: 94, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:38:34,622 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.256e+01 2.493e+01 2.776e+01 3.819e+01, threshold=4.986e+01, percent-clipped=0.0 2024-08-18 20:38:54,351 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
18 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 20:39:05,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4088380.0, ans=0.125 2024-08-18 20:39:18,931 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.88 vs. limit=5.0 2024-08-18 20:39:22,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4088480.0, ans=10.0 2024-08-18 20:39:24,732 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 30 from Vox, 21 fro AS 2024-08-18 20:39:35,765 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 7750, loss[loss=0.1091, beats_loss=0.01024, ecapa_loss=0.0001281, whisper_loss=0.0976, over 19833.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01036, ecapa_loss=0.0001438, whisper_loss=0.09016, over 3883385.50 frames. ], batch size: 76, lr: 2.19e-03, grad_scale: 1.152921504606847e+18 2024-08-18 20:39:39,102 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-18 20:39:39,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4088580.0, ans=0.125 2024-08-18 20:39:48,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4088680.0, ans=0.125 2024-08-18 20:39:50,020 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
22 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-18 20:39:59,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4088680.0, ans=0.1 2024-08-18 20:40:06,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4088780.0, ans=0.125 2024-08-18 20:40:10,056 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-18 20:40:10,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4088780.0, ans=0.125 2024-08-18 20:40:17,813 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-18 20:40:18,972 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-18 20:40:40,386 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 20:40:50,124 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 7800, loss[loss=0.1313, beats_loss=0.006546, ecapa_loss=0.0001522, whisper_loss=0.1233, over 23951.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01033, ecapa_loss=0.0001451, whisper_loss=0.09045, over 3898549.90 frames. ], batch size: 93, lr: 2.18e-03, grad_scale: 1.152921504606847e+18 2024-08-18 20:40:50,250 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
34 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-18 20:40:53,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4089080.0, ans=0.025 2024-08-18 20:40:56,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4089080.0, ans=0.1 2024-08-18 20:40:56,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4089080.0, ans=0.2 2024-08-18 20:41:02,867 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.709e+01 2.370e+01 2.620e+01 3.018e+01 4.706e+01, threshold=5.240e+01, percent-clipped=0.0 2024-08-18 20:41:10,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4089180.0, ans=0.1 2024-08-18 20:41:18,887 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.460e-02 2024-08-18 20:41:26,715 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 20:41:44,271 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.48 vs. limit=22.5 2024-08-18 20:41:57,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4089480.0, ans=0.125 2024-08-18 20:42:04,522 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 7850, loss[loss=0.1101, beats_loss=0.01171, ecapa_loss=0.0001461, whisper_loss=0.09689, over 15483.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01039, ecapa_loss=0.0001446, whisper_loss=0.09057, over 3875027.41 frames. 
], batch size: 62, lr: 2.18e-03, grad_scale: 1.152921504606847e+18 2024-08-18 20:42:27,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4089680.0, ans=0.125 2024-08-18 20:42:46,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4089880.0, ans=0.0 2024-08-18 20:43:11,985 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-18 20:43:17,757 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 7900, loss[loss=0.1349, beats_loss=0.009294, ecapa_loss=0.0001315, whisper_loss=0.1243, over 23776.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01049, ecapa_loss=0.0001449, whisper_loss=0.09054, over 3875411.64 frames. ], batch size: 90, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:43:17,914 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-18 20:43:18,700 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.34 vs. limit=22.5 2024-08-18 20:43:23,544 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-18 20:43:23,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4090080.0, ans=0.125 2024-08-18 20:43:26,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4090080.0, ans=0.1 2024-08-18 20:43:28,242 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 20:43:28,821 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.46 vs. 
limit=15.0 2024-08-18 20:43:32,446 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.368e+01 2.642e+01 2.985e+01 1.655e+02, threshold=5.283e+01, percent-clipped=2.0 2024-08-18 20:43:52,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4090280.0, ans=0.125 2024-08-18 20:43:52,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4090280.0, ans=0.0 2024-08-18 20:43:56,980 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-18 20:44:15,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4090480.0, ans=0.125 2024-08-18 20:44:24,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4090480.0, ans=0.2 2024-08-18 20:44:29,983 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 7950, loss[loss=0.1046, beats_loss=0.01151, ecapa_loss=0.0001005, whisper_loss=0.09207, over 19242.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0105, ecapa_loss=0.0001443, whisper_loss=0.09127, over 3892249.78 frames. ], batch size: 71, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:44:53,673 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 15 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-18 20:45:11,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4090880.0, ans=0.125 2024-08-18 20:45:14,058 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-18 20:45:30,945 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 18 from LS+wenet, 30 from Vox, 43 fro AS 2024-08-18 20:45:39,879 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
19 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-18 20:45:40,979 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 8000, loss[loss=0.1124, beats_loss=0.009305, ecapa_loss=0.0001207, whisper_loss=0.1019, over 15058.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01051, ecapa_loss=0.0001435, whisper_loss=0.09069, over 3878542.61 frames. ], batch size: 55, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:45:56,014 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.375e+01 2.587e+01 2.855e+01 4.354e+01, threshold=5.173e+01, percent-clipped=0.0 2024-08-18 20:46:05,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4091180.0, ans=0.1 2024-08-18 20:46:13,029 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.28 vs. limit=15.0 2024-08-18 20:46:14,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4091280.0, ans=0.2 2024-08-18 20:46:19,791 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 20:46:23,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4091380.0, ans=0.125 2024-08-18 20:46:33,759 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-18 20:46:41,107 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-18 20:46:42,547 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-18 20:46:51,158 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
13 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-18 20:46:52,647 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 8050, loss[loss=0.07962, beats_loss=0.01144, ecapa_loss=0.000137, whisper_loss=0.06681, over 14698.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01052, ecapa_loss=0.0001434, whisper_loss=0.09066, over 3845107.48 frames. ], batch size: 60, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:46:58,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4091580.0, ans=0.1 2024-08-18 20:47:02,321 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-18 20:47:18,420 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 20:47:20,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4091780.0, ans=0.2 2024-08-18 20:47:42,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4091880.0, ans=0.125 2024-08-18 20:47:48,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4091980.0, ans=0.125 2024-08-18 20:47:49,838 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 21 from LS+wenet, 33 from Vox, 36 fro AS 2024-08-18 20:48:00,117 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 8100, loss[loss=0.1105, beats_loss=0.008157, ecapa_loss=0.0001558, whisper_loss=0.1007, over 23208.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01043, ecapa_loss=0.0001435, whisper_loss=0.09104, over 3866114.43 frames. ], batch size: 92, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:48:07,151 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
24 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 20:48:11,993 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-08-18 20:48:12,579 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 20:48:12,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4092180.0, ans=0.125 2024-08-18 20:48:14,180 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.377e+01 2.582e+01 2.787e+01 4.982e+01, threshold=5.165e+01, percent-clipped=0.0 2024-08-18 20:48:33,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4092280.0, ans=0.0 2024-08-18 20:48:38,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4092280.0, ans=0.0 2024-08-18 20:48:42,891 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-18 20:49:06,800 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=19.65 vs. limit=15.0 2024-08-18 20:49:10,731 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 8150, loss[loss=0.08792, beats_loss=0.008971, ecapa_loss=0.0001456, whisper_loss=0.07749, over 13644.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01044, ecapa_loss=0.0001434, whisper_loss=0.08992, over 3862675.95 frames. ], batch size: 53, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:49:21,849 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.69 vs. 
limit=15.0 2024-08-18 20:49:32,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4092680.0, ans=0.0 2024-08-18 20:49:33,878 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-18 20:49:36,290 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 28 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-18 20:49:43,095 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-08-18 20:50:03,177 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-18 20:50:07,259 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 25 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-18 20:50:21,990 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2024-08-18 20:50:22,414 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 8200, loss[loss=0.1018, beats_loss=0.01105, ecapa_loss=0.0001409, whisper_loss=0.08935, over 22535.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01043, ecapa_loss=0.0001436, whisper_loss=0.08984, over 3889479.52 frames. ], batch size: 91, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:50:30,382 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-18 20:50:35,256 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.361e+01 2.593e+01 2.842e+01 4.964e+01, threshold=5.187e+01, percent-clipped=0.0 2024-08-18 20:50:55,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4093280.0, ans=0.125 2024-08-18 20:50:59,096 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
37 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-18 20:51:12,150 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-18 20:51:21,717 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-18 20:51:24,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4093480.0, ans=0.125 2024-08-18 20:51:25,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4093480.0, ans=0.0 2024-08-18 20:51:28,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4093580.0, ans=0.0 2024-08-18 20:51:29,650 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 8250, loss[loss=0.09975, beats_loss=0.009205, ecapa_loss=0.0001656, whisper_loss=0.08889, over 21941.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01052, ecapa_loss=0.0001446, whisper_loss=0.08975, over 3898883.50 frames. ], batch size: 91, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:51:46,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4093680.0, ans=0.2 2024-08-18 20:51:53,075 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 30 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-18 20:51:57,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4093780.0, ans=0.125 2024-08-18 20:52:08,373 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-18 20:52:16,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4093880.0, ans=0.125 2024-08-18 20:52:27,201 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=15.0 2024-08-18 20:52:39,848 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 8300, loss[loss=0.08686, beats_loss=0.01239, ecapa_loss=0.0001028, whisper_loss=0.07343, over 17846.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01051, ecapa_loss=0.0001426, whisper_loss=0.09036, over 3918779.35 frames. ], batch size: 67, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:52:53,831 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.411e+01 2.666e+01 2.982e+01 3.666e+02, threshold=5.332e+01, percent-clipped=2.0 2024-08-18 20:52:58,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4094180.0, ans=0.1 2024-08-18 20:53:03,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4094180.0, ans=0.0 2024-08-18 20:53:18,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4094280.0, ans=0.125 2024-08-18 20:53:33,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4094480.0, ans=0.2 2024-08-18 20:53:42,076 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
24 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-18 20:53:43,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4094480.0, ans=0.125 2024-08-18 20:53:48,434 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 8350, loss[loss=0.118, beats_loss=0.01049, ecapa_loss=0.0001218, whisper_loss=0.1063, over 20314.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001425, whisper_loss=0.09055, over 3930997.71 frames. ], batch size: 77, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:53:52,478 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-18 20:53:55,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4094580.0, ans=0.125 2024-08-18 20:54:03,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4094680.0, ans=0.0 2024-08-18 20:54:10,553 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-18 20:54:10,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4094680.0, ans=0.0 2024-08-18 20:54:14,529 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 37 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-18 20:54:19,155 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.37 vs. limit=15.0 2024-08-18 20:54:28,213 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
27 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-18 20:54:42,201 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.005e+01 2024-08-18 20:54:55,475 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 8400, loss[loss=0.09885, beats_loss=0.01003, ecapa_loss=9.87e-05, whisper_loss=0.08783, over 15443.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01051, ecapa_loss=0.0001438, whisper_loss=0.0909, over 3934717.05 frames. ], batch size: 55, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:55:02,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4095080.0, ans=0.0 2024-08-18 20:55:08,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4095180.0, ans=0.125 2024-08-18 20:55:09,026 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.021e+01 2.413e+01 2.574e+01 2.874e+01 4.308e+01, threshold=5.147e+01, percent-clipped=0.0 2024-08-18 20:55:16,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4095180.0, ans=0.0 2024-08-18 20:55:29,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4095280.0, ans=0.04949747468305833 2024-08-18 20:55:33,649 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.73 vs. limit=6.0 2024-08-18 20:55:35,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4095380.0, ans=0.125 2024-08-18 20:55:53,631 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
28 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-18 20:56:05,826 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 8450, loss[loss=0.08961, beats_loss=0.01203, ecapa_loss=0.0001319, whisper_loss=0.07626, over 22520.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0104, ecapa_loss=0.0001445, whisper_loss=0.09149, over 3925992.41 frames. ], batch size: 91, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:56:13,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4095580.0, ans=0.0 2024-08-18 20:56:27,702 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-18 20:56:45,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4095780.0, ans=0.1 2024-08-18 20:56:53,396 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-18 20:57:06,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4095980.0, ans=0.0 2024-08-18 20:57:08,875 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.28 vs. limit=22.5 2024-08-18 20:57:16,810 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 8500, loss[loss=0.08947, beats_loss=0.01236, ecapa_loss=0.000139, whisper_loss=0.07572, over 21307.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01035, ecapa_loss=0.0001444, whisper_loss=0.09175, over 3910117.74 frames. 
], batch size: 88, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:57:19,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4096080.0, ans=0.1 2024-08-18 20:57:34,613 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.19 vs. limit=10.0 2024-08-18 20:57:35,424 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.290e+01 2.484e+01 2.745e+01 4.794e+01, threshold=4.968e+01, percent-clipped=0.0 2024-08-18 20:57:35,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4096180.0, ans=0.2 2024-08-18 20:57:51,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4096280.0, ans=0.0 2024-08-18 20:57:54,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4096280.0, ans=0.125 2024-08-18 20:58:06,876 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.11 vs. limit=15.0 2024-08-18 20:58:22,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4096480.0, ans=0.125 2024-08-18 20:58:32,054 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.80 vs. limit=22.5 2024-08-18 20:58:33,927 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 8550, loss[loss=0.0994, beats_loss=0.009633, ecapa_loss=0.0001533, whisper_loss=0.08823, over 14631.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01032, ecapa_loss=0.0001431, whisper_loss=0.09146, over 3879828.91 frames. 
], batch size: 58, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:58:46,826 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 20:59:36,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4096980.0, ans=0.125 2024-08-18 20:59:37,686 WARNING [optim.py:496] (2/4) Scaling gradients by 0.05524347350001335, model_norm_threshold=49.67615509033203 2024-08-18 20:59:37,856 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.096e+05, grad_sumsq=1.096e+05, orig_rms_sq=1.000e+00 2024-08-18 20:59:38,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4096980.0, ans=0.125 2024-08-18 20:59:39,730 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 23 from LS+wenet, 10 from Vox, 22 fro AS 2024-08-18 20:59:42,799 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.520e-01 2024-08-18 20:59:44,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4096980.0, ans=0.125 2024-08-18 20:59:44,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4096980.0, ans=0.125 2024-08-18 20:59:45,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4096980.0, ans=0.0 2024-08-18 20:59:47,829 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 8600, loss[loss=0.08858, beats_loss=0.01428, ecapa_loss=0.0001823, whisper_loss=0.07248, over 20020.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01033, ecapa_loss=0.0001432, whisper_loss=0.0912, over 3876108.57 frames. 
], batch size: 90, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:59:54,348 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-18 21:00:02,266 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.313e+01 2.617e+01 3.011e+01 8.992e+02, threshold=5.234e+01, percent-clipped=3.0 2024-08-18 21:00:02,412 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-18 21:00:04,941 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-18 21:00:05,491 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.94 vs. limit=15.0 2024-08-18 21:00:32,146 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-18 21:00:41,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4097380.0, ans=0.125 2024-08-18 21:00:55,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4097480.0, ans=0.125 2024-08-18 21:00:56,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4097580.0, ans=0.1 2024-08-18 21:00:57,472 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 8650, loss[loss=0.1248, beats_loss=0.008248, ecapa_loss=0.0001746, whisper_loss=0.1148, over 22670.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01032, ecapa_loss=0.0001437, whisper_loss=0.09154, over 3857389.78 frames. 
], batch size: 88, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:00:59,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4097580.0, ans=0.2 2024-08-18 21:00:59,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4097580.0, ans=0.0 2024-08-18 21:01:03,483 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0 2024-08-18 21:01:09,180 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.59 vs. limit=15.0 2024-08-18 21:01:38,826 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 21:01:41,070 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-18 21:02:12,060 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 8700, loss[loss=0.09742, beats_loss=0.01137, ecapa_loss=0.0001638, whisper_loss=0.08441, over 21763.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01038, ecapa_loss=0.0001449, whisper_loss=0.09117, over 3857953.04 frames. ], batch size: 93, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:02:18,036 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.42 vs. 
limit=15.0 2024-08-18 21:02:22,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4098080.0, ans=0.0 2024-08-18 21:02:27,006 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.217e+01 2.441e+01 2.789e+01 4.170e+01, threshold=4.882e+01, percent-clipped=0.0 2024-08-18 21:02:36,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4098180.0, ans=0.1 2024-08-18 21:02:43,116 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-18 21:03:03,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4098380.0, ans=0.0 2024-08-18 21:03:09,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4098480.0, ans=0.0 2024-08-18 21:03:11,317 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.28 vs. limit=15.0 2024-08-18 21:03:16,281 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 20 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-18 21:03:20,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4098480.0, ans=0.09899494936611666 2024-08-18 21:03:24,116 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 8750, loss[loss=0.09874, beats_loss=0.01216, ecapa_loss=0.0001266, whisper_loss=0.08531, over 18561.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01036, ecapa_loss=0.0001463, whisper_loss=0.09119, over 3824308.49 frames. ], batch size: 75, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:03:39,433 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
27 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-18 21:03:46,763 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.05 vs. limit=12.0 2024-08-18 21:03:57,899 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-18 21:04:01,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4098780.0, ans=0.125 2024-08-18 21:04:13,541 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-18 21:04:19,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4098880.0, ans=0.1 2024-08-18 21:04:34,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4098980.0, ans=0.0 2024-08-18 21:04:37,574 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 17 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-18 21:04:41,095 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 8800, loss[loss=0.08813, beats_loss=0.01132, ecapa_loss=0.0001539, whisper_loss=0.07528, over 19050.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01054, ecapa_loss=0.0001451, whisper_loss=0.0896, over 3832246.09 frames. ], batch size: 78, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:04:42,538 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-18 21:04:54,934 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
26 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 21:04:56,559 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.312e+01 2.589e+01 2.893e+01 4.195e+01, threshold=5.178e+01, percent-clipped=0.0 2024-08-18 21:05:01,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4099180.0, ans=0.125 2024-08-18 21:05:04,134 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.23 vs. limit=6.0 2024-08-18 21:05:09,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4099180.0, ans=0.07 2024-08-18 21:05:58,907 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 8850, loss[loss=0.08121, beats_loss=0.01356, ecapa_loss=0.0001526, whisper_loss=0.06612, over 20612.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01057, ecapa_loss=0.0001443, whisper_loss=0.08951, over 3844420.92 frames. ], batch size: 89, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:06:15,341 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-18 21:06:15,861 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.83 vs. limit=15.0 2024-08-18 21:06:22,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4099680.0, ans=0.125 2024-08-18 21:06:41,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4099780.0, ans=0.0 2024-08-18 21:06:53,176 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.25 vs. 
limit=22.5 2024-08-18 21:06:55,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4099880.0, ans=0.125 2024-08-18 21:07:01,504 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.28 vs. limit=15.0 2024-08-18 21:07:10,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4099980.0, ans=0.0 2024-08-18 21:07:16,734 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 8900, loss[loss=0.08769, beats_loss=0.01191, ecapa_loss=0.0001526, whisper_loss=0.07425, over 16675.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0106, ecapa_loss=0.0001434, whisper_loss=0.08958, over 3838925.54 frames. ], batch size: 69, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:07:30,457 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-18 21:07:33,459 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.732e+01 2.315e+01 2.485e+01 2.808e+01 3.547e+01, threshold=4.971e+01, percent-clipped=0.0 2024-08-18 21:07:55,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4100280.0, ans=0.125 2024-08-18 21:08:10,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4100380.0, ans=0.0 2024-08-18 21:08:12,101 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
38 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-18 21:08:17,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4100380.0, ans=0.1 2024-08-18 21:08:22,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4100480.0, ans=0.07 2024-08-18 21:08:25,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4100480.0, ans=0.0 2024-08-18 21:08:30,216 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 14 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-18 21:08:38,880 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 8950, loss[loss=0.1001, beats_loss=0.01187, ecapa_loss=8.44e-05, whisper_loss=0.08738, over 16700.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01064, ecapa_loss=0.0001429, whisper_loss=0.08935, over 3822133.51 frames. ], batch size: 58, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:08:39,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4100580.0, ans=0.125 2024-08-18 21:08:40,459 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-18 21:08:42,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4100580.0, ans=0.125 2024-08-18 21:08:49,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4100580.0, ans=0.125 2024-08-18 21:08:54,949 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.26 vs. limit=15.0 2024-08-18 21:08:56,227 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
26 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 21:08:58,421 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-18 21:09:16,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4100780.0, ans=0.125 2024-08-18 21:09:17,567 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-18 21:09:29,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4100880.0, ans=0.125 2024-08-18 21:09:51,917 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 9000, loss[loss=0.08967, beats_loss=0.01052, ecapa_loss=0.0001489, whisper_loss=0.07766, over 22487.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01058, ecapa_loss=0.0001433, whisper_loss=0.09002, over 3850929.06 frames. ], batch size: 88, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:09:51,917 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-18 21:10:17,015 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1008, 4.1320, 4.8882, 4.7382], device='cuda:2') 2024-08-18 21:10:26,958 INFO [train_multi_KD3.py:1149] (2/4) Epoch 28, validation on ASR_libri: loss=0.2539, beats_loss=0, ecapa_loss=0.0005164, whisper_loss=0.2487, over 922467.00 frames. 2024-08-18 21:10:44,132 INFO [train_multi_KD3.py:1149] (2/4) Epoch 28, validation on SV_voxceleb1: loss=0.004068, beats_loss=0, ecapa_loss=0.0004068, whisper_loss=0, over 939242.00 frames. 2024-08-18 21:12:26,398 INFO [train_multi_KD3.py:1149] (2/4) Epoch 28, validation on AT_audioset: loss=0.0231, beats_loss=0.0231, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-18 21:12:26,408 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-18 21:12:35,564 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 21:12:40,842 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.331e+01 2.657e+01 3.077e+01 4.248e+01, threshold=5.315e+01, percent-clipped=0.0 2024-08-18 21:12:45,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4101180.0, ans=0.125 2024-08-18 21:12:50,150 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 21:12:59,505 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.74 vs. limit=12.0 2024-08-18 21:13:04,950 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-18 21:13:30,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4101480.0, ans=0.125 2024-08-18 21:13:38,883 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 9050, loss[loss=0.1151, beats_loss=0.01026, ecapa_loss=0.0001618, whisper_loss=0.1033, over 22925.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001432, whisper_loss=0.09016, over 3855002.50 frames. ], batch size: 94, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:13:48,743 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-18 21:13:57,240 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-18 21:14:03,900 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
18 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-18 21:14:09,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4101780.0, ans=0.0 2024-08-18 21:14:27,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=4101880.0, ans=10.0 2024-08-18 21:14:35,064 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-18 21:14:40,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4101980.0, ans=0.125 2024-08-18 21:14:45,215 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 21:14:52,448 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 9100, loss[loss=0.103, beats_loss=0.009281, ecapa_loss=0.0001581, whisper_loss=0.09217, over 21980.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001439, whisper_loss=0.09038, over 3859527.55 frames. ], batch size: 91, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:14:56,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4102080.0, ans=0.125 2024-08-18 21:15:04,209 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 21:15:06,452 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.448e+01 2.687e+01 2.999e+01 3.130e+02, threshold=5.374e+01, percent-clipped=2.0 2024-08-18 21:15:08,313 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
25 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-18 21:15:21,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4102280.0, ans=0.125 2024-08-18 21:15:37,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4102380.0, ans=0.0 2024-08-18 21:15:48,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4102480.0, ans=0.125 2024-08-18 21:15:59,234 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-18 21:16:02,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4102480.0, ans=0.07 2024-08-18 21:16:05,381 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 9150, loss[loss=0.1118, beats_loss=0.00942, ecapa_loss=0.000136, whisper_loss=0.101, over 22997.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0105, ecapa_loss=0.0001444, whisper_loss=0.09028, over 3885668.75 frames. ], batch size: 89, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:16:11,382 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-18 21:16:16,692 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-18 21:16:25,558 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 16 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-18 21:16:28,934 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.25 vs. 
limit=10.0 2024-08-18 21:16:47,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4102880.0, ans=0.125 2024-08-18 21:16:54,154 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 21:16:55,100 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-18 21:16:58,401 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.58 vs. limit=10.0 2024-08-18 21:17:03,192 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-18 21:17:12,938 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.48 vs. limit=15.0 2024-08-18 21:17:15,280 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 9200, loss[loss=0.08613, beats_loss=0.01135, ecapa_loss=0.0001827, whisper_loss=0.07295, over 20466.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01054, ecapa_loss=0.0001444, whisper_loss=0.08997, over 3886104.88 frames. ], batch size: 93, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:17:15,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4103080.0, ans=0.125 2024-08-18 21:17:18,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4103080.0, ans=0.0 2024-08-18 21:17:20,546 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.67 vs. limit=15.0 2024-08-18 21:17:26,869 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
17 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-18 21:17:29,684 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.309e+01 2.617e+01 2.863e+01 2.175e+02, threshold=5.234e+01, percent-clipped=2.0 2024-08-18 21:17:38,262 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.62 vs. limit=15.0 2024-08-18 21:17:50,459 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-18 21:18:00,024 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-18 21:18:07,478 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 21:18:08,104 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.18 vs. limit=15.0 2024-08-18 21:18:26,827 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 9250, loss[loss=0.1103, beats_loss=0.01147, ecapa_loss=0.0001061, whisper_loss=0.0978, over 21208.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01051, ecapa_loss=0.0001441, whisper_loss=0.08935, over 3896900.98 frames. ], batch size: 80, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:18:31,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4103580.0, ans=0.125 2024-08-18 21:18:35,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4103580.0, ans=0.125 2024-08-18 21:18:38,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4103580.0, ans=0.125 2024-08-18 21:18:42,798 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 21:18:52,396 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.31 vs. limit=10.0 2024-08-18 21:18:53,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4103680.0, ans=0.5 2024-08-18 21:18:59,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4103780.0, ans=0.0 2024-08-18 21:19:08,641 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 35 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-18 21:19:16,545 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-18 21:19:30,450 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-18 21:19:39,021 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 21 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-18 21:19:40,552 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 9300, loss[loss=0.08383, beats_loss=0.01302, ecapa_loss=0.0001608, whisper_loss=0.0692, over 19903.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01052, ecapa_loss=0.0001444, whisper_loss=0.0897, over 3918544.40 frames. ], batch size: 86, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:19:47,388 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 16 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-18 21:19:50,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4104080.0, ans=10.0 2024-08-18 21:19:53,440 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.50 vs. 
limit=15.0 2024-08-18 21:19:53,833 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.381e+01 2.641e+01 3.038e+01 1.790e+02, threshold=5.283e+01, percent-clipped=1.0 2024-08-18 21:20:04,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4104180.0, ans=0.125 2024-08-18 21:20:13,973 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 24 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-18 21:20:21,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4104380.0, ans=0.125 2024-08-18 21:20:51,659 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 9350, loss[loss=0.07831, beats_loss=0.01254, ecapa_loss=0.0001132, whisper_loss=0.06464, over 21631.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01055, ecapa_loss=0.0001441, whisper_loss=0.08923, over 3892484.55 frames. ], batch size: 88, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:20:55,505 WARNING [optim.py:496] (2/4) Scaling gradients by 0.06869103014469147, model_norm_threshold=52.8264274597168 2024-08-18 21:20:55,668 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.260e+05, grad_sumsq=1.216e+07, orig_rms_sq=1.036e-02 2024-08-18 21:21:04,102 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-18 21:21:04,931 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.77 vs. 
limit=15.0 2024-08-18 21:21:11,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4104680.0, ans=0.125 2024-08-18 21:21:17,746 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.76 vs. limit=22.5 2024-08-18 21:21:25,054 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2024-08-18 21:21:28,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4104780.0, ans=0.125 2024-08-18 21:21:29,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4104780.0, ans=0.125 2024-08-18 21:21:32,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=4104880.0, ans=0.95 2024-08-18 21:21:33,630 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 21:21:34,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4104880.0, ans=0.125 2024-08-18 21:21:36,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4104880.0, ans=0.125 2024-08-18 21:22:01,317 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 9400, loss[loss=0.1296, beats_loss=0.00834, ecapa_loss=0.0001389, whisper_loss=0.1199, over 20998.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01057, ecapa_loss=0.0001442, whisper_loss=0.08909, over 3903692.41 frames. 
], batch size: 79, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:22:05,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4105080.0, ans=0.125 2024-08-18 21:22:08,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4105080.0, ans=0.1 2024-08-18 21:22:11,366 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 21:22:13,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4105080.0, ans=0.2 2024-08-18 21:22:17,195 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.616e+01 2.374e+01 2.623e+01 3.026e+01 7.690e+02, threshold=5.245e+01, percent-clipped=3.0 2024-08-18 21:22:19,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4105180.0, ans=0.2 2024-08-18 21:22:20,848 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.80 vs. limit=6.0 2024-08-18 21:22:38,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4105280.0, ans=0.125 2024-08-18 21:22:43,676 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.40 vs. limit=12.0 2024-08-18 21:22:48,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4105380.0, ans=0.0 2024-08-18 21:22:57,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4105480.0, ans=0.2 2024-08-18 21:23:08,960 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 21:23:12,774 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 9450, loss[loss=0.1117, beats_loss=0.01061, ecapa_loss=0.0001435, whisper_loss=0.09971, over 22584.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01059, ecapa_loss=0.0001449, whisper_loss=0.08902, over 3877653.60 frames. ], batch size: 92, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:23:41,270 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-18 21:23:48,276 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.96 vs. limit=6.0 2024-08-18 21:24:01,916 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 32 from Vox, 39 fro AS 2024-08-18 21:24:19,824 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-18 21:24:25,740 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 9500, loss[loss=0.1172, beats_loss=0.008158, ecapa_loss=0.0001387, whisper_loss=0.1076, over 18627.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01064, ecapa_loss=0.0001443, whisper_loss=0.08846, over 3907202.87 frames. ], batch size: 72, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:24:29,300 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-18 21:24:42,301 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.355e+01 2.580e+01 2.944e+01 6.232e+01, threshold=5.161e+01, percent-clipped=0.0 2024-08-18 21:24:53,046 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.54 vs. limit=15.0 2024-08-18 21:24:53,810 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
19 from LS+wenet, 14 from Vox, 22 from AS
2024-08-18 21:24:56,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4106280.0, ans=0.125
2024-08-18 21:24:59,035 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.76 vs. limit=22.5
2024-08-18 21:25:00,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4106280.0, ans=0.125
2024-08-18 21:25:06,657 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.35 vs. limit=15.0
2024-08-18 21:25:09,333 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.93 vs. limit=15.0
2024-08-18 21:25:14,621 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 23 from Vox, 30 from AS
2024-08-18 21:25:14,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4106380.0, ans=0.0
2024-08-18 21:25:25,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4106480.0, ans=0.0
2024-08-18 21:25:28,559 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.50 vs. limit=12.0
2024-08-18 21:25:36,886 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.34 vs. limit=22.5
2024-08-18 21:25:40,310 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 9550, loss[loss=0.1112, beats_loss=0.009386, ecapa_loss=0.0001693, whisper_loss=0.1001, over 21665.00 frames.
], tot_loss[loss=0.1011, beats_loss=0.01057, ecapa_loss=0.0001452, whisper_loss=0.08911, over 3904004.77 frames. ], batch size: 91, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:25:59,421 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 19 from Vox, 47 from AS
2024-08-18 21:26:02,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4106680.0, ans=0.125
2024-08-18 21:26:15,972 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.29 vs. limit=15.0
2024-08-18 21:26:25,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4106880.0, ans=0.09899494936611666
2024-08-18 21:26:36,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4106880.0, ans=0.0
2024-08-18 21:26:41,722 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 20 from Vox, 25 from AS
2024-08-18 21:26:41,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4106980.0, ans=0.125
2024-08-18 21:26:53,579 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 9600, loss[loss=0.1032, beats_loss=0.009409, ecapa_loss=0.0001303, whisper_loss=0.09248, over 16202.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01057, ecapa_loss=0.0001438, whisper_loss=0.08844, over 3892373.15 frames.
], batch size: 63, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:27:05,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4107080.0, ans=0.125
2024-08-18 21:27:07,901 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.422e+01 2.704e+01 3.075e+01 1.101e+02, threshold=5.409e+01, percent-clipped=2.0
2024-08-18 21:27:08,077 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 25 from Vox, 33 from AS
2024-08-18 21:27:11,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4107180.0, ans=0.0
2024-08-18 21:27:24,045 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=6.394e+00
2024-08-18 21:27:25,140 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 15 from Vox, 32 from AS
2024-08-18 21:27:28,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4107280.0, ans=0.5
2024-08-18 21:27:53,226 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.29 vs. limit=15.0
2024-08-18 21:28:06,496 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 9650, loss[loss=0.1063, beats_loss=0.01063, ecapa_loss=0.0001451, whisper_loss=0.09424, over 22295.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01055, ecapa_loss=0.000144, whisper_loss=0.0887, over 3871033.42 frames. ], batch size: 90, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:28:23,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4107680.0, ans=0.0
2024-08-18 21:28:34,122 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts.
27 from LS+wenet, 16 from Vox, 27 from AS
2024-08-18 21:28:35,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4107780.0, ans=0.0
2024-08-18 21:28:37,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4107780.0, ans=0.0
2024-08-18 21:28:42,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4107780.0, ans=10.0
2024-08-18 21:28:51,785 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.605e+00
2024-08-18 21:28:54,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4107880.0, ans=0.2
2024-08-18 21:29:01,653 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 18 from Vox, 20 from AS
2024-08-18 21:29:13,822 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 9700, loss[loss=0.1241, beats_loss=0.008176, ecapa_loss=0.0001604, whisper_loss=0.1144, over 15560.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01059, ecapa_loss=0.0001438, whisper_loss=0.08862, over 3870405.04 frames. ], batch size: 61, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:29:21,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4108080.0, ans=0.1
2024-08-18 21:29:26,338 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts.
28 from LS+wenet, 14 from Vox, 43 from AS
2024-08-18 21:29:26,838 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-18 21:29:27,631 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.335e+01 2.610e+01 2.872e+01 4.548e+01, threshold=5.221e+01, percent-clipped=0.0
2024-08-18 21:29:42,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4108280.0, ans=0.04949747468305833
2024-08-18 21:29:48,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4108280.0, ans=0.0
2024-08-18 21:30:03,589 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0
2024-08-18 21:30:10,925 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 18 from Vox, 24 from AS
2024-08-18 21:30:22,458 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 9750, loss[loss=0.103, beats_loss=0.01158, ecapa_loss=0.0001425, whisper_loss=0.09, over 22053.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01062, ecapa_loss=0.0001431, whisper_loss=0.08807, over 3839536.36 frames. ], batch size: 89, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:30:29,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4108580.0, ans=0.1
2024-08-18 21:30:38,409 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts.
25 from LS+wenet, 28 from Vox, 34 from AS
2024-08-18 21:30:41,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4108680.0, ans=0.0
2024-08-18 21:30:41,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4108680.0, ans=0.0
2024-08-18 21:30:51,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4108780.0, ans=0.1
2024-08-18 21:30:54,077 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 14 from LS+wenet, 21 from Vox, 27 from AS
2024-08-18 21:30:59,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4108780.0, ans=0.125
2024-08-18 21:31:16,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4108980.0, ans=0.125
2024-08-18 21:31:19,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4108980.0, ans=0.125
2024-08-18 21:31:30,471 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 9800, loss[loss=0.115, beats_loss=0.007603, ecapa_loss=0.0001352, whisper_loss=0.1061, over 16626.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01059, ecapa_loss=0.0001437, whisper_loss=0.08871, over 3820848.11 frames. ], batch size: 62, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:31:43,473 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.279e+01 2.411e+01 2.686e+01 7.087e+01, threshold=4.821e+01, percent-clipped=1.0
2024-08-18 21:31:57,543 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 17 from LS+wenet, 18 from Vox, 18 from AS
2024-08-18 21:32:32,124 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts.
22 from LS+wenet, 21 from Vox, 27 from AS
2024-08-18 21:32:36,149 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 9850, loss[loss=0.1057, beats_loss=0.007426, ecapa_loss=0.0001579, whisper_loss=0.09666, over 21326.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01052, ecapa_loss=0.0001442, whisper_loss=0.08962, over 3823885.06 frames. ], batch size: 85, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:32:44,624 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 35 from Vox, 33 from AS
2024-08-18 21:32:48,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4109580.0, ans=0.1
2024-08-18 21:32:49,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4109680.0, ans=0.1
2024-08-18 21:32:56,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4109680.0, ans=0.125
2024-08-18 21:33:06,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4109780.0, ans=0.125
2024-08-18 21:33:12,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4109780.0, ans=0.1
2024-08-18 21:33:21,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4109880.0, ans=0.0
2024-08-18 21:33:31,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4109880.0, ans=0.2
2024-08-18 21:33:39,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4109980.0, ans=0.2
2024-08-18 21:33:48,362 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 9900, loss[loss=0.1189,
beats_loss=0.009612, ecapa_loss=0.0001112, whisper_loss=0.1082, over 19804.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01053, ecapa_loss=0.0001434, whisper_loss=0.0898, over 3850827.33 frames. ], batch size: 75, lr: 2.18e-03, grad_scale: 1.152921504606847e+18
2024-08-18 21:33:53,897 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 17 from Vox, 45 from AS
2024-08-18 21:33:55,275 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 from AS
2024-08-18 21:33:59,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4110080.0, ans=0.125
2024-08-18 21:34:02,276 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.316e+01 2.554e+01 2.819e+01 9.199e+01, threshold=5.108e+01, percent-clipped=2.0
2024-08-18 21:34:05,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4110180.0, ans=0.2
2024-08-18 21:34:08,386 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 24 from Vox, 28 from AS
2024-08-18 21:34:17,607 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 23 from Vox, 23 from AS
2024-08-18 21:34:21,903 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 17 from Vox, 39 from AS
2024-08-18 21:34:25,429 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0
2024-08-18 21:34:31,502 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 19 from Vox, 22 from AS
2024-08-18 21:34:36,004 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.96 vs.
limit=15.0
2024-08-18 21:34:46,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4110480.0, ans=0.125
2024-08-18 21:34:54,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4110480.0, ans=0.04949747468305833
2024-08-18 21:34:58,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4110580.0, ans=0.2
2024-08-18 21:34:59,328 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 9950, loss[loss=0.09214, beats_loss=0.01205, ecapa_loss=0.0001506, whisper_loss=0.07858, over 22191.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01064, ecapa_loss=0.0001434, whisper_loss=0.08853, over 3850833.71 frames. ], batch size: 92, lr: 2.18e-03, grad_scale: 1.152921504606847e+18
2024-08-18 21:35:22,191 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.96 vs. limit=22.5
2024-08-18 21:35:30,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=4110780.0, ans=15.0
2024-08-18 21:35:35,826 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4110780.0, ans=0.125
2024-08-18 21:35:41,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4110880.0, ans=0.2
2024-08-18 21:35:55,711 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts.
15 from LS+wenet, 13 from Vox, 28 from AS
2024-08-18 21:36:02,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4110980.0, ans=0.125
2024-08-18 21:36:06,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=4111080.0, ans=0.5
2024-08-18 21:36:07,218 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 10000, loss[loss=0.11, beats_loss=0.01048, ecapa_loss=0.0001534, whisper_loss=0.09796, over 16135.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01062, ecapa_loss=0.0001434, whisper_loss=0.08952, over 3880867.98 frames. ], batch size: 67, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:36:21,635 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.96 vs. limit=15.0
2024-08-18 21:36:23,694 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.302e+01 2.583e+01 2.911e+01 1.277e+02, threshold=5.165e+01, percent-clipped=1.0
2024-08-18 21:36:27,693 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 38 from LS+wenet, 21 from Vox, 28 from AS
2024-08-18 21:36:29,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4111180.0, ans=0.125
2024-08-18 21:36:40,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4111280.0, ans=0.125
2024-08-18 21:36:46,367 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts.
26 from LS+wenet, 22 from Vox, 44 from AS
2024-08-18 21:36:50,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4111380.0, ans=0.07
2024-08-18 21:36:53,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4111380.0, ans=0.0
2024-08-18 21:37:12,516 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.06 vs. limit=22.5
2024-08-18 21:37:20,929 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.22 vs. limit=12.0
2024-08-18 21:37:21,242 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 10050, loss[loss=0.1086, beats_loss=0.01006, ecapa_loss=0.000159, whisper_loss=0.09693, over 23114.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01053, ecapa_loss=0.0001431, whisper_loss=0.08993, over 3866569.16 frames. ], batch size: 92, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:37:26,781 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.42 vs. limit=12.0
2024-08-18 21:37:41,517 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts.
23 from LS+wenet, 18 from Vox, 23 from AS
2024-08-18 21:37:48,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4111780.0, ans=0.07
2024-08-18 21:37:48,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4111780.0, ans=0.125
2024-08-18 21:37:56,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4111780.0, ans=0.05
2024-08-18 21:38:00,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4111780.0, ans=0.125
2024-08-18 21:38:06,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4111880.0, ans=0.1
2024-08-18 21:38:08,719 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 20 from Vox, 35 from AS
2024-08-18 21:38:26,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4111980.0, ans=0.125
2024-08-18 21:38:30,890 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 10100, loss[loss=0.08208, beats_loss=0.009879, ecapa_loss=0.0001413, whisper_loss=0.07079, over 17735.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01056, ecapa_loss=0.0001438, whisper_loss=0.08962, over 3883122.56 frames. ], batch size: 71, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:38:46,460 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.328e+01 2.605e+01 3.008e+01 2.431e+02, threshold=5.209e+01, percent-clipped=1.0
2024-08-18 21:38:46,649 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts.
24 from LS+wenet, 13 from Vox, 19 from AS
2024-08-18 21:38:48,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4112180.0, ans=0.5
2024-08-18 21:38:52,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4112180.0, ans=0.125
2024-08-18 21:39:02,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4112280.0, ans=0.125
2024-08-18 21:39:15,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4112380.0, ans=0.1
2024-08-18 21:39:15,448 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.32 vs. limit=15.0
2024-08-18 21:39:34,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4112480.0, ans=0.0
2024-08-18 21:39:36,998 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 10150, loss[loss=0.1202, beats_loss=0.009497, ecapa_loss=0.0001548, whisper_loss=0.1092, over 21938.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01061, ecapa_loss=0.0001436, whisper_loss=0.08934, over 3913644.55 frames. ], batch size: 85, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:39:40,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4112580.0, ans=0.2
2024-08-18 21:40:00,744 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.68 vs.
limit=6.0
2024-08-18 21:40:02,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4112680.0, ans=0.125
2024-08-18 21:40:02,531 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.76 vs. limit=15.0
2024-08-18 21:40:19,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4112880.0, ans=0.0
2024-08-18 21:40:27,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4112880.0, ans=0.0
2024-08-18 21:40:30,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4112980.0, ans=0.0
2024-08-18 21:40:39,344 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.42 vs. limit=22.5
2024-08-18 21:40:42,575 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 10200, loss[loss=0.1087, beats_loss=0.009638, ecapa_loss=0.0001594, whisper_loss=0.09751, over 22781.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01055, ecapa_loss=0.000145, whisper_loss=0.08912, over 3888303.62 frames. ], batch size: 91, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:40:57,268 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.349e+01 2.556e+01 2.919e+01 5.340e+01, threshold=5.112e+01, percent-clipped=2.0
2024-08-18 21:40:58,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4113180.0, ans=0.1
2024-08-18 21:40:59,430 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.59 vs.
limit=10.0
2024-08-18 21:41:00,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4113180.0, ans=0.0
2024-08-18 21:41:00,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4113180.0, ans=0.0
2024-08-18 21:41:03,148 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 34 from LS+wenet, 19 from Vox, 29 from AS
2024-08-18 21:41:08,271 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 from AS
2024-08-18 21:41:29,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4113380.0, ans=0.125
2024-08-18 21:41:30,946 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 13 from LS+wenet, 24 from Vox, 26 from AS
2024-08-18 21:41:31,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4113380.0, ans=0.125
2024-08-18 21:41:35,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4113480.0, ans=0.1
2024-08-18 21:41:45,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4113480.0, ans=0.125
2024-08-18 21:41:49,162 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 10250, loss[loss=0.07481, beats_loss=0.01077, ecapa_loss=0.0001442, whisper_loss=0.0626, over 15355.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01059, ecapa_loss=0.0001442, whisper_loss=0.08847, over 3885972.48 frames.
], batch size: 60, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:41:53,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4113580.0, ans=0.0
2024-08-18 21:41:55,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4113580.0, ans=0.05
2024-08-18 21:42:02,387 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 35 from LS+wenet, 23 from Vox, 37 from AS
2024-08-18 21:42:38,500 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 21 from Vox, 28 from AS
2024-08-18 21:42:53,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4114080.0, ans=0.125
2024-08-18 21:42:54,318 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 10300, loss[loss=0.06664, beats_loss=0.01394, ecapa_loss=8.419e-05, whisper_loss=0.05186, over 15813.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01058, ecapa_loss=0.0001435, whisper_loss=0.08916, over 3886231.81 frames. ], batch size: 60, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:43:07,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4114180.0, ans=0.2
2024-08-18 21:43:08,361 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.322e+01 2.569e+01 2.887e+01 6.847e+01, threshold=5.137e+01, percent-clipped=1.0
2024-08-18 21:43:11,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4114180.0, ans=0.125
2024-08-18 21:43:12,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4114180.0, ans=0.0
2024-08-18 21:43:36,105 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts.
29 from LS+wenet, 26 from Vox, 38 from AS
2024-08-18 21:43:45,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4114480.0, ans=0.0
2024-08-18 21:43:57,764 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 10350, loss[loss=0.0913, beats_loss=0.009877, ecapa_loss=0.0001337, whisper_loss=0.08009, over 17459.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01052, ecapa_loss=0.0001438, whisper_loss=0.08966, over 3892947.40 frames. ], batch size: 67, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:43:58,228 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 from AS
2024-08-18 21:44:20,444 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 19 from Vox, 39 from AS
2024-08-18 21:44:36,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4114880.0, ans=0.0
2024-08-18 21:44:41,455 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 15 from LS+wenet, 23 from Vox, 39 from AS
2024-08-18 21:44:42,797 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 25 from Vox, 29 from AS
2024-08-18 21:44:49,428 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=15.0
2024-08-18 21:45:03,376 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 10400, loss[loss=0.08332, beats_loss=0.01261, ecapa_loss=0.0001385, whisper_loss=0.06932, over 18748.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01058, ecapa_loss=0.0001431, whisper_loss=0.08885, over 3866699.20 frames. ], batch size: 78, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:45:04,671 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts.
23 from LS+wenet, 19 from Vox, 23 from AS
2024-08-18 21:45:08,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4115080.0, ans=0.2
2024-08-18 21:45:17,374 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.312e+01 2.463e+01 2.686e+01 5.077e+01, threshold=4.927e+01, percent-clipped=0.0
2024-08-18 21:45:17,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4115180.0, ans=0.125
2024-08-18 21:45:27,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4115180.0, ans=0.125
2024-08-18 21:45:31,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4115280.0, ans=0.125
2024-08-18 21:45:32,483 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 from AS
2024-08-18 21:45:45,798 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 13 from Vox, 30 from AS
2024-08-18 21:45:57,206 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 17 from LS+wenet, 23 from Vox, 30 from AS
2024-08-18 21:46:04,496 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 17 from Vox, 37 from AS
2024-08-18 21:46:08,134 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 10450, loss[loss=0.1009, beats_loss=0.01111, ecapa_loss=0.0001121, whisper_loss=0.0887, over 19350.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01056, ecapa_loss=0.0001431, whisper_loss=0.08863, over 3865165.57 frames.
], batch size: 74, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:46:08,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4115580.0, ans=0.0
2024-08-18 21:46:09,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4115580.0, ans=0.1
2024-08-18 21:46:28,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4115680.0, ans=0.0
2024-08-18 21:46:32,755 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 17 from Vox, 38 from AS
2024-08-18 21:46:33,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4115780.0, ans=0.07
2024-08-18 21:46:34,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4115780.0, ans=0.1
2024-08-18 21:47:14,638 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 10500, loss[loss=0.09448, beats_loss=0.01185, ecapa_loss=0.0001609, whisper_loss=0.08102, over 21178.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01047, ecapa_loss=0.0001436, whisper_loss=0.08945, over 3894887.83 frames. ], batch size: 89, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:47:28,564 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 21 from LS+wenet, 25 from Vox, 40 from AS
2024-08-18 21:47:29,474 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.302e+01 2.513e+01 2.856e+01 4.600e+01, threshold=5.027e+01, percent-clipped=0.0
2024-08-18 21:47:49,115 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts.
16 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-18 21:47:52,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4116280.0, ans=0.125 2024-08-18 21:48:07,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4116380.0, ans=0.125 2024-08-18 21:48:08,485 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 14 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-18 21:48:08,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4116380.0, ans=0.125 2024-08-18 21:48:17,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4116480.0, ans=0.125 2024-08-18 21:48:20,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4116480.0, ans=0.05 2024-08-18 21:48:25,063 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 10550, loss[loss=0.07406, beats_loss=0.01276, ecapa_loss=0.0001381, whisper_loss=0.05992, over 14639.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01046, ecapa_loss=0.0001435, whisper_loss=0.08937, over 3897742.17 frames. ], batch size: 57, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:48:25,704 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=15.0 2024-08-18 21:48:26,762 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
31 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-18 21:48:26,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4116580.0, ans=0.1 2024-08-18 21:48:33,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4116580.0, ans=0.125 2024-08-18 21:48:43,146 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-18 21:48:50,696 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.14 vs. limit=15.0 2024-08-18 21:48:55,211 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-18 21:49:23,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4116980.0, ans=0.125 2024-08-18 21:49:28,305 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0 2024-08-18 21:49:34,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4117080.0, ans=0.125 2024-08-18 21:49:35,582 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 10600, loss[loss=0.0855, beats_loss=0.01181, ecapa_loss=0.0001281, whisper_loss=0.07241, over 16760.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01046, ecapa_loss=0.0001433, whisper_loss=0.08913, over 3875991.29 frames. 
], batch size: 65, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:49:50,614 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.667e+01 2.337e+01 2.542e+01 2.897e+01 3.687e+01, threshold=5.085e+01, percent-clipped=0.0 2024-08-18 21:49:55,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4117180.0, ans=0.025 2024-08-18 21:50:00,894 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=15.0 2024-08-18 21:50:07,525 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.78 vs. limit=15.0 2024-08-18 21:50:08,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4117280.0, ans=0.125 2024-08-18 21:50:08,805 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.38 vs. limit=15.0 2024-08-18 21:50:14,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4117280.0, ans=0.0 2024-08-18 21:50:19,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4117380.0, ans=0.0 2024-08-18 21:50:25,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4117380.0, ans=0.125 2024-08-18 21:50:43,766 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 10650, loss[loss=0.1088, beats_loss=0.01111, ecapa_loss=0.0001074, whisper_loss=0.0966, over 23285.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01043, ecapa_loss=0.0001432, whisper_loss=0.08935, over 3868983.90 frames. 
], batch size: 88, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:51:05,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4117680.0, ans=0.0 2024-08-18 21:51:06,326 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 21:51:28,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4117880.0, ans=0.125 2024-08-18 21:51:33,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4117880.0, ans=0.125 2024-08-18 21:51:33,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4117880.0, ans=0.125 2024-08-18 21:51:43,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4117980.0, ans=0.125 2024-08-18 21:51:46,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4117980.0, ans=0.125 2024-08-18 21:51:51,058 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 10700, loss[loss=0.1068, beats_loss=0.01042, ecapa_loss=0.0001158, whisper_loss=0.09524, over 19891.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01057, ecapa_loss=0.0001421, whisper_loss=0.08852, over 3845069.48 frames. 
], batch size: 77, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:52:05,525 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.628e+01 2.357e+01 2.585e+01 2.828e+01 1.514e+02, threshold=5.170e+01, percent-clipped=1.0 2024-08-18 21:52:18,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4118280.0, ans=0.125 2024-08-18 21:52:23,192 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.77 vs. limit=15.0 2024-08-18 21:52:47,033 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-18 21:52:54,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4118480.0, ans=0.125 2024-08-18 21:52:55,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4118580.0, ans=0.0 2024-08-18 21:52:55,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4118580.0, ans=0.125 2024-08-18 21:52:56,106 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 10750, loss[loss=0.09833, beats_loss=0.01131, ecapa_loss=0.0001797, whisper_loss=0.08523, over 21370.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01058, ecapa_loss=0.0001415, whisper_loss=0.08886, over 3873789.66 frames. ], batch size: 90, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:53:03,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=4118580.0, ans=22.5 2024-08-18 21:53:17,153 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
20 from LS+wenet, 8 from Vox, 30 fro AS 2024-08-18 21:53:18,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4118680.0, ans=0.125 2024-08-18 21:53:27,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4118780.0, ans=0.0 2024-08-18 21:53:35,030 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-18 21:53:50,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4118980.0, ans=0.1 2024-08-18 21:53:56,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4118980.0, ans=0.1 2024-08-18 21:53:58,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4118980.0, ans=0.0 2024-08-18 21:54:01,142 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 10800, loss[loss=0.1097, beats_loss=0.009201, ecapa_loss=0.0001897, whisper_loss=0.09857, over 17014.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01057, ecapa_loss=0.0001424, whisper_loss=0.08913, over 3884178.05 frames. ], batch size: 69, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:54:15,505 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.266e+01 2.523e+01 2.854e+01 3.753e+01, threshold=5.046e+01, percent-clipped=0.0 2024-08-18 21:54:17,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4119180.0, ans=0.0 2024-08-18 21:54:27,165 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 26 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 21:54:27,889 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.47 vs. 
limit=15.0 2024-08-18 21:54:31,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4119280.0, ans=0.125 2024-08-18 21:54:38,526 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=15.0 2024-08-18 21:54:57,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4119480.0, ans=0.125 2024-08-18 21:55:00,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4119480.0, ans=0.07 2024-08-18 21:55:06,182 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 10850, loss[loss=0.09135, beats_loss=0.00955, ecapa_loss=0.0001961, whisper_loss=0.07984, over 21482.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01056, ecapa_loss=0.0001425, whisper_loss=0.08889, over 3881541.55 frames. ], batch size: 91, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:55:06,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4119580.0, ans=0.125 2024-08-18 21:55:12,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4119580.0, ans=0.125 2024-08-18 21:55:20,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4119680.0, ans=0.125 2024-08-18 21:55:23,986 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.38 vs. 
limit=15.0 2024-08-18 21:55:38,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4119780.0, ans=0.0 2024-08-18 21:55:52,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4119880.0, ans=0.2 2024-08-18 21:56:03,659 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-18 21:56:13,963 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 10900, loss[loss=0.09636, beats_loss=0.008552, ecapa_loss=0.0001606, whisper_loss=0.0862, over 19217.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0105, ecapa_loss=0.0001428, whisper_loss=0.08929, over 3859342.69 frames. ], batch size: 75, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:56:15,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4120080.0, ans=0.125 2024-08-18 21:56:22,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4120080.0, ans=0.0 2024-08-18 21:56:27,996 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.367e+01 2.602e+01 2.908e+01 4.089e+01, threshold=5.204e+01, percent-clipped=0.0 2024-08-18 21:57:07,303 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-18 21:57:08,571 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-18 21:57:16,551 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 25 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-18 21:57:19,092 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 10950, loss[loss=0.1059, beats_loss=0.008385, ecapa_loss=0.0001562, whisper_loss=0.09594, over 18362.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01047, ecapa_loss=0.0001434, whisper_loss=0.08978, over 3862135.09 frames. 
], batch size: 72, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:57:24,239 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-18 21:57:33,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4120680.0, ans=0.125 2024-08-18 21:57:54,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4120780.0, ans=0.1 2024-08-18 21:58:00,798 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 19 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-18 21:58:03,345 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-18 21:58:05,148 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.08 vs. limit=6.0 2024-08-18 21:58:23,753 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 11000, loss[loss=0.1143, beats_loss=0.009666, ecapa_loss=0.0001505, whisper_loss=0.1031, over 23478.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01041, ecapa_loss=0.0001437, whisper_loss=0.09019, over 3899750.60 frames. ], batch size: 93, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:58:38,414 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.293e+01 2.499e+01 2.865e+01 3.776e+01, threshold=4.999e+01, percent-clipped=0.0 2024-08-18 21:58:38,555 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-18 21:58:48,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4121280.0, ans=0.125 2024-08-18 21:59:07,122 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.16 vs. 
limit=5.0 2024-08-18 21:59:15,040 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.43 vs. limit=6.0 2024-08-18 21:59:28,275 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 11050, loss[loss=0.08859, beats_loss=0.0114, ecapa_loss=0.0001423, whisper_loss=0.07577, over 22258.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001441, whisper_loss=0.08974, over 3903732.34 frames. ], batch size: 92, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:59:36,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4121580.0, ans=0.125 2024-08-18 21:59:55,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4121780.0, ans=0.125 2024-08-18 22:00:04,595 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 22:00:11,035 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-18 22:00:11,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4121880.0, ans=0.2 2024-08-18 22:00:23,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4121980.0, ans=0.125 2024-08-18 22:00:27,016 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 34 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 22:00:33,542 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 11100, loss[loss=0.104, beats_loss=0.009965, ecapa_loss=0.0001624, whisper_loss=0.09243, over 18402.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01038, ecapa_loss=0.0001449, whisper_loss=0.09005, over 3872635.07 frames. 
], batch size: 76, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:00:41,135 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.97 vs. limit=15.0 2024-08-18 22:00:47,895 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.366e+01 2.564e+01 2.884e+01 4.228e+01, threshold=5.128e+01, percent-clipped=0.0 2024-08-18 22:00:58,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4122280.0, ans=0.0 2024-08-18 22:01:01,585 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.25 vs. limit=15.0 2024-08-18 22:01:13,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4122380.0, ans=0.0 2024-08-18 22:01:13,163 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.250e+00 2024-08-18 22:01:38,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4122580.0, ans=0.125 2024-08-18 22:01:39,087 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 11150, loss[loss=0.08801, beats_loss=0.01049, ecapa_loss=0.0001714, whisper_loss=0.07581, over 20058.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0104, ecapa_loss=0.000145, whisper_loss=0.09016, over 3883049.45 frames. ], batch size: 81, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:01:43,101 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
29 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-18 22:02:14,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4122780.0, ans=0.125 2024-08-18 22:02:24,630 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-18 22:02:25,839 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 29 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 22:02:26,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4122880.0, ans=0.0 2024-08-18 22:02:27,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4122880.0, ans=0.125 2024-08-18 22:02:42,791 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-18 22:02:43,781 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 11200, loss[loss=0.09681, beats_loss=0.01178, ecapa_loss=0.0001441, whisper_loss=0.08359, over 18967.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01033, ecapa_loss=0.0001463, whisper_loss=0.09045, over 3868039.56 frames. ], batch size: 79, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:02:48,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4123080.0, ans=0.0 2024-08-18 22:02:58,232 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.383e+01 2.625e+01 2.814e+01 6.266e+01, threshold=5.250e+01, percent-clipped=1.0 2024-08-18 22:02:58,465 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
18 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-18 22:03:18,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4123280.0, ans=0.125 2024-08-18 22:03:20,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4123280.0, ans=0.0 2024-08-18 22:03:27,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4123380.0, ans=0.0 2024-08-18 22:03:33,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4123380.0, ans=0.0 2024-08-18 22:03:35,363 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.94 vs. limit=22.5 2024-08-18 22:03:44,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4123480.0, ans=0.1 2024-08-18 22:03:46,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4123480.0, ans=0.2 2024-08-18 22:03:48,730 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 11250, loss[loss=0.1033, beats_loss=0.01025, ecapa_loss=0.0001381, whisper_loss=0.09165, over 15777.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01033, ecapa_loss=0.0001459, whisper_loss=0.09057, over 3881907.14 frames. ], batch size: 61, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:04:10,864 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-18 22:04:12,935 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.18 vs. limit=22.5 2024-08-18 22:04:51,345 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
15 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-18 22:04:53,725 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 11300, loss[loss=0.1136, beats_loss=0.007911, ecapa_loss=0.0001407, whisper_loss=0.1043, over 22644.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01038, ecapa_loss=0.0001441, whisper_loss=0.08991, over 3878413.26 frames. ], batch size: 89, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:04:54,129 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-18 22:04:55,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4124080.0, ans=0.125 2024-08-18 22:04:56,511 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-18 22:04:57,751 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 24 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-18 22:04:58,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4124080.0, ans=0.09899494936611666 2024-08-18 22:05:01,857 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.32 vs. limit=15.0 2024-08-18 22:05:07,793 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.238e+01 2.501e+01 2.783e+01 2.394e+02, threshold=5.001e+01, percent-clipped=1.0 2024-08-18 22:05:10,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4124180.0, ans=0.125 2024-08-18 22:05:13,205 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 22:05:48,153 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
31 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 22:05:58,559 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 11350, loss[loss=0.09826, beats_loss=0.01126, ecapa_loss=0.0001492, whisper_loss=0.0855, over 19646.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.0001437, whisper_loss=0.09052, over 3906589.81 frames. ], batch size: 80, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:06:02,541 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-18 22:06:14,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4124680.0, ans=0.125 2024-08-18 22:06:31,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4124780.0, ans=0.125 2024-08-18 22:06:39,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4124880.0, ans=0.125 2024-08-18 22:06:42,513 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-18 22:07:03,776 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 22:07:04,805 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 11400, loss[loss=0.102, beats_loss=0.01024, ecapa_loss=0.0001293, whisper_loss=0.09049, over 19085.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01033, ecapa_loss=0.000144, whisper_loss=0.09074, over 3904559.60 frames. ], batch size: 75, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:07:12,432 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 22:07:14,589 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.97 vs. 
limit=12.0 2024-08-18 22:07:18,933 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.399e+01 2.610e+01 2.996e+01 4.711e+01, threshold=5.221e+01, percent-clipped=0.0 2024-08-18 22:07:29,850 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 22:07:32,421 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-18 22:07:45,139 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 22:08:05,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4125480.0, ans=0.0 2024-08-18 22:08:09,660 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 11450, loss[loss=0.08702, beats_loss=0.01159, ecapa_loss=0.0001369, whisper_loss=0.07406, over 16452.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01044, ecapa_loss=0.0001436, whisper_loss=0.08999, over 3912728.28 frames. ], batch size: 64, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:08:11,161 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
30 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-18 22:08:13,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4125580.0, ans=0.2 2024-08-18 22:08:28,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4125680.0, ans=0.125 2024-08-18 22:08:45,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4125780.0, ans=0.125 2024-08-18 22:08:58,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4125880.0, ans=0.2 2024-08-18 22:08:58,959 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.02 vs. limit=22.5 2024-08-18 22:09:15,162 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 11500, loss[loss=0.1091, beats_loss=0.01037, ecapa_loss=0.0001222, whisper_loss=0.09749, over 23144.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.0001442, whisper_loss=0.08967, over 3875246.13 frames. ], batch size: 91, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:09:22,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4126080.0, ans=0.125 2024-08-18 22:09:24,381 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-18 22:09:28,915 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.98 vs. 
limit=10.0 2024-08-18 22:09:29,385 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.337e+01 2.548e+01 2.823e+01 1.618e+02, threshold=5.097e+01, percent-clipped=1.0 2024-08-18 22:09:31,482 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=15.0 2024-08-18 22:09:35,228 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 22 from Vox, 13 fro AS 2024-08-18 22:09:36,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4126180.0, ans=0.0 2024-08-18 22:09:37,830 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-18 22:09:40,950 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.11 vs. limit=15.0 2024-08-18 22:09:41,631 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-18 22:09:59,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4126380.0, ans=0.1 2024-08-18 22:09:59,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4126380.0, ans=0.125 2024-08-18 22:10:02,324 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-18 22:10:20,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4126580.0, ans=0.2 2024-08-18 22:10:21,113 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 11550, loss[loss=0.1014, beats_loss=0.008697, ecapa_loss=0.0001554, whisper_loss=0.09117, over 14571.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.0001454, whisper_loss=0.09041, over 3859461.16 frames. ], batch size: 57, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:10:24,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4126580.0, ans=0.2 2024-08-18 22:10:26,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4126580.0, ans=0.025 2024-08-18 22:10:27,237 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 19 from Vox, 33 from AS 2024-08-18 22:10:29,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4126580.0, ans=0.125 2024-08-18 22:10:32,329 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 23 from LS+wenet, 15 from Vox, 21 from AS 2024-08-18 22:10:44,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4126680.0, ans=0.0 2024-08-18 22:10:45,216 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.95 vs. limit=15.0 2024-08-18 22:11:01,987 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.48 vs. limit=15.0 2024-08-18 22:11:12,952 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.14 vs. limit=15.0 2024-08-18 22:11:14,403 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-08-18 22:11:17,231 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
16 from LS+wenet, 19 from Vox, 44 from AS 2024-08-18 22:11:20,671 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.57 vs. limit=15.0 2024-08-18 22:11:29,452 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 11600, loss[loss=0.06585, beats_loss=0.01166, ecapa_loss=0.0001311, whisper_loss=0.05287, over 16262.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.0001447, whisper_loss=0.09006, over 3864513.59 frames. ], batch size: 66, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:11:31,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4127080.0, ans=0.2 2024-08-18 22:11:41,523 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.60 vs. limit=15.0 2024-08-18 22:11:42,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4127180.0, ans=0.0 2024-08-18 22:11:43,905 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 22 from Vox, 26 from AS 2024-08-18 22:11:44,870 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.392e+01 2.631e+01 2.895e+01 1.114e+02, threshold=5.261e+01, percent-clipped=3.0 2024-08-18 22:12:15,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4127380.0, ans=0.1 2024-08-18 22:12:22,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4127380.0, ans=0.125 2024-08-18 22:12:22,451 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.63 vs. 
limit=10.0 2024-08-18 22:12:28,041 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2024-08-18 22:12:37,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4127480.0, ans=0.1 2024-08-18 22:12:38,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=4127580.0, ans=0.2 2024-08-18 22:12:39,334 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 11650, loss[loss=0.1015, beats_loss=0.01145, ecapa_loss=0.0001291, whisper_loss=0.0888, over 18676.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01053, ecapa_loss=0.0001436, whisper_loss=0.09032, over 3904984.08 frames. ], batch size: 75, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:12:43,589 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 from AS 2024-08-18 22:12:43,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4127580.0, ans=0.125 2024-08-18 22:12:45,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4127580.0, ans=0.0 2024-08-18 22:12:46,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4127580.0, ans=0.125 2024-08-18 22:12:55,223 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 20 from Vox, 32 from AS 2024-08-18 22:12:55,708 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.20 vs. limit=15.0 2024-08-18 22:12:57,650 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
25 from LS+wenet, 26 from Vox, 37 from AS 2024-08-18 22:12:57,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4127680.0, ans=0.0 2024-08-18 22:13:00,584 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 from AS 2024-08-18 22:13:01,878 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 28 from Vox, 32 from AS 2024-08-18 22:13:02,980 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 19 from Vox, 24 from AS 2024-08-18 22:13:11,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4127780.0, ans=0.125 2024-08-18 22:13:20,739 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.98 vs. limit=22.5 2024-08-18 22:13:31,826 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4127980.0, ans=0.125 2024-08-18 22:13:33,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4127980.0, ans=0.0 2024-08-18 22:13:35,863 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=15.0 2024-08-18 22:13:43,933 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 11700, loss[loss=0.07696, beats_loss=0.01156, ecapa_loss=0.0001114, whisper_loss=0.06429, over 17567.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.000143, whisper_loss=0.09032, over 3904076.00 frames. ], batch size: 69, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:13:53,680 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.51 vs. 
limit=15.0 2024-08-18 22:13:58,082 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.372e+01 2.671e+01 2.866e+01 2.586e+02, threshold=5.342e+01, percent-clipped=1.0 2024-08-18 22:14:07,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4128180.0, ans=0.125 2024-08-18 22:14:08,797 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 32 from Vox, 32 from AS 2024-08-18 22:14:09,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4128280.0, ans=0.07 2024-08-18 22:14:16,938 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.98 vs. limit=12.0 2024-08-18 22:14:20,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4128280.0, ans=0.0 2024-08-18 22:14:30,481 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 14 from Vox, 24 from AS 2024-08-18 22:14:30,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4128380.0, ans=0.0 2024-08-18 22:14:47,900 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 11750, loss[loss=0.1021, beats_loss=0.01275, ecapa_loss=0.0001167, whisper_loss=0.08822, over 23730.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01054, ecapa_loss=0.0001439, whisper_loss=0.0906, over 3922762.12 frames. ], batch size: 95, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:15:01,703 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 16 from LS+wenet, 21 from Vox, 28 from AS 2024-08-18 22:15:02,919 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 23 from Vox, 46 from AS 2024-08-18 22:15:05,507 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
30 from LS+wenet, 26 from Vox, 38 from AS 2024-08-18 22:15:18,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4128780.0, ans=0.0 2024-08-18 22:15:50,779 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 11800, loss[loss=0.1082, beats_loss=0.01305, ecapa_loss=0.0001375, whisper_loss=0.09381, over 21627.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01062, ecapa_loss=0.0001421, whisper_loss=0.09081, over 3933812.13 frames. ], batch size: 85, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:16:06,196 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.286e+01 2.543e+01 2.807e+01 3.749e+01, threshold=5.086e+01, percent-clipped=0.0 2024-08-18 22:16:15,514 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 from AS 2024-08-18 22:16:19,071 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 18 from LS+wenet, 13 from Vox, 22 from AS 2024-08-18 22:16:23,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4129280.0, ans=0.125 2024-08-18 22:16:24,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4129280.0, ans=0.2 2024-08-18 22:16:24,859 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.78 vs. limit=22.5 2024-08-18 22:16:27,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4129280.0, ans=0.125 2024-08-18 22:16:31,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4129380.0, ans=0.0 2024-08-18 22:16:46,415 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
21 from LS+wenet, 24 from Vox, 44 from AS 2024-08-18 22:16:50,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4129480.0, ans=0.0 2024-08-18 22:16:55,185 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 11850, loss[loss=0.1213, beats_loss=0.009422, ecapa_loss=0.000158, whisper_loss=0.1103, over 21763.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01064, ecapa_loss=0.0001424, whisper_loss=0.0905, over 3952818.14 frames. ], batch size: 88, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:17:00,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4129580.0, ans=0.125 2024-08-18 22:17:09,133 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.26 vs. limit=22.5 2024-08-18 22:17:26,988 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.46 vs. limit=15.0 2024-08-18 22:17:29,073 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 17 from LS+wenet, 25 from Vox, 32 from AS 2024-08-18 22:17:33,009 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 23 from LS+wenet, 12 from Vox, 22 from AS 2024-08-18 22:17:59,274 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 11900, loss[loss=0.1053, beats_loss=0.009797, ecapa_loss=0.0001373, whisper_loss=0.09409, over 15093.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001426, whisper_loss=0.09089, over 3932754.82 frames. ], batch size: 58, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:17:59,427 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
27 from LS+wenet, 22 from Vox, 26 from AS 2024-08-18 22:18:04,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4130080.0, ans=0.125 2024-08-18 22:18:08,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4130080.0, ans=0.2 2024-08-18 22:18:09,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4130080.0, ans=0.015 2024-08-18 22:18:14,417 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.305e+01 2.478e+01 2.772e+01 3.965e+01, threshold=4.956e+01, percent-clipped=0.0 2024-08-18 22:18:17,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4130180.0, ans=0.125 2024-08-18 22:18:17,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4130180.0, ans=0.2 2024-08-18 22:18:21,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4130180.0, ans=0.09899494936611666 2024-08-18 22:18:21,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4130180.0, ans=0.1 2024-08-18 22:18:34,089 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 15 from Vox, 27 from AS 2024-08-18 22:18:39,660 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.29 vs. limit=15.0 2024-08-18 22:18:48,346 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 19 from Vox, 39 from AS 2024-08-18 22:19:03,305 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 11950, loss[loss=0.09682, beats_loss=0.006058, ecapa_loss=0.0001625, whisper_loss=0.08914, over 15568.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01055, ecapa_loss=0.0001428, whisper_loss=0.09069, over 3891196.40 frames. ], batch size: 57, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:19:15,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4130680.0, ans=0.0 2024-08-18 22:19:19,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4130680.0, ans=0.0 2024-08-18 22:19:33,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4130780.0, ans=0.2 2024-08-18 22:19:53,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4130980.0, ans=0.2 2024-08-18 22:20:02,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4130980.0, ans=0.0 2024-08-18 22:20:06,935 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 12000, loss[loss=0.07996, beats_loss=0.009921, ecapa_loss=0.0001534, whisper_loss=0.0685, over 14412.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01061, ecapa_loss=0.0001414, whisper_loss=0.0901, over 3900943.17 frames. ], batch size: 57, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:20:06,936 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-18 22:20:47,535 INFO [train_multi_KD3.py:1149] (2/4) Epoch 28, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.0005126, whisper_loss=0.2484, over 922467.00 frames. 2024-08-18 22:21:06,696 INFO [train_multi_KD3.py:1149] (2/4) Epoch 28, validation on SV_voxceleb1: loss=0.004051, beats_loss=0, ecapa_loss=0.0004051, whisper_loss=0, over 939242.00 frames. 2024-08-18 22:22:55,544 INFO [train_multi_KD3.py:1149] (2/4) Epoch 28, validation on AT_audioset: loss=0.02313, beats_loss=0.02313, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-18 22:22:55,548 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-18 22:22:58,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4131080.0, ans=0.0 2024-08-18 22:22:58,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4131080.0, ans=0.125 2024-08-18 22:23:05,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4131080.0, ans=0.125 2024-08-18 22:23:11,598 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.291e+01 2.547e+01 2.883e+01 4.329e+01, threshold=5.094e+01, percent-clipped=0.0 2024-08-18 22:23:28,227 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 19 from Vox, 31 from AS 2024-08-18 22:23:43,247 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 19 from Vox, 35 from AS 2024-08-18 22:23:59,412 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 12050, loss[loss=0.1006, beats_loss=0.01074, ecapa_loss=0.0001449, whisper_loss=0.08838, over 16515.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01057, ecapa_loss=0.000141, whisper_loss=0.08982, over 3885424.21 frames. ], batch size: 67, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:24:01,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4131580.0, ans=0.125 2024-08-18 22:24:05,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4131580.0, ans=0.0 2024-08-18 22:24:11,441 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
22 from LS+wenet, 20 from Vox, 28 from AS 2024-08-18 22:24:15,778 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.11 vs. limit=15.0 2024-08-18 22:24:29,029 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 21 from LS+wenet, 23 from Vox, 39 from AS 2024-08-18 22:24:34,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4131780.0, ans=0.125 2024-08-18 22:24:35,931 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.95 vs. limit=12.0 2024-08-18 22:25:02,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4132080.0, ans=0.0 2024-08-18 22:25:03,150 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 12100, loss[loss=0.1162, beats_loss=0.00863, ecapa_loss=0.0001473, whisper_loss=0.1061, over 21513.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01049, ecapa_loss=0.0001423, whisper_loss=0.09026, over 3835068.61 frames. ], batch size: 88, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:25:04,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4132080.0, ans=0.125 2024-08-18 22:25:15,039 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.05 vs. 
limit=12.0 2024-08-18 22:25:17,902 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.763e+01 2.242e+01 2.477e+01 2.714e+01 3.521e+01, threshold=4.955e+01, percent-clipped=0.0 2024-08-18 22:25:23,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4132180.0, ans=0.125 2024-08-18 22:25:23,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4132180.0, ans=0.125 2024-08-18 22:25:35,966 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 from AS 2024-08-18 22:25:40,802 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.08 vs. limit=15.0 2024-08-18 22:25:42,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4132380.0, ans=0.125 2024-08-18 22:25:44,415 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.72 vs. 
limit=15.0 2024-08-18 22:25:47,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4132380.0, ans=0.0 2024-08-18 22:25:50,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4132380.0, ans=0.125 2024-08-18 22:25:52,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4132380.0, ans=0.0 2024-08-18 22:26:01,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=4132480.0, ans=0.025 2024-08-18 22:26:09,543 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 12150, loss[loss=0.09845, beats_loss=0.0097, ecapa_loss=0.0001377, whisper_loss=0.08738, over 18512.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01048, ecapa_loss=0.000143, whisper_loss=0.08993, over 3867097.52 frames. ], batch size: 72, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:26:15,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4132580.0, ans=0.0 2024-08-18 22:26:24,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4132580.0, ans=0.125 2024-08-18 22:26:27,864 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2024-08-18 22:26:30,043 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 16 from Vox, 27 from AS 2024-08-18 22:26:40,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4132680.0, ans=0.07 2024-08-18 22:27:21,847 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
36 from LS+wenet, 20 from Vox, 26 from AS 2024-08-18 22:27:22,242 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 22:27:27,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4132980.0, ans=0.0 2024-08-18 22:27:40,521 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 12200, loss[loss=0.09702, beats_loss=0.007428, ecapa_loss=0.0001255, whisper_loss=0.08834, over 16279.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001425, whisper_loss=0.0897, over 3839550.86 frames. ], batch size: 60, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:27:42,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4133080.0, ans=0.125 2024-08-18 22:27:57,012 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 30 from LS+wenet, 20 from Vox, 33 from AS 2024-08-18 22:28:03,438 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.24 vs. limit=22.5 2024-08-18 22:28:07,545 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.413e+01 2.660e+01 2.941e+01 4.822e+01, threshold=5.320e+01, percent-clipped=0.0 2024-08-18 22:28:38,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4133280.0, ans=0.2 2024-08-18 22:29:11,961 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 22:29:28,668 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 12250, loss[loss=0.1236, beats_loss=0.01029, ecapa_loss=0.0001295, whisper_loss=0.112, over 24340.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.000143, whisper_loss=0.08998, over 3858012.39 frames. 
], batch size: 92, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:29:33,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4133580.0, ans=0.2 2024-08-18 22:29:41,841 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 from AS 2024-08-18 22:30:10,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4133680.0, ans=0.07 2024-08-18 22:30:28,033 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 16 from LS+wenet, 22 from Vox, 28 from AS 2024-08-18 22:30:32,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4133880.0, ans=0.125 2024-08-18 22:30:35,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4133880.0, ans=0.0 2024-08-18 22:30:37,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4133880.0, ans=0.125 2024-08-18 22:30:50,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4133980.0, ans=0.125 2024-08-18 22:31:01,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4134080.0, ans=0.04949747468305833 2024-08-18 22:31:01,705 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 22:31:02,378 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 12300, loss[loss=0.1021, beats_loss=0.01199, ecapa_loss=0.0001889, whisper_loss=0.08821, over 14275.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0105, ecapa_loss=0.0001427, whisper_loss=0.08965, over 3851249.28 frames. 
], batch size: 63, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:31:19,463 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.309e+01 2.558e+01 2.931e+01 4.229e+01, threshold=5.115e+01, percent-clipped=0.0 2024-08-18 22:31:39,631 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2024-08-18 22:31:54,436 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.22 vs. limit=15.0 2024-08-18 22:32:11,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4134480.0, ans=0.2 2024-08-18 22:32:15,589 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.64 vs. limit=22.5 2024-08-18 22:32:16,008 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 12350, loss[loss=0.09363, beats_loss=0.01133, ecapa_loss=0.00014, whisper_loss=0.0809, over 21106.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01052, ecapa_loss=0.0001425, whisper_loss=0.08949, over 3874034.07 frames. ], batch size: 89, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:32:36,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4134680.0, ans=0.1 2024-08-18 22:32:43,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4134680.0, ans=0.0 2024-08-18 22:32:46,781 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
17 from LS+wenet, 26 from Vox, 23 from AS 2024-08-18 22:33:29,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4134980.0, ans=0.125 2024-08-18 22:33:33,584 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 12400, loss[loss=0.09659, beats_loss=0.01174, ecapa_loss=0.0001328, whisper_loss=0.08352, over 21402.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01049, ecapa_loss=0.000143, whisper_loss=0.08977, over 3876391.66 frames. ], batch size: 88, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:33:46,428 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 12 from Vox, 27 from AS 2024-08-18 22:33:51,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4135180.0, ans=0.125 2024-08-18 22:33:51,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4135180.0, ans=0.125 2024-08-18 22:33:52,688 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.330e+01 2.647e+01 2.859e+01 4.171e+01, threshold=5.294e+01, percent-clipped=0.0 2024-08-18 22:34:00,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4135180.0, ans=0.125 2024-08-18 22:34:17,744 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 23 from Vox, 25 from AS 2024-08-18 22:34:30,111 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 23 from Vox, 22 from AS 2024-08-18 22:34:35,246 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.98 vs. limit=15.0 2024-08-18 22:34:46,786 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
21 from LS+wenet, 24 from Vox, 35 from AS 2024-08-18 22:34:50,931 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 12450, loss[loss=0.07353, beats_loss=0.01343, ecapa_loss=0.0001345, whisper_loss=0.05875, over 15764.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001431, whisper_loss=0.08958, over 3867743.29 frames. ], batch size: 64, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:34:57,385 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 from AS 2024-08-18 22:34:58,772 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 from AS 2024-08-18 22:35:16,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4135680.0, ans=0.0 2024-08-18 22:35:24,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4135780.0, ans=0.1 2024-08-18 22:35:30,110 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.50 vs. limit=22.5 2024-08-18 22:35:32,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4135780.0, ans=0.125 2024-08-18 22:35:34,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4135780.0, ans=0.125 2024-08-18 22:35:50,376 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 from AS 2024-08-18 22:36:05,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4135980.0, ans=0.0 2024-08-18 22:36:07,704 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 12500, loss[loss=0.1148, beats_loss=0.008637, ecapa_loss=0.0001541, whisper_loss=0.1046, over 21062.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01044, ecapa_loss=0.0001429, whisper_loss=0.09015, over 3857566.24 frames. ], batch size: 84, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:36:14,658 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 22:36:16,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4136080.0, ans=0.125 2024-08-18 22:36:25,092 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.340e+01 2.580e+01 2.958e+01 4.069e+01, threshold=5.161e+01, percent-clipped=0.0 2024-08-18 22:36:26,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4136180.0, ans=0.0 2024-08-18 22:36:27,038 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 23 from LS+wenet, 27 from Vox, 35 from AS 2024-08-18 22:36:38,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4136280.0, ans=0.125 2024-08-18 22:36:40,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4136280.0, ans=0.1 2024-08-18 22:36:48,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4136280.0, ans=0.125 2024-08-18 22:36:48,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4136280.0, ans=0.125 2024-08-18 22:36:50,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4136280.0, ans=0.125 2024-08-18 22:36:51,374 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
22 from LS+wenet, 24 from Vox, 33 from AS 2024-08-18 22:36:59,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4136380.0, ans=0.2 2024-08-18 22:37:17,776 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 from AS 2024-08-18 22:37:21,435 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 12550, loss[loss=0.1162, beats_loss=0.009299, ecapa_loss=0.0001486, whisper_loss=0.1054, over 22520.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.0001428, whisper_loss=0.09028, over 3839411.11 frames. ], batch size: 89, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:37:40,056 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.00 vs. limit=22.5 2024-08-18 22:37:44,111 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 22:38:08,956 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 18 from Vox, 28 from AS 2024-08-18 22:38:26,342 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 from AS 2024-08-18 22:38:30,554 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 12600, loss[loss=0.1073, beats_loss=0.009845, ecapa_loss=0.0001349, whisper_loss=0.09608, over 15496.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01048, ecapa_loss=0.000144, whisper_loss=0.09008, over 3846953.49 frames. 
], batch size: 60, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:38:47,431 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.167e+01 2.472e+01 2.650e+01 1.085e+02, threshold=4.945e+01, percent-clipped=2.0 2024-08-18 22:38:49,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4137180.0, ans=0.125 2024-08-18 22:38:52,266 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.57 vs. limit=6.0 2024-08-18 22:39:05,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4137280.0, ans=0.125 2024-08-18 22:39:11,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=4137380.0, ans=0.95 2024-08-18 22:39:13,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4137380.0, ans=0.125 2024-08-18 22:39:23,098 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 27 from Vox, 33 from AS 2024-08-18 22:39:29,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4137480.0, ans=0.2 2024-08-18 22:39:30,556 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 15 from LS+wenet, 19 from Vox, 30 from AS 2024-08-18 22:39:32,034 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 22:39:37,674 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 12650, loss[loss=0.1055, beats_loss=0.007252, ecapa_loss=0.000129, whisper_loss=0.09697, over 15321.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01049, ecapa_loss=0.000144, whisper_loss=0.09044, over 3846898.59 frames. 
], batch size: 55, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:39:44,426 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 18 from Vox, 22 from AS 2024-08-18 22:39:52,347 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 20 from LS+wenet, 20 from Vox, 46 from AS 2024-08-18 22:39:56,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4137680.0, ans=0.0 2024-08-18 22:39:59,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4137680.0, ans=0.125 2024-08-18 22:40:04,904 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 21 from LS+wenet, 12 from Vox, 27 from AS 2024-08-18 22:40:18,327 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 20 from LS+wenet, 25 from Vox, 39 from AS 2024-08-18 22:40:27,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4137880.0, ans=0.0 2024-08-18 22:40:36,102 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 19 from Vox, 37 from AS 2024-08-18 22:40:38,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4137980.0, ans=0.125 2024-08-18 22:40:45,688 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 12700, loss[loss=0.1058, beats_loss=0.009814, ecapa_loss=0.0001354, whisper_loss=0.0946, over 17372.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01047, ecapa_loss=0.0001452, whisper_loss=0.09033, over 3845317.51 frames. ], batch size: 66, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:40:52,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4138080.0, ans=0.04949747468305833 2024-08-18 22:40:58,742 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
23 from LS+wenet, 19 from Vox, 34 from AS 2024-08-18 22:41:03,220 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.642e+01 2.392e+01 2.657e+01 3.204e+01 3.516e+02, threshold=5.313e+01, percent-clipped=1.0 2024-08-18 22:41:46,178 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 37 from LS+wenet, 15 from Vox, 38 from AS 2024-08-18 22:41:52,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4138480.0, ans=0.2 2024-08-18 22:41:54,423 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 20 from Vox, 21 from AS 2024-08-18 22:41:55,494 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 12750, loss[loss=0.109, beats_loss=0.008144, ecapa_loss=0.0002004, whisper_loss=0.09888, over 15220.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01052, ecapa_loss=0.0001456, whisper_loss=0.09001, over 3877677.06 frames. ], batch size: 63, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:42:21,483 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 from AS 2024-08-18 22:42:27,218 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 32 from LS+wenet, 25 from Vox, 28 from AS 2024-08-18 22:42:35,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4138880.0, ans=0.0 2024-08-18 22:42:37,652 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 22 from LS+wenet, 6 from Vox, 30 from AS 2024-08-18 22:42:37,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4138880.0, ans=0.2 2024-08-18 22:42:42,226 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.89 vs. limit=6.0 2024-08-18 22:42:47,352 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
19 from LS+wenet, 17 from Vox, 33 from AS 2024-08-18 22:42:53,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4138980.0, ans=0.125 2024-08-18 22:43:00,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4138980.0, ans=0.2 2024-08-18 22:43:00,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4138980.0, ans=0.1 2024-08-18 22:43:04,017 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 12800, loss[loss=0.1198, beats_loss=0.008394, ecapa_loss=0.0001973, whisper_loss=0.1095, over 17840.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001457, whisper_loss=0.09015, over 3844974.63 frames. ], batch size: 78, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:43:07,884 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.00 vs. limit=15.0 2024-08-18 22:43:11,920 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.84 vs. 
limit=6.0 2024-08-18 22:43:15,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4139080.0, ans=0.1 2024-08-18 22:43:20,485 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.353e+01 2.654e+01 3.011e+01 1.383e+02, threshold=5.309e+01, percent-clipped=3.0 2024-08-18 22:43:24,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4139180.0, ans=0.04949747468305833 2024-08-18 22:43:28,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4139180.0, ans=0.05 2024-08-18 22:43:38,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4139280.0, ans=0.125 2024-08-18 22:43:41,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4139280.0, ans=0.0 2024-08-18 22:43:50,709 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 16 from Vox, 33 from AS 2024-08-18 22:44:10,619 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 12850, loss[loss=0.1012, beats_loss=0.009665, ecapa_loss=0.0001573, whisper_loss=0.08997, over 15559.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01064, ecapa_loss=0.000146, whisper_loss=0.08946, over 3819556.71 frames. 
], batch size: 65, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:44:22,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4139680.0, ans=0.1 2024-08-18 22:44:32,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4139680.0, ans=0.1 2024-08-18 22:44:35,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4139680.0, ans=0.0 2024-08-18 22:44:55,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4139880.0, ans=0.1 2024-08-18 22:44:58,617 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 14 from LS+wenet, 16 from Vox, 28 from AS 2024-08-18 22:45:18,907 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 12900, loss[loss=0.1223, beats_loss=0.008666, ecapa_loss=0.0001181, whisper_loss=0.1125, over 17891.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01064, ecapa_loss=0.0001456, whisper_loss=0.08934, over 3833832.78 frames. 
], batch size: 66, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:45:28,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4140080.0, ans=0.125 2024-08-18 22:45:35,657 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.228e+01 2.406e+01 2.715e+01 2.764e+02, threshold=4.812e+01, percent-clipped=1.0 2024-08-18 22:45:59,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4140380.0, ans=0.0 2024-08-18 22:46:02,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4140380.0, ans=0.125 2024-08-18 22:46:21,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4140480.0, ans=0.125 2024-08-18 22:46:21,898 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 22:46:27,711 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 12950, loss[loss=0.1051, beats_loss=0.01128, ecapa_loss=0.0001316, whisper_loss=0.0925, over 22098.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01056, ecapa_loss=0.0001445, whisper_loss=0.08949, over 3840719.74 frames. ], batch size: 88, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:46:44,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4140680.0, ans=0.2 2024-08-18 22:46:46,938 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.94 vs. limit=6.0 2024-08-18 22:46:50,205 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
18 from LS+wenet, 19 from Vox, 32 from AS 2024-08-18 22:47:08,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4140880.0, ans=0.125 2024-08-18 22:47:09,438 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 19 from LS+wenet, 25 from Vox, 45 from AS 2024-08-18 22:47:12,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4140880.0, ans=0.0 2024-08-18 22:47:35,809 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 13000, loss[loss=0.09558, beats_loss=0.01204, ecapa_loss=0.0001466, whisper_loss=0.08208, over 22804.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01054, ecapa_loss=0.000144, whisper_loss=0.08989, over 3871238.59 frames. ], batch size: 92, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:47:37,415 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 16 from Vox, 25 from AS 2024-08-18 22:47:47,438 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0 2024-08-18 22:47:51,955 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.390e+01 2.751e+01 3.227e+01 4.665e+01, threshold=5.502e+01, percent-clipped=0.0 2024-08-18 22:48:04,696 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 20 from Vox, 44 from AS 2024-08-18 22:48:06,155 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
23 from LS+wenet, 27 from Vox, 30 from AS 2024-08-18 22:48:12,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4141280.0, ans=0.07 2024-08-18 22:48:14,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4141280.0, ans=0.0 2024-08-18 22:48:16,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4141380.0, ans=0.125 2024-08-18 22:48:20,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4141380.0, ans=0.125 2024-08-18 22:48:23,608 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.48 vs. limit=15.0 2024-08-18 22:48:24,463 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 18 from LS+wenet, 26 from Vox, 46 from AS 2024-08-18 22:48:30,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4141480.0, ans=0.0 2024-08-18 22:48:35,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4141480.0, ans=0.2 2024-08-18 22:48:44,337 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 13050, loss[loss=0.1147, beats_loss=0.008803, ecapa_loss=0.000172, whisper_loss=0.1042, over 21123.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01055, ecapa_loss=0.0001439, whisper_loss=0.08975, over 3855357.61 frames. 
], batch size: 87, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:49:23,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4141780.0, ans=0.125 2024-08-18 22:49:31,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4141880.0, ans=0.125 2024-08-18 22:49:34,235 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.08 vs. limit=15.0 2024-08-18 22:49:55,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4142080.0, ans=0.2 2024-08-18 22:49:55,847 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 13100, loss[loss=0.1001, beats_loss=0.01081, ecapa_loss=0.0001278, whisper_loss=0.08806, over 23909.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01059, ecapa_loss=0.0001438, whisper_loss=0.08912, over 3872208.05 frames. ], batch size: 93, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:49:58,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4142080.0, ans=0.05 2024-08-18 22:50:14,316 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.210e+01 2.445e+01 2.730e+01 3.721e+01, threshold=4.891e+01, percent-clipped=0.0 2024-08-18 22:50:30,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4142280.0, ans=0.125 2024-08-18 22:50:39,843 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.69 vs. 
limit=22.5 2024-08-18 22:51:10,354 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 13150, loss[loss=0.1065, beats_loss=0.01215, ecapa_loss=0.0001092, whisper_loss=0.09331, over 23941.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.0106, ecapa_loss=0.0001438, whisper_loss=0.08878, over 3878997.43 frames. ], batch size: 93, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:51:24,806 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 24 from LS+wenet, 21 from Vox, 40 from AS 2024-08-18 22:51:56,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4142880.0, ans=0.0 2024-08-18 22:51:57,477 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 18 from Vox, 39 from AS 2024-08-18 22:52:19,726 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 16 from LS+wenet, 11 from Vox, 35 from AS 2024-08-18 22:52:20,930 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 from AS 2024-08-18 22:52:21,791 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.24 vs. limit=22.5 2024-08-18 22:52:23,471 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 13200, loss[loss=0.09687, beats_loss=0.01342, ecapa_loss=0.000143, whisper_loss=0.08202, over 22364.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01054, ecapa_loss=0.0001433, whisper_loss=0.08942, over 3843507.50 frames. ], batch size: 89, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:52:37,544 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
25 from LS+wenet, 25 from Vox, 42 from AS 2024-08-18 22:52:38,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4143180.0, ans=0.1 2024-08-18 22:52:39,723 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.334e+01 2.573e+01 2.889e+01 3.948e+01, threshold=5.147e+01, percent-clipped=0.0 2024-08-18 22:53:03,094 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 from AS 2024-08-18 22:53:17,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4143380.0, ans=0.125 2024-08-18 22:53:21,398 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 19 from Vox, 19 from AS 2024-08-18 22:53:26,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4143480.0, ans=0.0 2024-08-18 22:53:29,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4143480.0, ans=0.125 2024-08-18 22:53:29,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4143480.0, ans=0.2 2024-08-18 22:53:30,718 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 from AS 2024-08-18 22:53:32,942 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 13250, loss[loss=0.08988, beats_loss=0.008509, ecapa_loss=0.0001739, whisper_loss=0.07964, over 14288.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01054, ecapa_loss=0.000144, whisper_loss=0.0895, over 3805496.99 frames. ], batch size: 56, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:53:44,632 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
30 from LS+wenet, 18 from Vox, 43 from AS 2024-08-18 22:53:58,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4143680.0, ans=0.125 2024-08-18 22:54:10,149 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 20 from Vox, 43 from AS 2024-08-18 22:54:34,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4143980.0, ans=0.0 2024-08-18 22:54:36,459 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 28 from Vox, 28 from AS 2024-08-18 22:54:46,126 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 13300, loss[loss=0.1346, beats_loss=0.007843, ecapa_loss=0.000151, whisper_loss=0.1252, over 22502.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01046, ecapa_loss=0.0001441, whisper_loss=0.08994, over 3840103.24 frames. ], batch size: 88, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:54:51,274 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-08-18 22:54:58,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4144080.0, ans=0.125 2024-08-18 22:55:01,701 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 from AS 2024-08-18 22:55:02,711 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.713e+01 2.463e+01 2.740e+01 3.023e+01 1.147e+02, threshold=5.480e+01, percent-clipped=2.0 2024-08-18 22:55:11,484 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 22 from Vox, 22 from AS 2024-08-18 22:55:18,983 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
34 from LS+wenet, 20 from Vox, 34 from AS 2024-08-18 22:55:29,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4144380.0, ans=0.1 2024-08-18 22:55:30,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4144380.0, ans=10.0 2024-08-18 22:55:32,906 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 14 from Vox, 27 from AS 2024-08-18 22:55:44,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4144480.0, ans=0.0 2024-08-18 22:55:59,126 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 13350, loss[loss=0.1076, beats_loss=0.01251, ecapa_loss=0.0001081, whisper_loss=0.09405, over 21848.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01039, ecapa_loss=0.0001447, whisper_loss=0.09065, over 3867437.72 frames. ], batch size: 85, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:56:02,021 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 from AS 2024-08-18 22:56:19,639 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.69 vs. limit=12.0 2024-08-18 22:56:24,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4144680.0, ans=0.125 2024-08-18 22:56:27,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4144780.0, ans=0.09899494936611666 2024-08-18 22:56:43,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4144880.0, ans=0.0 2024-08-18 22:57:06,200 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
18 from LS+wenet, 15 from Vox, 22 from AS 2024-08-18 22:57:09,988 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 13400, loss[loss=0.07414, beats_loss=0.01129, ecapa_loss=0.0001451, whisper_loss=0.0614, over 13164.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001443, whisper_loss=0.08997, over 3817692.27 frames. ], batch size: 54, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:57:25,917 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.309e+01 2.613e+01 2.893e+01 4.161e+01, threshold=5.227e+01, percent-clipped=0.0 2024-08-18 22:57:49,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4145280.0, ans=0.125 2024-08-18 22:58:05,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4145480.0, ans=0.2 2024-08-18 22:58:11,216 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 27 from Vox, 34 from AS 2024-08-18 22:58:15,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4145480.0, ans=0.0 2024-08-18 22:58:16,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4145580.0, ans=0.05 2024-08-18 22:58:17,570 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 13450, loss[loss=0.1137, beats_loss=0.009462, ecapa_loss=0.0002074, whisper_loss=0.1022, over 21672.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01043, ecapa_loss=0.000145, whisper_loss=0.09039, over 3825037.17 frames. ], batch size: 92, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:58:25,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4145580.0, ans=0.125 2024-08-18 22:58:28,532 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
28 from LS+wenet, 18 from Vox, 37 from AS 2024-08-18 22:58:39,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4145680.0, ans=0.125 2024-08-18 22:59:01,612 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 15 from Vox, 22 from AS 2024-08-18 22:59:05,154 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 from AS 2024-08-18 22:59:08,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4145880.0, ans=10.0 2024-08-18 22:59:09,793 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 32 from Vox, 31 from AS 2024-08-18 22:59:15,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4145980.0, ans=0.1 2024-08-18 22:59:24,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4146080.0, ans=0.2 2024-08-18 22:59:25,122 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 13500, loss[loss=0.1105, beats_loss=0.009796, ecapa_loss=0.0001393, whisper_loss=0.09936, over 20446.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.0001444, whisper_loss=0.08962, over 3831066.95 frames. ], batch size: 79, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:59:31,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4146080.0, ans=0.125 2024-08-18 22:59:42,036 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.292e+01 2.450e+01 2.682e+01 3.330e+01, threshold=4.901e+01, percent-clipped=0.0 2024-08-18 22:59:48,206 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
25 from LS+wenet, 21 from Vox, 29 from AS 2024-08-18 22:59:48,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4146180.0, ans=0.2 2024-08-18 22:59:52,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4146280.0, ans=0.1 2024-08-18 22:59:56,438 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 17 from Vox, 26 from AS 2024-08-18 22:59:59,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4146280.0, ans=0.05 2024-08-18 23:00:22,806 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.18 vs. limit=15.0 2024-08-18 23:00:25,180 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.56 vs. limit=22.5 2024-08-18 23:00:27,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4146480.0, ans=0.2 2024-08-18 23:00:32,316 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 13550, loss[loss=0.0752, beats_loss=0.01271, ecapa_loss=0.0001199, whisper_loss=0.06129, over 16419.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01048, ecapa_loss=0.0001444, whisper_loss=0.08977, over 3834450.72 frames. ], batch size: 65, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 23:00:33,642 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 27 from Vox, 30 from AS 2024-08-18 23:00:35,963 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 12 from LS+wenet, 16 from Vox, 27 from AS 2024-08-18 23:01:00,933 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
25 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-18 23:01:08,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4146780.0, ans=0.125 2024-08-18 23:01:09,182 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-18 23:01:10,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4146780.0, ans=0.2 2024-08-18 23:01:31,542 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.82 vs. limit=6.0 2024-08-18 23:01:32,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4146980.0, ans=0.125 2024-08-18 23:01:41,365 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 13600, loss[loss=0.09858, beats_loss=0.009128, ecapa_loss=0.0001938, whisper_loss=0.08752, over 19217.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01047, ecapa_loss=0.0001445, whisper_loss=0.08972, over 3819949.80 frames. ], batch size: 81, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 23:01:41,563 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 23:01:57,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4147180.0, ans=0.125 2024-08-18 23:01:57,879 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.313e+01 2.521e+01 2.832e+01 3.898e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-18 23:02:04,535 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.29 vs. 
limit=22.5 2024-08-18 23:02:17,414 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.57 vs. limit=22.5 2024-08-18 23:02:43,716 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.85 vs. limit=15.0 2024-08-18 23:02:45,692 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-18 23:02:49,598 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 13650, loss[loss=0.09556, beats_loss=0.01062, ecapa_loss=0.0001686, whisper_loss=0.08325, over 21918.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01058, ecapa_loss=0.0001436, whisper_loss=0.08989, over 3821017.23 frames. ], batch size: 94, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 23:02:53,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4147580.0, ans=0.125 2024-08-18 23:02:57,783 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 23:03:03,482 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.52 vs. limit=10.0 2024-08-18 23:03:05,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4147680.0, ans=0.1 2024-08-18 23:03:32,617 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-18 23:03:48,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4147980.0, ans=0.0 2024-08-18 23:03:53,121 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.22 vs. 
limit=15.0 2024-08-18 23:03:55,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4148080.0, ans=0.0 2024-08-18 23:03:56,209 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 13700, loss[loss=0.1328, beats_loss=0.007865, ecapa_loss=0.000148, whisper_loss=0.1234, over 23050.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001442, whisper_loss=0.09052, over 3844437.40 frames. ], batch size: 90, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 23:03:56,345 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-18 23:04:01,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4148080.0, ans=0.2 2024-08-18 23:04:03,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4148080.0, ans=0.0 2024-08-18 23:04:11,952 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.317e+01 2.575e+01 2.845e+01 4.715e+01, threshold=5.149e+01, percent-clipped=0.0 2024-08-18 23:04:25,481 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 15 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 23:04:43,810 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 18 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-18 23:04:56,858 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 14 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-18 23:05:01,939 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 13750, loss[loss=0.1259, beats_loss=0.00822, ecapa_loss=0.0001602, whisper_loss=0.1161, over 15865.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0106, ecapa_loss=0.0001436, whisper_loss=0.0897, over 3829145.78 frames. 
], batch size: 60, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:05:02,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4148580.0, ans=0.0 2024-08-18 23:05:03,683 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 41 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-18 23:05:22,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4148680.0, ans=0.125 2024-08-18 23:05:23,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4148680.0, ans=0.0 2024-08-18 23:05:26,420 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 25 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-18 23:05:26,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4148680.0, ans=0.2 2024-08-18 23:05:26,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4148680.0, ans=0.125 2024-08-18 23:05:38,430 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-18 23:05:40,273 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.42 vs. 
limit=10.0 2024-08-18 23:05:51,416 WARNING [optim.py:496] (2/4) Scaling gradients by 0.06066558510065079, model_norm_threshold=51.49205017089844 2024-08-18 23:05:51,577 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.619e+05, grad_sumsq=1.558e+07, orig_rms_sq=1.039e-02 2024-08-18 23:06:06,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4148980.0, ans=0.0 2024-08-18 23:06:08,288 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 13800, loss[loss=0.106, beats_loss=0.009813, ecapa_loss=0.0001458, whisper_loss=0.09473, over 16296.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01058, ecapa_loss=0.0001438, whisper_loss=0.09038, over 3843111.73 frames. ], batch size: 66, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:06:12,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4149080.0, ans=0.0 2024-08-18 23:06:19,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4149080.0, ans=0.04949747468305833 2024-08-18 23:06:25,232 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.320e+01 2.516e+01 2.780e+01 8.488e+02, threshold=5.033e+01, percent-clipped=1.0 2024-08-18 23:06:38,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4149280.0, ans=0.125 2024-08-18 23:06:42,831 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.17 vs. 
limit=15.0 2024-08-18 23:06:44,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4149280.0, ans=0.1 2024-08-18 23:06:46,529 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.31 vs. limit=22.5 2024-08-18 23:06:58,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4149380.0, ans=0.125 2024-08-18 23:07:00,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4149480.0, ans=0.1 2024-08-18 23:07:03,870 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.36 vs. limit=10.0 2024-08-18 23:07:14,668 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 13850, loss[loss=0.08295, beats_loss=0.01167, ecapa_loss=0.0001343, whisper_loss=0.06993, over 16993.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01051, ecapa_loss=0.0001442, whisper_loss=0.09029, over 3839957.20 frames. ], batch size: 68, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:07:22,602 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-18 23:07:28,340 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
30 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-18 23:07:37,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4149680.0, ans=0.0 2024-08-18 23:07:37,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4149680.0, ans=0.1 2024-08-18 23:07:40,152 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.57 vs. limit=15.0 2024-08-18 23:07:44,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4149780.0, ans=0.0 2024-08-18 23:07:46,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4149780.0, ans=0.07 2024-08-18 23:08:03,801 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.95 vs. limit=15.0 2024-08-18 23:08:11,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4149980.0, ans=0.1 2024-08-18 23:08:23,818 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 13900, loss[loss=0.09577, beats_loss=0.01002, ecapa_loss=0.0001727, whisper_loss=0.08401, over 20951.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01052, ecapa_loss=0.0001433, whisper_loss=0.09052, over 3868205.05 frames. ], batch size: 88, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:08:40,363 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.243e+01 2.562e+01 2.808e+01 4.472e+01, threshold=5.123e+01, percent-clipped=0.0 2024-08-18 23:08:45,754 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-18 23:08:55,395 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
22 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-18 23:09:06,730 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.42 vs. limit=15.0 2024-08-18 23:09:14,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4150380.0, ans=0.125 2024-08-18 23:09:26,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4150480.0, ans=0.125 2024-08-18 23:09:30,915 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-18 23:09:31,911 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 13950, loss[loss=0.09658, beats_loss=0.009609, ecapa_loss=0.0001527, whisper_loss=0.08545, over 17980.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001432, whisper_loss=0.09012, over 3852523.59 frames. ], batch size: 72, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:09:34,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4150580.0, ans=0.125 2024-08-18 23:09:55,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4150680.0, ans=0.0 2024-08-18 23:10:00,241 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-18 23:10:09,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4150780.0, ans=0.1 2024-08-18 23:10:16,088 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
18 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-18 23:10:17,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4150880.0, ans=0.125 2024-08-18 23:10:27,393 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 41 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-18 23:10:29,427 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.40 vs. limit=15.0 2024-08-18 23:10:38,876 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 14000, loss[loss=0.09362, beats_loss=0.01192, ecapa_loss=0.0001385, whisper_loss=0.08032, over 22936.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01043, ecapa_loss=0.0001425, whisper_loss=0.09045, over 3840513.99 frames. ], batch size: 94, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:10:47,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4151080.0, ans=0.125 2024-08-18 23:10:48,411 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 23:10:48,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4151080.0, ans=0.025 2024-08-18 23:10:55,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4151180.0, ans=0.2 2024-08-18 23:10:56,368 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.301e+01 2.559e+01 2.803e+01 3.762e+01, threshold=5.117e+01, percent-clipped=0.0 2024-08-18 23:10:57,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4151180.0, ans=0.125 2024-08-18 23:10:58,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4151180.0, ans=0.0 2024-08-18 23:11:08,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4151280.0, ans=0.125 2024-08-18 23:11:32,607 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 39 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 23:11:35,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4151480.0, ans=0.1 2024-08-18 23:11:40,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4151480.0, ans=0.1 2024-08-18 23:11:40,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4151480.0, ans=0.125 2024-08-18 23:11:41,027 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-18 23:11:46,250 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
24 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-18 23:11:49,160 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 14050, loss[loss=0.1017, beats_loss=0.01035, ecapa_loss=0.0001398, whisper_loss=0.0899, over 23348.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01036, ecapa_loss=0.0001421, whisper_loss=0.09145, over 3866566.80 frames. ], batch size: 93, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:11:50,630 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.55 vs. limit=15.0 2024-08-18 23:11:56,304 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.93 vs. limit=15.0 2024-08-18 23:11:57,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4151580.0, ans=0.2 2024-08-18 23:12:05,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4151680.0, ans=0.125 2024-08-18 23:12:08,445 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-18 23:12:17,298 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-18 23:12:27,420 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.917e+00 2024-08-18 23:12:33,713 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 22 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-18 23:12:40,386 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.97 vs. limit=15.0 2024-08-18 23:12:41,996 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.39 vs. 
limit=15.0 2024-08-18 23:12:50,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4151980.0, ans=0.125 2024-08-18 23:12:59,947 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 14100, loss[loss=0.09767, beats_loss=0.008535, ecapa_loss=0.0001516, whisper_loss=0.08762, over 22721.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01031, ecapa_loss=0.0001423, whisper_loss=0.09168, over 3833754.41 frames. ], batch size: 94, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:13:03,158 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2024-08-18 23:13:17,464 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.301e+01 2.548e+01 2.823e+01 5.332e+01, threshold=5.096e+01, percent-clipped=1.0 2024-08-18 23:13:19,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4152180.0, ans=0.125 2024-08-18 23:13:20,237 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2024-08-18 23:13:48,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4152380.0, ans=0.125 2024-08-18 23:14:12,140 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 14150, loss[loss=0.0802, beats_loss=0.01208, ecapa_loss=0.0001285, whisper_loss=0.06684, over 21743.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01038, ecapa_loss=0.0001421, whisper_loss=0.09148, over 3832177.79 frames. 
], batch size: 88, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:14:37,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4152680.0, ans=0.2 2024-08-18 23:14:37,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4152680.0, ans=0.125 2024-08-18 23:14:38,647 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-18 23:14:46,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4152780.0, ans=0.125 2024-08-18 23:14:52,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4152780.0, ans=0.125 2024-08-18 23:15:11,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4152980.0, ans=0.125 2024-08-18 23:15:27,355 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 14200, loss[loss=0.1031, beats_loss=0.01079, ecapa_loss=0.000139, whisper_loss=0.09093, over 21801.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01035, ecapa_loss=0.000142, whisper_loss=0.09122, over 3821018.17 frames. ], batch size: 90, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:15:36,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4153080.0, ans=0.0 2024-08-18 23:15:43,034 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 25 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-18 23:15:44,711 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.04 vs. 
limit=15.0 2024-08-18 23:15:45,260 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.295e+01 2.597e+01 2.935e+01 2.411e+02, threshold=5.195e+01, percent-clipped=3.0 2024-08-18 23:15:56,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4153280.0, ans=0.1 2024-08-18 23:16:01,879 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-18 23:16:06,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4153280.0, ans=0.1 2024-08-18 23:16:18,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4153380.0, ans=0.0 2024-08-18 23:16:29,067 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-18 23:16:34,636 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.85 vs. limit=15.0 2024-08-18 23:16:36,463 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-18 23:16:40,948 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 14250, loss[loss=0.1251, beats_loss=0.006967, ecapa_loss=0.0001532, whisper_loss=0.1166, over 22766.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01035, ecapa_loss=0.0001418, whisper_loss=0.09107, over 3852303.00 frames. 
], batch size: 86, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:16:43,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4153580.0, ans=0.2 2024-08-18 23:16:51,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4153580.0, ans=0.2 2024-08-18 23:16:54,459 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-18 23:17:12,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4153780.0, ans=0.1 2024-08-18 23:17:37,847 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-18 23:17:45,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4153980.0, ans=0.125 2024-08-18 23:17:46,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4153980.0, ans=0.1 2024-08-18 23:17:54,770 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 14300, loss[loss=0.07626, beats_loss=0.01066, ecapa_loss=0.0001623, whisper_loss=0.06398, over 15349.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01041, ecapa_loss=0.0001412, whisper_loss=0.09067, over 3852212.74 frames. 
], batch size: 64, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:18:12,370 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.299e+01 2.504e+01 2.790e+01 4.018e+02, threshold=5.008e+01, percent-clipped=2.0 2024-08-18 23:18:22,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4154280.0, ans=0.2 2024-08-18 23:18:34,376 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.87 vs. limit=15.0 2024-08-18 23:18:53,096 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-18 23:18:59,621 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-18 23:19:04,719 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 14350, loss[loss=0.09574, beats_loss=0.01034, ecapa_loss=0.0001435, whisper_loss=0.08396, over 20286.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01039, ecapa_loss=0.0001412, whisper_loss=0.09049, over 3849301.07 frames. ], batch size: 81, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:19:12,603 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-18 23:19:18,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4154680.0, ans=0.0 2024-08-18 23:19:26,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4154680.0, ans=10.0 2024-08-18 23:19:36,386 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-18 23:19:47,790 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
26 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-18 23:19:52,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4154880.0, ans=0.125 2024-08-18 23:19:59,934 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-18 23:20:17,344 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 14400, loss[loss=0.1008, beats_loss=0.01177, ecapa_loss=0.0001287, whisper_loss=0.0877, over 22847.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01039, ecapa_loss=0.000142, whisper_loss=0.09082, over 3840221.58 frames. ], batch size: 92, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:20:34,732 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.269e+01 2.603e+01 3.055e+01 4.750e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-18 23:20:46,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4155280.0, ans=0.0 2024-08-18 23:20:58,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4155280.0, ans=0.0 2024-08-18 23:21:05,300 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-18 23:21:10,984 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-18 23:21:12,675 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 23:21:29,096 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
33 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-18 23:21:31,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4155580.0, ans=0.0 2024-08-18 23:21:32,435 INFO [train_multi_KD3.py:1116] (2/4) Epoch 28, batch 14450, loss[loss=0.1143, beats_loss=0.01034, ecapa_loss=0.0001299, whisper_loss=0.1027, over 23030.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01037, ecapa_loss=0.0001436, whisper_loss=0.09065, over 3823137.20 frames. ], batch size: 91, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:21:43,623 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 23:21:43,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4155580.0, ans=0.1 2024-08-18 23:21:49,974 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 19 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-18 23:22:35,397 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.37 vs. limit=10.0 2024-08-18 23:22:41,338 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 23:22:42,995 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-18 23:23:31,032 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 0, loss[loss=0.09531, beats_loss=0.009959, ecapa_loss=0.0001301, whisper_loss=0.08405, over 13716.00 frames. ], tot_loss[loss=0.09531, beats_loss=0.009959, ecapa_loss=0.0001301, whisper_loss=0.08405, over 13716.00 frames. 
], batch size: 54, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:23:31,032 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-18 23:24:08,468 INFO [train_multi_KD3.py:1149] (2/4) Epoch 29, validation on ASR_libri: loss=0.2527, beats_loss=0, ecapa_loss=0.0005265, whisper_loss=0.2475, over 922467.00 frames. 2024-08-18 23:24:23,755 INFO [train_multi_KD3.py:1149] (2/4) Epoch 29, validation on SV_voxceleb1: loss=0.004049, beats_loss=0, ecapa_loss=0.0004049, whisper_loss=0, over 939242.00 frames. 2024-08-18 23:26:08,930 INFO [train_multi_KD3.py:1149] (2/4) Epoch 29, validation on AT_audioset: loss=0.02325, beats_loss=0.02325, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 23:26:08,933 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-18 23:26:13,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4156070.0, ans=0.0 2024-08-18 23:26:39,472 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.24 vs. limit=15.0 2024-08-18 23:26:41,483 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.401e+01 2.612e+01 3.065e+01 1.665e+02, threshold=5.224e+01, percent-clipped=1.0 2024-08-18 23:26:43,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4156170.0, ans=10.0 2024-08-18 23:27:32,888 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-18 23:27:42,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4156370.0, ans=0.2 2024-08-18 23:28:10,066 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 50, loss[loss=0.1053, beats_loss=0.009102, ecapa_loss=0.0001439, whisper_loss=0.09481, over 22951.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.009486, ecapa_loss=0.0001407, whisper_loss=0.09048, over 898239.57 frames. ], batch size: 92, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:28:16,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4156570.0, ans=0.5 2024-08-18 23:28:18,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4156570.0, ans=0.1 2024-08-18 23:28:36,235 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 29 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-18 23:28:50,480 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.50 vs. limit=15.0 2024-08-18 23:29:13,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4156770.0, ans=0.0 2024-08-18 23:29:13,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=4156770.0, ans=15.0 2024-08-18 23:29:44,049 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 23:30:01,594 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 100, loss[loss=0.07029, beats_loss=0.01088, ecapa_loss=0.0001484, whisper_loss=0.05793, over 18978.00 frames. ], tot_loss[loss=0.0994, beats_loss=0.00941, ecapa_loss=0.0001423, whisper_loss=0.08857, over 1530842.45 frames. 
], batch size: 80, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:30:06,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4157070.0, ans=0.125 2024-08-18 23:30:08,966 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-08-18 23:30:30,204 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.602e+01 2.871e+01 3.209e+01 4.618e+01, threshold=5.741e+01, percent-clipped=0.0 2024-08-18 23:30:35,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4157170.0, ans=0.125 2024-08-18 23:30:47,794 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-18 23:31:07,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4157370.0, ans=0.125 2024-08-18 23:31:15,464 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2024-08-18 23:31:36,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4157470.0, ans=0.05 2024-08-18 23:31:42,671 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 150, loss[loss=0.1078, beats_loss=0.00883, ecapa_loss=0.0001354, whisper_loss=0.09759, over 22581.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.009371, ecapa_loss=0.0001427, whisper_loss=0.08959, over 2036420.94 frames. ], batch size: 87, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:31:46,181 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
25 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 23:31:49,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4157570.0, ans=0.0 2024-08-18 23:31:56,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4157570.0, ans=0.2 2024-08-18 23:31:59,927 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-18 23:32:03,088 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-18 23:32:23,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4157770.0, ans=0.0 2024-08-18 23:32:42,546 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-18 23:32:48,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4157970.0, ans=0.125 2024-08-18 23:32:55,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4157970.0, ans=0.0 2024-08-18 23:33:01,258 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 200, loss[loss=0.09944, beats_loss=0.009336, ecapa_loss=0.0001429, whisper_loss=0.08868, over 17320.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.009592, ecapa_loss=0.0001437, whisper_loss=0.08911, over 2379718.45 frames. 
], batch size: 64, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:33:02,514 WARNING [optim.py:496] (2/4) Scaling gradients by 0.05759067460894585, model_norm_threshold=57.41115951538086 2024-08-18 23:33:02,678 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.48, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.794e+05, grad_sumsq=4.619e+07, orig_rms_sq=1.038e-02 2024-08-18 23:33:19,910 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.401e+01 2.608e+01 2.915e+01 9.969e+02, threshold=5.216e+01, percent-clipped=2.0 2024-08-18 23:33:20,787 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.98 vs. limit=15.0 2024-08-18 23:33:23,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4158170.0, ans=0.0 2024-08-18 23:33:58,270 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 23:33:58,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4158470.0, ans=0.125 2024-08-18 23:34:11,759 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 250, loss[loss=0.1134, beats_loss=0.01104, ecapa_loss=0.0001361, whisper_loss=0.101, over 22970.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.009826, ecapa_loss=0.0001434, whisper_loss=0.08932, over 2703811.22 frames. ], batch size: 89, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:34:17,419 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
29 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-18 23:34:28,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4158670.0, ans=0.0 2024-08-18 23:34:47,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4158770.0, ans=0.125 2024-08-18 23:34:48,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4158770.0, ans=0.125 2024-08-18 23:35:00,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4158870.0, ans=0.1 2024-08-18 23:35:08,052 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.72 vs. limit=10.0 2024-08-18 23:35:08,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4158970.0, ans=0.125 2024-08-18 23:35:17,729 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 23:35:18,463 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 300, loss[loss=0.1214, beats_loss=0.008993, ecapa_loss=0.0001744, whisper_loss=0.1106, over 21006.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01002, ecapa_loss=0.0001443, whisper_loss=0.08921, over 2942906.20 frames. ], batch size: 90, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:35:28,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4159070.0, ans=0.1 2024-08-18 23:35:31,079 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
20 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-18 23:35:35,797 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.277e+01 2.615e+01 3.023e+01 6.948e+01, threshold=5.231e+01, percent-clipped=1.0 2024-08-18 23:36:05,200 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 23:36:09,511 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 22 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-18 23:36:16,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4159470.0, ans=0.2 2024-08-18 23:36:26,455 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 350, loss[loss=0.1144, beats_loss=0.008604, ecapa_loss=0.0001313, whisper_loss=0.1044, over 17745.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01016, ecapa_loss=0.0001423, whisper_loss=0.0889, over 3132879.98 frames. ], batch size: 68, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:36:37,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4159570.0, ans=0.125 2024-08-18 23:36:40,793 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-18 23:36:42,789 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.26 vs. limit=15.0 2024-08-18 23:36:50,949 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-18 23:37:17,192 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
23 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-18 23:37:28,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4159970.0, ans=0.0 2024-08-18 23:37:35,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4159970.0, ans=0.125 2024-08-18 23:37:38,408 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 400, loss[loss=0.108, beats_loss=0.00957, ecapa_loss=0.000117, whisper_loss=0.09725, over 17473.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01013, ecapa_loss=0.0001427, whisper_loss=0.08955, over 3280214.45 frames. ], batch size: 64, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:37:47,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4160070.0, ans=0.125 2024-08-18 23:37:51,325 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-18 23:37:55,256 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.335e+01 2.628e+01 2.920e+01 4.432e+01, threshold=5.257e+01, percent-clipped=0.0 2024-08-18 23:38:26,183 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
22 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-18 23:38:27,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4160370.0, ans=0.1 2024-08-18 23:38:29,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4160370.0, ans=0.125 2024-08-18 23:38:30,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4160370.0, ans=0.125 2024-08-18 23:38:46,446 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 450, loss[loss=0.1247, beats_loss=0.008965, ecapa_loss=0.0001693, whisper_loss=0.114, over 23156.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01017, ecapa_loss=0.0001433, whisper_loss=0.09001, over 3426457.07 frames. ], batch size: 90, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:38:52,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4160570.0, ans=0.0 2024-08-18 23:39:18,543 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.95 vs. limit=10.0 2024-08-18 23:39:32,127 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-18 23:39:36,196 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-18 23:39:38,863 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 13 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-18 23:39:55,449 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 500, loss[loss=0.1023, beats_loss=0.01121, ecapa_loss=0.0001772, whisper_loss=0.08929, over 14670.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01019, ecapa_loss=0.0001419, whisper_loss=0.08974, over 3480255.29 frames. 
], batch size: 63, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:39:58,736 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 23:40:04,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4161070.0, ans=0.0 2024-08-18 23:40:11,029 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-18 23:40:13,474 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.408e+01 2.730e+01 3.089e+01 1.165e+02, threshold=5.459e+01, percent-clipped=2.0 2024-08-18 23:40:14,851 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-18 23:40:23,052 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 23:40:41,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4161370.0, ans=0.125 2024-08-18 23:40:43,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4161370.0, ans=0.1 2024-08-18 23:41:05,127 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 550, loss[loss=0.1226, beats_loss=0.009505, ecapa_loss=0.0001368, whisper_loss=0.1117, over 18894.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01028, ecapa_loss=0.000142, whisper_loss=0.08993, over 3596883.90 frames. ], batch size: 73, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:41:11,111 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
20 from LS+wenet, 9 from Vox, 34 fro AS 2024-08-18 23:41:14,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4161570.0, ans=0.0 2024-08-18 23:41:17,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4161670.0, ans=0.125 2024-08-18 23:41:24,327 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-18 23:41:39,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4161770.0, ans=0.125 2024-08-18 23:41:43,919 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.926e+01 2024-08-18 23:42:07,541 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 23:42:11,601 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-18 23:42:12,632 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 600, loss[loss=0.09612, beats_loss=0.01174, ecapa_loss=0.0001268, whisper_loss=0.08311, over 21389.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0103, ecapa_loss=0.000141, whisper_loss=0.09009, over 3653931.33 frames. ], batch size: 84, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:42:25,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4162170.0, ans=0.1 2024-08-18 23:42:30,322 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.259e+01 2.406e+01 2.695e+01 1.044e+02, threshold=4.812e+01, percent-clipped=1.0 2024-08-18 23:42:49,464 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
24 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-18 23:42:55,822 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0 2024-08-18 23:43:07,955 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 35 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 23:43:16,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4162470.0, ans=0.2 2024-08-18 23:43:19,963 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 650, loss[loss=0.08244, beats_loss=0.01219, ecapa_loss=0.0001373, whisper_loss=0.06887, over 18826.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01033, ecapa_loss=0.0001411, whisper_loss=0.09037, over 3704803.04 frames. ], batch size: 80, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:43:35,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4162670.0, ans=0.1 2024-08-18 23:43:44,524 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 21 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-18 23:43:46,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4162770.0, ans=0.125 2024-08-18 23:44:07,785 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.38 vs. 
limit=15.0 2024-08-18 23:44:19,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=4162970.0, ans=15.0 2024-08-18 23:44:26,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4163070.0, ans=0.125 2024-08-18 23:44:27,117 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 700, loss[loss=0.1092, beats_loss=0.00932, ecapa_loss=0.0002071, whisper_loss=0.09779, over 21490.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01029, ecapa_loss=0.0001422, whisper_loss=0.08977, over 3711842.41 frames. ], batch size: 92, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:44:28,424 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 23:44:44,846 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.232e+01 2.495e+01 2.687e+01 4.242e+01, threshold=4.990e+01, percent-clipped=0.0 2024-08-18 23:44:46,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4163170.0, ans=0.125 2024-08-18 23:44:58,735 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.53 vs. limit=15.0 2024-08-18 23:44:59,631 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.94 vs. limit=15.0 2024-08-18 23:45:20,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4163470.0, ans=0.0 2024-08-18 23:45:27,473 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.05 vs. 
limit=15.0 2024-08-18 23:45:34,231 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 750, loss[loss=0.07944, beats_loss=0.01289, ecapa_loss=0.000145, whisper_loss=0.0651, over 15109.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01035, ecapa_loss=0.0001423, whisper_loss=0.08985, over 3764727.06 frames. ], batch size: 64, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:45:40,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4163570.0, ans=0.125 2024-08-18 23:45:44,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4163570.0, ans=0.1 2024-08-18 23:45:47,104 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 38 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-18 23:45:48,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4163670.0, ans=0.125 2024-08-18 23:46:04,488 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 26 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 23:46:08,035 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.77 vs. 
limit=10.0 2024-08-18 23:46:09,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4163770.0, ans=0.04949747468305833 2024-08-18 23:46:19,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4163870.0, ans=0.125 2024-08-18 23:46:26,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4163870.0, ans=0.0 2024-08-18 23:46:26,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4163870.0, ans=0.0 2024-08-18 23:46:41,105 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-18 23:46:42,185 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 800, loss[loss=0.09725, beats_loss=0.008687, ecapa_loss=0.000179, whisper_loss=0.08678, over 20774.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01026, ecapa_loss=0.0001433, whisper_loss=0.08988, over 3791755.20 frames. ], batch size: 83, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:46:56,192 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. 
limit=6.0 2024-08-18 23:46:59,126 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.180e+01 2.421e+01 2.736e+01 5.483e+01, threshold=4.843e+01, percent-clipped=1.0 2024-08-18 23:47:26,733 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 23:47:32,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4164370.0, ans=0.125 2024-08-18 23:47:49,240 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.94 vs. limit=10.0 2024-08-18 23:47:49,599 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 850, loss[loss=0.09979, beats_loss=0.01001, ecapa_loss=0.0001208, whisper_loss=0.08857, over 20033.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01033, ecapa_loss=0.000142, whisper_loss=0.08933, over 3804859.00 frames. ], batch size: 77, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:47:51,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4164570.0, ans=0.125 2024-08-18 23:47:52,747 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-18 23:48:05,045 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 23:48:22,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4164770.0, ans=0.125 2024-08-18 23:48:31,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4164870.0, ans=0.0 2024-08-18 23:48:35,376 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
34 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-18 23:48:47,548 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.57 vs. limit=15.0 2024-08-18 23:48:57,715 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 900, loss[loss=0.1132, beats_loss=0.01067, ecapa_loss=0.0001507, whisper_loss=0.101, over 23501.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01034, ecapa_loss=0.0001403, whisper_loss=0.08947, over 3815744.06 frames. ], batch size: 93, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:49:01,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4165070.0, ans=0.125 2024-08-18 23:49:11,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4165170.0, ans=0.125 2024-08-18 23:49:14,965 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.209e+01 2.441e+01 2.717e+01 4.171e+01, threshold=4.882e+01, percent-clipped=0.0 2024-08-18 23:49:29,093 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.360e+05 2024-08-18 23:49:35,554 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-18 23:49:44,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4165370.0, ans=0.125 2024-08-18 23:50:01,433 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-18 23:50:03,804 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.77 vs. 
limit=15.0 2024-08-18 23:50:05,282 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 950, loss[loss=0.1139, beats_loss=0.0111, ecapa_loss=0.0001216, whisper_loss=0.1016, over 23183.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01036, ecapa_loss=0.0001393, whisper_loss=0.08879, over 3800348.22 frames. ], batch size: 91, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:50:09,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4165570.0, ans=0.2 2024-08-18 23:50:22,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4165670.0, ans=0.125 2024-08-18 23:50:35,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4165770.0, ans=0.2 2024-08-18 23:50:45,253 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-18 23:50:53,955 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.07 vs. limit=22.5 2024-08-18 23:51:13,496 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 1000, loss[loss=0.1099, beats_loss=0.01078, ecapa_loss=0.0001273, whisper_loss=0.09789, over 20242.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01044, ecapa_loss=0.0001393, whisper_loss=0.08828, over 3804654.29 frames. ], batch size: 78, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:51:22,894 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.54 vs. 
limit=22.5 2024-08-18 23:51:31,121 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.375e+01 2.563e+01 2.832e+01 6.372e+01, threshold=5.125e+01, percent-clipped=2.0 2024-08-18 23:51:31,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4166170.0, ans=0.0 2024-08-18 23:51:36,929 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-18 23:51:47,851 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 27 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-18 23:51:54,747 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-18 23:52:04,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4166370.0, ans=0.125 2024-08-18 23:52:10,043 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-18 23:52:19,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4166470.0, ans=0.025 2024-08-18 23:52:21,789 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 1050, loss[loss=0.08887, beats_loss=0.01179, ecapa_loss=0.0001431, whisper_loss=0.07565, over 18203.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01043, ecapa_loss=0.0001391, whisper_loss=0.08865, over 3831603.52 frames. ], batch size: 74, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:52:25,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4166570.0, ans=0.0 2024-08-18 23:52:45,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=4166670.0, ans=22.5 2024-08-18 23:53:06,117 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
16 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-18 23:53:19,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4166970.0, ans=0.125 2024-08-18 23:53:29,500 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 1100, loss[loss=0.07986, beats_loss=0.01363, ecapa_loss=0.0001068, whisper_loss=0.06516, over 20119.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01046, ecapa_loss=0.0001375, whisper_loss=0.0884, over 3829546.03 frames. ], batch size: 80, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:53:31,059 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-18 23:53:36,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4167070.0, ans=0.125 2024-08-18 23:53:39,213 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 23:53:45,276 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 16 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-18 23:53:46,318 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.348e+01 2.551e+01 2.998e+01 5.491e+01, threshold=5.102e+01, percent-clipped=1.0 2024-08-18 23:53:48,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4167170.0, ans=0.0 2024-08-18 23:53:50,918 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-18 23:53:54,969 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
19 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-18 23:54:28,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4167470.0, ans=0.125 2024-08-18 23:54:36,199 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 1150, loss[loss=0.1014, beats_loss=0.009713, ecapa_loss=0.0001515, whisper_loss=0.09015, over 18362.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01037, ecapa_loss=0.0001385, whisper_loss=0.08862, over 3795308.34 frames. ], batch size: 74, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:54:43,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4167570.0, ans=0.05 2024-08-18 23:54:53,723 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 19 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 23:55:09,212 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-18 23:55:32,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4167970.0, ans=0.1 2024-08-18 23:55:43,874 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 1200, loss[loss=0.0987, beats_loss=0.01123, ecapa_loss=0.0001382, whisper_loss=0.08609, over 21933.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01042, ecapa_loss=0.0001386, whisper_loss=0.08863, over 3806626.41 frames. ], batch size: 88, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:55:50,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4168070.0, ans=0.125 2024-08-18 23:56:00,457 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
24 from LS+wenet, 20 from Vox, 30 from AS
2024-08-18 23:56:01,575 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.277e+01 2.488e+01 2.834e+01 2.594e+02, threshold=4.975e+01, percent-clipped=3.0
2024-08-18 23:56:03,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4168170.0, ans=0.95
2024-08-18 23:56:10,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4168270.0, ans=0.0
2024-08-18 23:56:14,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4168270.0, ans=0.125
2024-08-18 23:56:16,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4168270.0, ans=0.1
2024-08-18 23:56:51,136 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 1250, loss[loss=0.1004, beats_loss=0.01017, ecapa_loss=0.0001458, whisper_loss=0.08877, over 20151.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01039, ecapa_loss=0.0001392, whisper_loss=0.08837, over 3792840.98 frames. ], batch size: 81, lr: 2.13e-03, grad_scale: 5.764607523034235e+17
2024-08-18 23:57:09,624 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 19 from Vox, 34 from AS
2024-08-18 23:57:15,989 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 23 from LS+wenet, 12 from Vox, 26 from AS
2024-08-18 23:57:16,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4168670.0, ans=0.0
2024-08-18 23:57:58,112 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 1300, loss[loss=0.09112, beats_loss=0.01203, ecapa_loss=0.0001508, whisper_loss=0.07758, over 22237.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01043, ecapa_loss=0.0001397, whisper_loss=0.08821, over 3785158.54 frames.
], batch size: 95, lr: 2.13e-03, grad_scale: 5.764607523034235e+17
2024-08-18 23:58:10,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4169170.0, ans=0.0
2024-08-18 23:58:11,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4169170.0, ans=0.1
2024-08-18 23:58:17,206 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.654e+01 2.196e+01 2.537e+01 2.815e+01 5.238e+01, threshold=5.075e+01, percent-clipped=1.0
2024-08-18 23:58:22,869 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 27 from Vox, 32 from AS
2024-08-18 23:58:46,980 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 28 from LS+wenet, 27 from Vox, 30 from AS
2024-08-18 23:59:02,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4169470.0, ans=0.0
2024-08-18 23:59:05,042 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 20 from Vox, 25 from AS
2024-08-18 23:59:06,416 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 1350, loss[loss=0.09065, beats_loss=0.009153, ecapa_loss=0.0001003, whisper_loss=0.0805, over 17418.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01044, ecapa_loss=0.0001396, whisper_loss=0.08827, over 3800044.35 frames. ], batch size: 63, lr: 2.13e-03, grad_scale: 5.764607523034235e+17
2024-08-18 23:59:09,521 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 21 from Vox, 34 from AS
2024-08-18 23:59:18,880 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 from AS
2024-08-18 23:59:21,419 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts.
22 from LS+wenet, 26 from Vox, 20 from AS
2024-08-18 23:59:39,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4169770.0, ans=0.0
2024-08-18 23:59:44,074 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.72 vs. limit=10.0
2024-08-19 00:00:08,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4169970.0, ans=0.95
2024-08-19 00:00:14,464 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 1400, loss[loss=0.1087, beats_loss=0.009575, ecapa_loss=0.0001255, whisper_loss=0.09786, over 17742.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01035, ecapa_loss=0.0001392, whisper_loss=0.08923, over 3810673.83 frames. ], batch size: 68, lr: 2.13e-03, grad_scale: 5.764607523034235e+17
2024-08-19 00:00:16,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4170070.0, ans=0.1
2024-08-19 00:00:33,077 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.158e+01 2.419e+01 2.652e+01 3.776e+01, threshold=4.839e+01, percent-clipped=0.0
2024-08-19 00:00:45,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4170270.0, ans=0.125
2024-08-19 00:00:49,822 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 28 from Vox, 43 from AS
2024-08-19 00:00:52,889 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.98 vs. limit=15.0
2024-08-19 00:01:22,101 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 1450, loss[loss=0.1174, beats_loss=0.009347, ecapa_loss=0.0001566, whisper_loss=0.1065, over 19041.00 frames.
], tot_loss[loss=0.1004, beats_loss=0.01037, ecapa_loss=0.0001386, whisper_loss=0.08868, over 3845663.62 frames. ], batch size: 75, lr: 2.13e-03, grad_scale: 5.764607523034235e+17
2024-08-19 00:02:10,260 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 23 from Vox, 31 from AS
2024-08-19 00:02:11,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4170670.0, ans=0.0
2024-08-19 00:02:19,927 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.18 vs. limit=10.0
2024-08-19 00:02:20,949 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 19 from LS+wenet, 17 from Vox, 41 from AS
2024-08-19 00:02:50,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4170970.0, ans=0.125
2024-08-19 00:02:52,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4170970.0, ans=0.125
2024-08-19 00:03:04,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4171070.0, ans=0.0
2024-08-19 00:03:05,593 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 1500, loss[loss=0.09739, beats_loss=0.01093, ecapa_loss=0.0001282, whisper_loss=0.08518, over 20657.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01039, ecapa_loss=0.0001388, whisper_loss=0.08846, over 3849136.92 frames. ], batch size: 81, lr: 2.12e-03, grad_scale: 5.764607523034235e+17
2024-08-19 00:03:19,374 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.37 vs.
limit=15.0
2024-08-19 00:03:20,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4171170.0, ans=0.2
2024-08-19 00:03:21,519 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 21 from Vox, 41 from AS
2024-08-19 00:03:24,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4171170.0, ans=0.125
2024-08-19 00:03:27,137 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.239e+01 2.511e+01 2.818e+01 6.129e+01, threshold=5.023e+01, percent-clipped=1.0
2024-08-19 00:03:39,346 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 13 from Vox, 38 from AS
2024-08-19 00:03:48,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4171270.0, ans=0.2
2024-08-19 00:03:53,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4171370.0, ans=0.125
2024-08-19 00:03:55,899 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 17 from Vox, 32 from AS
2024-08-19 00:04:02,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4171370.0, ans=0.0
2024-08-19 00:04:21,037 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 1550, loss[loss=0.09387, beats_loss=0.01223, ecapa_loss=0.0001105, whisper_loss=0.08053, over 22743.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01039, ecapa_loss=0.000139, whisper_loss=0.08866, over 3849198.35 frames. ], batch size: 91, lr: 2.12e-03, grad_scale: 5.764607523034235e+17
2024-08-19 00:04:36,823 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts.
37 from LS+wenet, 18 from Vox, 38 from AS
2024-08-19 00:04:40,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4171670.0, ans=0.07
2024-08-19 00:04:46,985 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 17 from LS+wenet, 19 from Vox, 36 from AS
2024-08-19 00:05:06,652 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 21 from Vox, 47 from AS
2024-08-19 00:05:07,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4171870.0, ans=0.125
2024-08-19 00:05:34,416 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 1600, loss[loss=0.102, beats_loss=0.008641, ecapa_loss=0.0001636, whisper_loss=0.09175, over 22863.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01041, ecapa_loss=0.0001394, whisper_loss=0.08898, over 3852517.24 frames. ], batch size: 92, lr: 2.12e-03, grad_scale: 5.764607523034235e+17
2024-08-19 00:05:47,340 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 25 from LS+wenet, 16 from Vox, 29 from AS
2024-08-19 00:05:56,213 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.337e+01 2.619e+01 2.865e+01 4.288e+01, threshold=5.238e+01, percent-clipped=0.0
2024-08-19 00:05:56,613 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 17 from Vox, 35 from AS
2024-08-19 00:05:59,993 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.34 vs. limit=6.0
2024-08-19 00:06:20,717 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs.
limit=6.0
2024-08-19 00:06:22,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4172370.0, ans=10.0
2024-08-19 00:06:27,994 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 from AS
2024-08-19 00:06:33,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4172470.0, ans=0.125
2024-08-19 00:06:47,194 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 1650, loss[loss=0.09062, beats_loss=0.0109, ecapa_loss=0.0001418, whisper_loss=0.0783, over 17510.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01033, ecapa_loss=0.0001391, whisper_loss=0.08922, over 3844933.78 frames. ], batch size: 72, lr: 2.12e-03, grad_scale: 5.764607523034235e+17
2024-08-19 00:06:57,838 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 17 from Vox, 43 from AS
2024-08-19 00:07:06,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4172670.0, ans=0.0
2024-08-19 00:07:17,687 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 27 from Vox, 21 from AS
2024-08-19 00:07:41,894 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 15 from Vox, 39 from AS
2024-08-19 00:07:51,362 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.36 vs. limit=15.0
2024-08-19 00:07:59,462 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 1700, loss[loss=0.1137, beats_loss=0.01089, ecapa_loss=0.0001279, whisper_loss=0.1015, over 18777.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0103, ecapa_loss=0.0001405, whisper_loss=0.08962, over 3864588.49 frames. ], batch size: 73, lr: 2.12e-03, grad_scale: 5.764607523034235e+17
2024-08-19 00:08:04,833 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts.
29 from LS+wenet, 25 from Vox, 37 from AS
2024-08-19 00:08:16,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4173170.0, ans=0.0
2024-08-19 00:08:18,167 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 19 from Vox, 39 from AS
2024-08-19 00:08:18,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4173170.0, ans=0.125
2024-08-19 00:08:22,441 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.287e+01 2.515e+01 2.871e+01 4.134e+01, threshold=5.030e+01, percent-clipped=0.0
2024-08-19 00:09:05,155 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.47 vs. limit=15.0
2024-08-19 00:09:08,578 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 30 from LS+wenet, 19 from Vox, 24 from AS
2024-08-19 00:09:20,780 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 11 from Vox, 28 from AS
2024-08-19 00:09:27,557 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 1750, loss[loss=0.1095, beats_loss=0.009753, ecapa_loss=0.0001327, whisper_loss=0.09839, over 21072.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01025, ecapa_loss=0.0001402, whisper_loss=0.08998, over 3852779.48 frames. ], batch size: 81, lr: 2.12e-03, grad_scale: 5.764607523034235e+17
2024-08-19 00:09:53,187 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 16 from Vox, 27 from AS
2024-08-19 00:09:55,004 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 32 from LS+wenet, 18 from Vox, 36 from AS
2024-08-19 00:10:00,804 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts.
21 from LS+wenet, 20 from Vox, 31 from AS
2024-08-19 00:10:01,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4173770.0, ans=0.0
2024-08-19 00:10:15,652 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.27 vs. limit=22.5
2024-08-19 00:10:24,926 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 18 from Vox, 23 from AS
2024-08-19 00:10:25,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4173870.0, ans=0.125
2024-08-19 00:10:30,208 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 14 from Vox, 32 from AS
2024-08-19 00:10:31,340 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=15.0
2024-08-19 00:10:43,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4173970.0, ans=0.125
2024-08-19 00:10:52,448 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 1800, loss[loss=0.1006, beats_loss=0.01023, ecapa_loss=0.0001592, whisper_loss=0.08882, over 15644.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01025, ecapa_loss=0.0001397, whisper_loss=0.08955, over 3830055.71 frames. ], batch size: 63, lr: 2.12e-03, grad_scale: 5.764607523034235e+17
2024-08-19 00:10:59,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4174070.0, ans=0.125
2024-08-19 00:11:04,719 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.41 vs.
limit=22.5
2024-08-19 00:11:07,087 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0
2024-08-19 00:11:19,705 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.235e+01 2.459e+01 2.784e+01 3.423e+01, threshold=4.917e+01, percent-clipped=0.0
2024-08-19 00:11:20,630 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0
2024-08-19 00:11:54,103 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 16 from Vox, 32 from AS
2024-08-19 00:11:54,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4174370.0, ans=0.2
2024-08-19 00:12:19,596 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.70 vs. limit=15.0
2024-08-19 00:12:23,490 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.408e-02
2024-08-19 00:12:26,198 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.97 vs. limit=15.0
2024-08-19 00:12:35,641 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 1850, loss[loss=0.08289, beats_loss=0.01317, ecapa_loss=0.0001177, whisper_loss=0.06854, over 19032.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01034, ecapa_loss=0.0001393, whisper_loss=0.08937, over 3816586.78 frames. ], batch size: 76, lr: 2.12e-03, grad_scale: 5.764607523034235e+17
2024-08-19 00:12:43,490 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.41 vs.
limit=22.5
2024-08-19 00:13:15,456 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 12 from LS+wenet, 18 from Vox, 30 from AS
2024-08-19 00:13:16,933 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 from AS
2024-08-19 00:13:29,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4174770.0, ans=0.2
2024-08-19 00:14:02,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4174970.0, ans=0.2
2024-08-19 00:14:14,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4174970.0, ans=0.125
2024-08-19 00:14:19,592 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 1900, loss[loss=0.09832, beats_loss=0.01256, ecapa_loss=0.0001293, whisper_loss=0.08447, over 19208.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01034, ecapa_loss=0.0001406, whisper_loss=0.08926, over 3818266.05 frames.
], batch size: 78, lr: 2.12e-03, grad_scale: 5.764607523034235e+17
2024-08-19 00:14:48,750 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.320e+01 2.567e+01 2.860e+01 4.992e+01, threshold=5.134e+01, percent-clipped=1.0
2024-08-19 00:15:12,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4175270.0, ans=0.125
2024-08-19 00:15:13,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4175270.0, ans=0.125
2024-08-19 00:15:37,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4175370.0, ans=0.125
2024-08-19 00:15:48,913 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-19 00:15:58,872 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 1950, loss[loss=0.06358, beats_loss=0.01171, ecapa_loss=0.0001478, whisper_loss=0.05039, over 13757.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01037, ecapa_loss=0.00014, whisper_loss=0.08913, over 3811356.42 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 5.764607523034235e+17
2024-08-19 00:16:05,860 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 from AS
2024-08-19 00:16:14,489 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 28 from Vox, 25 from AS
2024-08-19 00:16:14,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=4175670.0, ans=0.2
2024-08-19 00:16:17,561 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 25 from Vox, 43 from AS
2024-08-19 00:16:23,132 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts.
26 from LS+wenet, 20 from Vox, 39 from AS
2024-08-19 00:16:30,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4175770.0, ans=0.1
2024-08-19 00:16:32,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4175770.0, ans=0.07
2024-08-19 00:16:40,301 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 from AS
2024-08-19 00:16:42,738 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 17 from Vox, 31 from AS
2024-08-19 00:16:44,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4175870.0, ans=0.0
2024-08-19 00:16:48,418 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 19 from Vox, 39 from AS
2024-08-19 00:16:49,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4175870.0, ans=0.035
2024-08-19 00:16:56,126 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 20 from Vox, 27 from AS
2024-08-19 00:16:58,230 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=7.675e-03
2024-08-19 00:17:02,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4175970.0, ans=0.125
2024-08-19 00:17:04,412 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.88 vs. limit=8.0
2024-08-19 00:17:05,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=4175970.0, ans=15.0
2024-08-19 00:17:10,454 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 2000, loss[loss=0.1116, beats_loss=0.008126, ecapa_loss=0.0001285, whisper_loss=0.1022, over 17645.00 frames.
], tot_loss[loss=0.101, beats_loss=0.01033, ecapa_loss=0.0001395, whisper_loss=0.08928, over 3807741.80 frames. ], batch size: 65, lr: 2.12e-03, grad_scale: 5.764607523034235e+17
2024-08-19 00:17:15,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4176070.0, ans=0.125
2024-08-19 00:17:29,892 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 17 from Vox, 29 from AS
2024-08-19 00:17:30,900 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.704e+01 2.262e+01 2.428e+01 2.730e+01 5.039e+01, threshold=4.856e+01, percent-clipped=0.0
2024-08-19 00:17:31,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4176170.0, ans=0.2
2024-08-19 00:17:48,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4176270.0, ans=0.0
2024-08-19 00:17:54,798 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.19 vs. limit=10.0
2024-08-19 00:17:57,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4176370.0, ans=0.0
2024-08-19 00:18:18,662 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-19 00:18:22,235 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 2050, loss[loss=0.1005, beats_loss=0.0109, ecapa_loss=0.0001362, whisper_loss=0.08824, over 22742.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0103, ecapa_loss=0.0001394, whisper_loss=0.0892, over 3842898.11 frames. ], batch size: 91, lr: 2.12e-03, grad_scale: 5.764607523034235e+17
2024-08-19 00:18:31,981 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts.
22 from LS+wenet, 21 from Vox, 34 from AS
2024-08-19 00:18:35,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4176670.0, ans=0.125
2024-08-19 00:19:00,437 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 from AS
2024-08-19 00:19:09,807 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 from AS
2024-08-19 00:19:20,235 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.71 vs. limit=15.0
2024-08-19 00:19:31,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4176970.0, ans=0.125
2024-08-19 00:19:32,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4177070.0, ans=0.1
2024-08-19 00:19:33,328 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 2100, loss[loss=0.109, beats_loss=0.00827, ecapa_loss=0.0001518, whisper_loss=0.09921, over 20252.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01036, ecapa_loss=0.0001385, whisper_loss=0.0889, over 3821310.97 frames. ], batch size: 83, lr: 2.12e-03, grad_scale: 5.764607523034235e+17
2024-08-19 00:19:54,052 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.287e+01 2.483e+01 2.876e+01 4.561e+01, threshold=4.966e+01, percent-clipped=0.0
2024-08-19 00:20:35,496 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.92 vs.
limit=6.0
2024-08-19 00:20:42,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4177470.0, ans=0.1
2024-08-19 00:20:42,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4177470.0, ans=0.125
2024-08-19 00:20:42,757 WARNING [optim.py:496] (2/4) Scaling gradients by 0.019715236499905586, model_norm_threshold=49.664920806884766
2024-08-19 00:20:42,926 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.827e+05, grad_sumsq=8.827e+05, orig_rms_sq=1.000e+00
2024-08-19 00:20:44,497 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 2150, loss[loss=0.1121, beats_loss=0.009991, ecapa_loss=0.0001427, whisper_loss=0.1007, over 22697.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01037, ecapa_loss=0.0001378, whisper_loss=0.08959, over 3833257.83 frames. ], batch size: 91, lr: 2.12e-03, grad_scale: 5.764607523034235e+17
2024-08-19 00:20:49,012 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 20 from Vox, 31 from AS
2024-08-19 00:20:50,769 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 from AS
2024-08-19 00:20:51,849 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 18 from Vox, 40 from AS
2024-08-19 00:21:03,056 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 39 from LS+wenet, 18 from Vox, 32 from AS
2024-08-19 00:21:03,512 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs.
limit=6.0
2024-08-19 00:21:05,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4177670.0, ans=0.125
2024-08-19 00:21:05,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4177670.0, ans=0.125
2024-08-19 00:21:06,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4177670.0, ans=0.1
2024-08-19 00:21:08,841 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 16 from Vox, 40 from AS
2024-08-19 00:21:16,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4177770.0, ans=0.0
2024-08-19 00:21:43,777 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 23 from Vox, 29 from AS
2024-08-19 00:21:45,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4177970.0, ans=0.0
2024-08-19 00:21:46,693 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 12 from Vox, 26 from AS
2024-08-19 00:21:54,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4178070.0, ans=0.0
2024-08-19 00:21:54,903 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 2200, loss[loss=0.09898, beats_loss=0.009217, ecapa_loss=0.0001516, whisper_loss=0.08825, over 17032.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01034, ecapa_loss=0.0001377, whisper_loss=0.09067, over 3844715.38 frames. ], batch size: 68, lr: 2.12e-03, grad_scale: 5.764607523034235e+17
2024-08-19 00:22:02,558 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts.
35 from LS+wenet, 15 from Vox, 42 from AS
2024-08-19 00:22:14,861 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.401e+01 2.663e+01 2.914e+01 2.519e+03, threshold=5.327e+01, percent-clipped=2.0
2024-08-19 00:22:17,936 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 38 from LS+wenet, 22 from Vox, 31 from AS
2024-08-19 00:22:18,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4178170.0, ans=0.125
2024-08-19 00:22:33,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4178270.0, ans=0.125
2024-08-19 00:22:35,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4178270.0, ans=0.125
2024-08-19 00:22:45,590 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.075e-02
2024-08-19 00:23:06,179 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 2250, loss[loss=0.1133, beats_loss=0.01019, ecapa_loss=0.0001261, whisper_loss=0.1019, over 19321.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0104, ecapa_loss=0.0001387, whisper_loss=0.09122, over 3877315.34 frames. ], batch size: 73, lr: 2.12e-03, grad_scale: 5.764607523034235e+17
2024-08-19 00:23:09,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4178570.0, ans=0.125
2024-08-19 00:23:24,955 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 from AS
2024-08-19 00:23:32,144 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 22 from Vox, 26 from AS
2024-08-19 00:23:40,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4178770.0, ans=0.0
2024-08-19 00:23:41,622 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts.
22 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-19 00:24:10,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4178970.0, ans=0.125 2024-08-19 00:24:18,226 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 2300, loss[loss=0.08466, beats_loss=0.009383, ecapa_loss=0.0001657, whisper_loss=0.07362, over 18684.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01046, ecapa_loss=0.000139, whisper_loss=0.09054, over 3884983.72 frames. ], batch size: 76, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:24:26,921 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-19 00:24:30,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4179070.0, ans=0.0 2024-08-19 00:24:38,206 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.287e+01 2.550e+01 2.826e+01 7.255e+01, threshold=5.099e+01, percent-clipped=2.0 2024-08-19 00:24:41,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4179170.0, ans=0.1 2024-08-19 00:24:42,327 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-19 00:24:44,761 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
26 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-19 00:24:47,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4179270.0, ans=0.1 2024-08-19 00:24:55,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4179270.0, ans=0.125 2024-08-19 00:25:00,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4179370.0, ans=0.2 2024-08-19 00:25:02,048 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.13 vs. limit=15.0 2024-08-19 00:25:10,497 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 30 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-19 00:25:16,970 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.49 vs. limit=15.0 2024-08-19 00:25:28,987 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 2350, loss[loss=0.1227, beats_loss=0.009589, ecapa_loss=0.0001547, whisper_loss=0.1116, over 22279.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001401, whisper_loss=0.09011, over 3893810.00 frames. ], batch size: 87, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:25:50,630 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 28 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 00:26:02,933 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-08-19 00:26:19,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4179870.0, ans=0.1 2024-08-19 00:26:33,970 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
28 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-19 00:26:39,341 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 2400, loss[loss=0.07901, beats_loss=0.01054, ecapa_loss=0.0001491, whisper_loss=0.06697, over 22621.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01041, ecapa_loss=0.0001404, whisper_loss=0.09, over 3869628.49 frames. ], batch size: 93, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:26:56,984 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-19 00:26:59,949 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.410e+01 2.624e+01 2.909e+01 9.467e+01, threshold=5.248e+01, percent-clipped=3.0 2024-08-19 00:27:02,707 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-19 00:27:10,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4180270.0, ans=0.2 2024-08-19 00:27:11,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4180270.0, ans=0.125 2024-08-19 00:27:15,804 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 13 from Vox, 45 fro AS 2024-08-19 00:27:47,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4180470.0, ans=0.125 2024-08-19 00:27:49,916 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 2450, loss[loss=0.1142, beats_loss=0.008324, ecapa_loss=0.000147, whisper_loss=0.1044, over 15879.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01038, ecapa_loss=0.0001393, whisper_loss=0.09042, over 3880465.20 frames. ], batch size: 65, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:27:51,628 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
27 from LS+wenet, 32 from Vox, 31 fro AS 2024-08-19 00:27:58,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4180570.0, ans=0.125 2024-08-19 00:28:08,519 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 30 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-19 00:28:08,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4180670.0, ans=0.125 2024-08-19 00:28:10,243 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-19 00:28:12,945 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 21 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-19 00:28:14,177 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-19 00:28:19,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4180770.0, ans=0.125 2024-08-19 00:28:33,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4180870.0, ans=0.125 2024-08-19 00:28:58,745 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 00:28:59,541 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 2500, loss[loss=0.1099, beats_loss=0.01019, ecapa_loss=0.0001344, whisper_loss=0.09832, over 18238.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01031, ecapa_loss=0.0001411, whisper_loss=0.09077, over 3900815.26 frames. ], batch size: 72, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:29:06,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4181070.0, ans=0.125 2024-08-19 00:29:14,341 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
26 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-19 00:29:18,098 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.325e+01 2.522e+01 2.873e+01 3.781e+01, threshold=5.044e+01, percent-clipped=0.0 2024-08-19 00:29:35,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4181270.0, ans=0.0 2024-08-19 00:29:50,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4181370.0, ans=0.04949747468305833 2024-08-19 00:29:53,429 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 24 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-19 00:29:53,855 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.21 vs. limit=15.0 2024-08-19 00:29:58,221 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 27 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-19 00:30:03,602 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 00:30:07,472 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 2550, loss[loss=0.1028, beats_loss=0.01077, ecapa_loss=0.0001595, whisper_loss=0.09045, over 21441.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01031, ecapa_loss=0.000141, whisper_loss=0.09105, over 3902467.93 frames. 
], batch size: 90, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:30:14,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4181570.0, ans=0.125 2024-08-19 00:30:14,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4181570.0, ans=0.1 2024-08-19 00:30:14,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4181570.0, ans=0.125 2024-08-19 00:30:17,034 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-19 00:30:23,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4181670.0, ans=0.0 2024-08-19 00:30:36,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4181770.0, ans=0.125 2024-08-19 00:30:37,798 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.62 vs. limit=15.0 2024-08-19 00:30:51,911 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 22 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-19 00:31:00,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4181970.0, ans=0.2 2024-08-19 00:31:05,395 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-19 00:31:14,233 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 2600, loss[loss=0.1005, beats_loss=0.01109, ecapa_loss=9.15e-05, whisper_loss=0.08853, over 23617.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01043, ecapa_loss=0.0001416, whisper_loss=0.09013, over 3889643.73 frames. 
], batch size: 90, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:31:14,657 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-19 00:31:32,364 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.892e+01 2.363e+01 2.681e+01 3.007e+01 2.480e+02, threshold=5.362e+01, percent-clipped=3.0 2024-08-19 00:31:37,963 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 34 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-19 00:31:43,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4182270.0, ans=0.125 2024-08-19 00:31:50,261 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 00:32:01,034 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0 2024-08-19 00:32:01,756 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 00:32:18,423 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 2650, loss[loss=0.1087, beats_loss=0.0114, ecapa_loss=0.0001275, whisper_loss=0.09605, over 22415.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01041, ecapa_loss=0.0001417, whisper_loss=0.09014, over 3877549.24 frames. ], batch size: 88, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:32:30,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4182670.0, ans=0.125 2024-08-19 00:32:40,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=4182670.0, ans=0.02 2024-08-19 00:32:41,286 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 13 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-19 00:32:45,087 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
21 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-19 00:32:45,637 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.23 vs. limit=12.0 2024-08-19 00:32:54,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4182770.0, ans=0.0 2024-08-19 00:32:56,626 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 00:33:03,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4182870.0, ans=0.1 2024-08-19 00:33:04,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4182870.0, ans=0.125 2024-08-19 00:33:05,219 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-19 00:33:14,262 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-19 00:33:19,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4182970.0, ans=0.125 2024-08-19 00:33:21,988 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 2700, loss[loss=0.07515, beats_loss=0.01169, ecapa_loss=0.0001237, whisper_loss=0.06222, over 19523.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01037, ecapa_loss=0.0001418, whisper_loss=0.09, over 3854965.07 frames. ], batch size: 78, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:33:27,179 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-19 00:33:33,797 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
30 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-19 00:33:35,442 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.39 vs. limit=22.5 2024-08-19 00:33:39,787 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.289e+01 2.477e+01 2.694e+01 4.905e+01, threshold=4.954e+01, percent-clipped=0.0 2024-08-19 00:33:42,945 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 00:33:52,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4183270.0, ans=0.1 2024-08-19 00:33:53,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4183270.0, ans=0.0 2024-08-19 00:33:57,154 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.61 vs. limit=10.0 2024-08-19 00:34:10,478 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-19 00:34:13,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4183470.0, ans=0.1 2024-08-19 00:34:20,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4183470.0, ans=0.125 2024-08-19 00:34:24,780 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 00:34:25,790 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 2750, loss[loss=0.1124, beats_loss=0.00917, ecapa_loss=0.0001548, whisper_loss=0.1017, over 23724.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01038, ecapa_loss=0.0001411, whisper_loss=0.09014, over 3887516.80 frames. 
], batch size: 93, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:34:29,883 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-19 00:34:42,446 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 25 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-19 00:34:43,151 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.18 vs. limit=15.0 2024-08-19 00:34:54,976 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 00:35:10,593 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 24 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-19 00:35:12,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4183870.0, ans=0.2 2024-08-19 00:35:16,804 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 11 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-19 00:35:18,181 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 00:35:24,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4183970.0, ans=0.0 2024-08-19 00:35:29,675 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 2800, loss[loss=0.08998, beats_loss=0.01128, ecapa_loss=0.0001241, whisper_loss=0.07746, over 15748.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01038, ecapa_loss=0.0001404, whisper_loss=0.09071, over 3884851.70 frames. 
], batch size: 61, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:35:47,524 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.309e+01 2.538e+01 2.850e+01 3.934e+01, threshold=5.076e+01, percent-clipped=0.0 2024-08-19 00:35:50,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4184170.0, ans=0.1 2024-08-19 00:36:11,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4184370.0, ans=0.0 2024-08-19 00:36:17,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4184370.0, ans=0.2 2024-08-19 00:36:33,823 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 2850, loss[loss=0.09744, beats_loss=0.01074, ecapa_loss=0.0001383, whisper_loss=0.08531, over 22413.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01047, ecapa_loss=0.0001397, whisper_loss=0.09034, over 3880808.46 frames. ], batch size: 94, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:36:34,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4184570.0, ans=0.0 2024-08-19 00:36:36,054 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.74 vs. limit=5.0 2024-08-19 00:36:37,704 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-19 00:36:44,576 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.61 vs. limit=22.5 2024-08-19 00:37:03,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4184770.0, ans=0.125 2024-08-19 00:37:04,902 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
25 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 00:37:10,734 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=15.0 2024-08-19 00:37:13,879 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 00:37:30,516 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-19 00:37:32,982 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-19 00:37:37,744 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 2900, loss[loss=0.08597, beats_loss=0.0108, ecapa_loss=0.0001869, whisper_loss=0.0733, over 15015.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01048, ecapa_loss=0.0001414, whisper_loss=0.09006, over 3895791.04 frames. ], batch size: 64, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:37:38,442 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=17.38 vs. limit=15.0 2024-08-19 00:37:44,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4185070.0, ans=0.0 2024-08-19 00:37:48,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4185070.0, ans=0.1 2024-08-19 00:37:54,274 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
18 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-19 00:37:56,753 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.390e+01 2.645e+01 3.019e+01 5.767e+01, threshold=5.291e+01, percent-clipped=1.0 2024-08-19 00:37:58,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4185170.0, ans=0.2 2024-08-19 00:38:16,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=4185370.0, ans=15.0 2024-08-19 00:38:19,792 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-19 00:38:26,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4185370.0, ans=0.125 2024-08-19 00:38:33,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4185470.0, ans=0.125 2024-08-19 00:38:41,210 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 2950, loss[loss=0.102, beats_loss=0.009339, ecapa_loss=0.0001833, whisper_loss=0.09082, over 21921.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01045, ecapa_loss=0.0001429, whisper_loss=0.09057, over 3888584.81 frames. ], batch size: 90, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:38:51,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4185570.0, ans=0.125 2024-08-19 00:39:00,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4185670.0, ans=0.125 2024-08-19 00:39:08,051 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
24 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-19 00:39:18,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4185870.0, ans=0.125 2024-08-19 00:39:24,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4185870.0, ans=0.1 2024-08-19 00:39:32,951 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-19 00:39:35,538 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 00:39:42,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4185970.0, ans=0.1 2024-08-19 00:39:43,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4186070.0, ans=0.1 2024-08-19 00:39:44,453 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 3000, loss[loss=0.09775, beats_loss=0.009419, ecapa_loss=0.0001276, whisper_loss=0.08706, over 14760.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0105, ecapa_loss=0.0001429, whisper_loss=0.09052, over 3898603.91 frames. ], batch size: 55, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:39:44,453 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-19 00:40:22,146 INFO [train_multi_KD3.py:1149] (2/4) Epoch 29, validation on ASR_libri: loss=0.2528, beats_loss=0, ecapa_loss=0.0005176, whisper_loss=0.2476, over 922467.00 frames. 2024-08-19 00:40:37,522 INFO [train_multi_KD3.py:1149] (2/4) Epoch 29, validation on SV_voxceleb1: loss=0.004065, beats_loss=0, ecapa_loss=0.0004065, whisper_loss=0, over 939242.00 frames. 2024-08-19 00:42:25,783 INFO [train_multi_KD3.py:1149] (2/4) Epoch 29, validation on AT_audioset: loss=0.02301, beats_loss=0.02301, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
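As a reading aid (not part of the original log): the per-batch `loss=` values above are consistent with a weighted sum of the three teacher losses, using the scales from the config header (`beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`, `whisper_loss_scale=1.0`). A minimal sketch under that assumption, with an illustrative function name rather than the actual `train_multi_KD3.py` code, checked against the logged batch-2200 example:

```python
def combine_kd_losses(beats_loss, ecapa_loss, whisper_loss,
                      beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    """Weighted sum of the per-teacher KD losses, matching the logged `loss=` field.

    The scale defaults mirror the config header; the function itself is a
    hypothetical reconstruction, not the actual training code.
    """
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)

# Logged batch 2200 example: loss=0.09898, beats_loss=0.009217,
# ecapa_loss=0.0001516, whisper_loss=0.08825
total = combine_kd_losses(0.009217, 0.0001516, 0.08825)
# total is ~0.098983, matching the logged loss=0.09898 to rounding
```

The same weighting reproduces the `tot_loss[...]` aggregates in these lines to rounding, which is why the ecapa term (scaled by 10) contributes on the same order as the other two despite its raw value being ~100x smaller.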
2024-08-19 00:42:25,786 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-19 00:42:30,718 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 18 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-19 00:42:40,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4186170.0, ans=0.0 2024-08-19 00:42:44,914 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.286e+01 2.543e+01 2.791e+01 3.821e+01, threshold=5.086e+01, percent-clipped=0.0 2024-08-19 00:42:57,907 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-19 00:43:07,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4186370.0, ans=0.2 2024-08-19 00:43:08,167 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-19 00:43:09,371 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-19 00:43:18,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4186470.0, ans=0.1 2024-08-19 00:43:27,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4186470.0, ans=0.0 2024-08-19 00:43:29,449 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 3050, loss[loss=0.1184, beats_loss=0.009207, ecapa_loss=0.0001513, whisper_loss=0.1077, over 23267.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01043, ecapa_loss=0.0001427, whisper_loss=0.09154, over 3930622.96 frames. ], batch size: 91, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:43:43,518 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
26 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-19 00:43:47,360 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 00:43:52,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4186670.0, ans=0.125 2024-08-19 00:43:59,957 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 9 from Vox, 30 fro AS 2024-08-19 00:44:01,243 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 17 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-19 00:44:25,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4186970.0, ans=0.2 2024-08-19 00:44:32,912 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 3100, loss[loss=0.1133, beats_loss=0.008817, ecapa_loss=9.506e-05, whisper_loss=0.1035, over 15942.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0105, ecapa_loss=0.0001436, whisper_loss=0.09043, over 3927783.87 frames. ], batch size: 55, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:44:35,645 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-19 00:44:41,963 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 23 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-19 00:44:52,005 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.319e+01 2.529e+01 2.804e+01 4.634e+01, threshold=5.057e+01, percent-clipped=0.0 2024-08-19 00:44:54,607 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-19 00:45:11,733 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
23 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-19 00:45:18,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4187370.0, ans=0.125 2024-08-19 00:45:36,806 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 3150, loss[loss=0.1048, beats_loss=0.01038, ecapa_loss=0.0001392, whisper_loss=0.09301, over 20052.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01055, ecapa_loss=0.0001429, whisper_loss=0.09063, over 3926509.32 frames. ], batch size: 80, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:45:39,113 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.15 vs. limit=10.0 2024-08-19 00:45:49,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4187670.0, ans=0.0 2024-08-19 00:46:05,204 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 28 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 00:46:14,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4187870.0, ans=0.125 2024-08-19 00:46:26,764 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
29 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-19 00:46:35,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4187970.0, ans=0.2 2024-08-19 00:46:38,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4187970.0, ans=0.125 2024-08-19 00:46:39,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4188070.0, ans=0.2 2024-08-19 00:46:40,750 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 3200, loss[loss=0.09802, beats_loss=0.01091, ecapa_loss=0.0002032, whisper_loss=0.08507, over 16844.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01053, ecapa_loss=0.0001429, whisper_loss=0.09106, over 3930814.74 frames. ], batch size: 73, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:46:41,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4188070.0, ans=0.0 2024-08-19 00:46:42,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4188070.0, ans=0.125 2024-08-19 00:46:42,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4188070.0, ans=0.125 2024-08-19 00:46:49,346 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.17 vs. limit=15.0 2024-08-19 00:46:59,733 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.318e+01 2.522e+01 2.852e+01 4.136e+01, threshold=5.044e+01, percent-clipped=0.0 2024-08-19 00:47:32,085 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.46 vs. 
limit=10.0 2024-08-19 00:47:38,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4188470.0, ans=0.0 2024-08-19 00:47:40,227 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-19 00:47:40,793 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.49 vs. limit=22.5 2024-08-19 00:47:43,689 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 3250, loss[loss=0.1043, beats_loss=0.01267, ecapa_loss=0.0001391, whisper_loss=0.09025, over 21183.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01057, ecapa_loss=0.0001423, whisper_loss=0.09114, over 3910048.59 frames. ], batch size: 89, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:47:46,406 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 29 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-19 00:48:03,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4188670.0, ans=0.2 2024-08-19 00:48:20,368 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.62 vs. limit=15.0 2024-08-19 00:48:24,899 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 15 from Vox, 51 fro AS 2024-08-19 00:48:25,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4188870.0, ans=0.125 2024-08-19 00:48:33,421 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 00:48:45,695 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.24 vs. 
limit=22.5 2024-08-19 00:48:47,429 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 3300, loss[loss=0.09089, beats_loss=0.01131, ecapa_loss=0.0001488, whisper_loss=0.07809, over 17451.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01058, ecapa_loss=0.0001416, whisper_loss=0.09098, over 3918991.99 frames. ], batch size: 67, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:48:50,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4189070.0, ans=0.0 2024-08-19 00:48:50,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4189070.0, ans=0.125 2024-08-19 00:49:00,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4189170.0, ans=0.5 2024-08-19 00:49:02,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4189170.0, ans=0.125 2024-08-19 00:49:06,245 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.329e+01 2.531e+01 2.845e+01 1.091e+02, threshold=5.061e+01, percent-clipped=2.0 2024-08-19 00:49:12,593 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-19 00:49:20,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4189270.0, ans=0.125 2024-08-19 00:49:34,294 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-19 00:49:34,737 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.01 vs. limit=15.0 2024-08-19 00:49:41,976 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
36 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-19 00:49:47,598 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.06 vs. limit=15.0 2024-08-19 00:49:50,411 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 3350, loss[loss=0.1183, beats_loss=0.0104, ecapa_loss=0.0001199, whisper_loss=0.1067, over 23398.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01045, ecapa_loss=0.0001431, whisper_loss=0.09152, over 3917274.14 frames. ], batch size: 89, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:49:51,858 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-19 00:49:53,577 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2024-08-19 00:49:54,419 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 29 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-19 00:50:15,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4189770.0, ans=0.125 2024-08-19 00:50:20,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4189770.0, ans=0.0 2024-08-19 00:50:24,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4189770.0, ans=0.125 2024-08-19 00:50:25,614 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
30 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-19 00:50:27,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4189770.0, ans=0.125 2024-08-19 00:50:35,052 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.59 vs. limit=15.0 2024-08-19 00:50:36,423 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.64 vs. limit=10.0 2024-08-19 00:50:43,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4189970.0, ans=0.125 2024-08-19 00:50:47,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4189970.0, ans=0.125 2024-08-19 00:50:54,747 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 3400, loss[loss=0.1237, beats_loss=0.008251, ecapa_loss=0.0001411, whisper_loss=0.114, over 22922.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01045, ecapa_loss=0.0001432, whisper_loss=0.09121, over 3903974.95 frames. ], batch size: 92, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:50:57,177 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.67 vs. 
limit=15.0 2024-08-19 00:51:01,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4190070.0, ans=0.125 2024-08-19 00:51:02,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4190070.0, ans=0.1 2024-08-19 00:51:06,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4190170.0, ans=0.0 2024-08-19 00:51:13,514 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.286e+01 2.558e+01 2.999e+01 2.108e+02, threshold=5.116e+01, percent-clipped=4.0 2024-08-19 00:51:25,669 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 00:51:25,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4190270.0, ans=0.125 2024-08-19 00:51:27,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4190270.0, ans=0.07 2024-08-19 00:51:29,979 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.88 vs. limit=15.0 2024-08-19 00:51:35,139 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.83 vs. limit=15.0 2024-08-19 00:51:36,149 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 12 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-19 00:51:38,082 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.47 vs. limit=15.0 2024-08-19 00:51:51,591 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
22 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-19 00:51:59,237 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 3450, loss[loss=0.08454, beats_loss=0.01388, ecapa_loss=0.0001119, whisper_loss=0.06954, over 16838.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01061, ecapa_loss=0.0001429, whisper_loss=0.08977, over 3898839.91 frames. ], batch size: 65, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:52:12,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4190670.0, ans=0.2 2024-08-19 00:52:16,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4190670.0, ans=0.125 2024-08-19 00:52:23,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4190770.0, ans=0.125 2024-08-19 00:52:27,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4190770.0, ans=0.05 2024-08-19 00:52:31,634 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.45 vs. limit=15.0 2024-08-19 00:52:40,406 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.70 vs. limit=15.0 2024-08-19 00:52:42,374 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-19 00:52:50,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4190970.0, ans=0.0 2024-08-19 00:53:03,027 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 3500, loss[loss=0.1046, beats_loss=0.01028, ecapa_loss=0.0001619, whisper_loss=0.09265, over 18397.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.0105, ecapa_loss=0.0001429, whisper_loss=0.09086, over 3916180.38 frames. ], batch size: 74, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:53:11,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4191070.0, ans=0.0 2024-08-19 00:53:22,264 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.660e+01 2.230e+01 2.489e+01 2.768e+01 5.626e+01, threshold=4.978e+01, percent-clipped=1.0 2024-08-19 00:53:26,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4191170.0, ans=0.2 2024-08-19 00:53:29,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4191270.0, ans=0.1 2024-08-19 00:53:43,676 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.97 vs. limit=15.0 2024-08-19 00:53:49,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4191370.0, ans=0.125 2024-08-19 00:54:06,693 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 3550, loss[loss=0.07614, beats_loss=0.01132, ecapa_loss=0.000116, whisper_loss=0.06366, over 13668.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01051, ecapa_loss=0.0001421, whisper_loss=0.09052, over 3903093.93 frames. ], batch size: 55, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:54:26,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4191670.0, ans=0.1 2024-08-19 00:55:10,337 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 3600, loss[loss=0.09814, beats_loss=0.008226, ecapa_loss=0.0001563, whisper_loss=0.08835, over 13773.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001431, whisper_loss=0.0904, over 3888748.96 frames. ], batch size: 54, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:55:17,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4192070.0, ans=0.125 2024-08-19 00:55:27,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4192170.0, ans=0.0 2024-08-19 00:55:29,599 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.283e+01 2.473e+01 2.802e+01 1.020e+02, threshold=4.947e+01, percent-clipped=3.0 2024-08-19 00:55:44,947 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 00:56:05,410 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.78 vs. limit=22.5 2024-08-19 00:56:07,528 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0 2024-08-19 00:56:14,945 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 3650, loss[loss=0.1279, beats_loss=0.008038, ecapa_loss=0.0001469, whisper_loss=0.1184, over 23444.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.0001436, whisper_loss=0.09026, over 3899397.09 frames. ], batch size: 91, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:56:20,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4192570.0, ans=0.125 2024-08-19 00:56:21,336 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
17 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-19 00:56:24,394 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0 2024-08-19 00:56:38,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4192670.0, ans=0.125 2024-08-19 00:56:44,382 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 00:56:46,827 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 20 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 00:56:48,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4192770.0, ans=0.125 2024-08-19 00:56:48,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=4192770.0, ans=15.0 2024-08-19 00:56:53,967 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=12.0 2024-08-19 00:57:18,974 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 3700, loss[loss=0.09069, beats_loss=0.01264, ecapa_loss=0.0001645, whisper_loss=0.0764, over 21680.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01046, ecapa_loss=0.000144, whisper_loss=0.0899, over 3863308.84 frames. ], batch size: 93, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:57:23,469 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.14 vs. 
limit=12.0 2024-08-19 00:57:32,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4193170.0, ans=0.035 2024-08-19 00:57:33,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4193170.0, ans=0.1 2024-08-19 00:57:36,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4193170.0, ans=0.125 2024-08-19 00:57:38,287 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.353e+01 2.591e+01 2.874e+01 4.771e+02, threshold=5.181e+01, percent-clipped=2.0 2024-08-19 00:57:39,832 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-19 00:57:44,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4193270.0, ans=0.125 2024-08-19 00:57:49,588 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
31 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-19 00:58:00,455 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 00:58:01,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4193370.0, ans=0.125 2024-08-19 00:58:10,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4193470.0, ans=0.1 2024-08-19 00:58:10,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4193470.0, ans=0.0 2024-08-19 00:58:11,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4193470.0, ans=0.125 2024-08-19 00:58:18,594 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.88 vs. limit=15.0 2024-08-19 00:58:22,871 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 3750, loss[loss=0.1105, beats_loss=0.007176, ecapa_loss=0.000163, whisper_loss=0.1017, over 20517.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01046, ecapa_loss=0.0001432, whisper_loss=0.08991, over 3831491.78 frames. ], batch size: 81, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:58:24,236 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-19 00:58:31,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4193570.0, ans=0.125 2024-08-19 00:58:33,210 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
21 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-19 00:58:35,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=4193670.0, ans=10.0 2024-08-19 00:58:58,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4193770.0, ans=0.0 2024-08-19 00:59:04,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4193870.0, ans=0.1 2024-08-19 00:59:05,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4193870.0, ans=0.1 2024-08-19 00:59:11,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4193870.0, ans=0.0 2024-08-19 00:59:19,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4193970.0, ans=0.0 2024-08-19 00:59:28,407 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 3800, loss[loss=0.09517, beats_loss=0.01017, ecapa_loss=0.0001671, whisper_loss=0.08333, over 19892.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01052, ecapa_loss=0.0001434, whisper_loss=0.08949, over 3812818.43 frames. ], batch size: 83, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:59:39,193 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0 2024-08-19 00:59:41,092 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
24 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-19 00:59:41,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4194170.0, ans=0.125 2024-08-19 00:59:48,902 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.244e+01 2.516e+01 2.813e+01 3.693e+01, threshold=5.032e+01, percent-clipped=0.0 2024-08-19 00:59:50,550 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-19 01:00:11,653 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 20 from LS+wenet, 19 from Vox, 51 fro AS 2024-08-19 01:00:12,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4194370.0, ans=0.125 2024-08-19 01:00:12,950 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 32 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-19 01:00:27,971 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-19 01:00:31,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4194470.0, ans=0.2 2024-08-19 01:00:31,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4194470.0, ans=0.1 2024-08-19 01:00:35,736 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 3850, loss[loss=0.1139, beats_loss=0.009025, ecapa_loss=0.0001636, whisper_loss=0.1033, over 21072.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01052, ecapa_loss=0.0001431, whisper_loss=0.09064, over 3826283.89 frames. 
], batch size: 87, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:00:41,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4194570.0, ans=0.125 2024-08-19 01:00:41,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4194570.0, ans=0.0 2024-08-19 01:00:47,434 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.62 vs. limit=15.0 2024-08-19 01:00:48,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4194670.0, ans=0.125 2024-08-19 01:00:52,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4194670.0, ans=0.0 2024-08-19 01:01:21,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4194870.0, ans=0.125 2024-08-19 01:01:23,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4194870.0, ans=0.0 2024-08-19 01:01:24,665 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-19 01:01:37,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4194970.0, ans=0.125 2024-08-19 01:01:38,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4194970.0, ans=0.0 2024-08-19 01:01:43,816 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 3900, loss[loss=0.1042, beats_loss=0.01096, ecapa_loss=0.0001373, whisper_loss=0.09185, over 19053.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01052, ecapa_loss=0.0001428, whisper_loss=0.09134, over 3853131.46 frames. ], batch size: 72, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:01:56,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4195170.0, ans=0.1 2024-08-19 01:01:58,196 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 11 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-19 01:02:03,757 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.347e+01 2.563e+01 2.946e+01 1.381e+02, threshold=5.126e+01, percent-clipped=1.0 2024-08-19 01:02:36,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=4195470.0, ans=0.95 2024-08-19 01:02:45,841 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 25 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-19 01:02:50,759 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 3950, loss[loss=0.1107, beats_loss=0.01157, ecapa_loss=0.0001354, whisper_loss=0.09779, over 22163.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01051, ecapa_loss=0.000144, whisper_loss=0.09084, over 3842931.23 frames. ], batch size: 91, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:02:55,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4195570.0, ans=0.0 2024-08-19 01:03:04,261 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.82 vs. limit=15.0 2024-08-19 01:03:04,958 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 26 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 01:03:08,135 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.22 vs. 
limit=15.0 2024-08-19 01:03:20,578 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.24 vs. limit=15.0 2024-08-19 01:03:29,042 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-19 01:03:31,705 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.31 vs. limit=12.0 2024-08-19 01:03:38,049 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-19 01:03:59,391 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 4000, loss[loss=0.1095, beats_loss=0.00922, ecapa_loss=0.0001629, whisper_loss=0.09867, over 17802.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01045, ecapa_loss=0.0001435, whisper_loss=0.09066, over 3841935.82 frames. ], batch size: 72, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:04:12,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4196170.0, ans=0.1 2024-08-19 01:04:18,738 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.259e+01 2.561e+01 2.838e+01 1.741e+02, threshold=5.122e+01, percent-clipped=1.0 2024-08-19 01:04:46,161 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.55 vs. 
limit=12.0 2024-08-19 01:04:47,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4196370.0, ans=0.125 2024-08-19 01:05:04,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4196470.0, ans=0.125 2024-08-19 01:05:05,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4196570.0, ans=0.125 2024-08-19 01:05:06,392 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 4050, loss[loss=0.1094, beats_loss=0.01085, ecapa_loss=0.0001315, whisper_loss=0.09728, over 22015.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01047, ecapa_loss=0.000144, whisper_loss=0.09068, over 3862515.59 frames. ], batch size: 89, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:05:22,518 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.90 vs. limit=15.0 2024-08-19 01:05:37,590 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4196770.0, ans=0.125 2024-08-19 01:05:47,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4196870.0, ans=0.1 2024-08-19 01:05:48,112 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 30 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-19 01:05:49,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4196870.0, ans=0.125 2024-08-19 01:05:52,244 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 15 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-19 01:06:03,478 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
24 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-19 01:06:04,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4196970.0, ans=0.125 2024-08-19 01:06:14,102 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.25 vs. limit=10.0 2024-08-19 01:06:14,411 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 4100, loss[loss=0.07754, beats_loss=0.01218, ecapa_loss=0.0001746, whisper_loss=0.06361, over 20425.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01044, ecapa_loss=0.0001445, whisper_loss=0.09068, over 3849019.73 frames. ], batch size: 88, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:06:15,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4197070.0, ans=0.1 2024-08-19 01:06:18,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4197070.0, ans=0.125 2024-08-19 01:06:24,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4197070.0, ans=0.125 2024-08-19 01:06:32,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4197170.0, ans=0.125 2024-08-19 01:06:34,694 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.750e+01 2.255e+01 2.514e+01 2.841e+01 8.921e+01, threshold=5.028e+01, percent-clipped=1.0 2024-08-19 01:06:36,276 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-19 01:06:50,101 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 27 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-19 01:06:59,866 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
20 from LS+wenet, 8 from Vox, 33 fro AS 2024-08-19 01:07:03,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4197370.0, ans=0.1 2024-08-19 01:07:16,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4197470.0, ans=0.2 2024-08-19 01:07:17,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4197470.0, ans=0.1 2024-08-19 01:07:19,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4197470.0, ans=0.0 2024-08-19 01:07:24,215 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 4150, loss[loss=0.1069, beats_loss=0.01021, ecapa_loss=0.0001325, whisper_loss=0.0954, over 19811.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01054, ecapa_loss=0.0001435, whisper_loss=0.09086, over 3873627.10 frames. ], batch size: 78, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:07:25,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4197570.0, ans=0.125 2024-08-19 01:07:33,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4197570.0, ans=0.125 2024-08-19 01:07:52,795 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-19 01:08:03,711 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
13 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-19 01:08:12,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4197870.0, ans=0.125 2024-08-19 01:08:12,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4197870.0, ans=0.2 2024-08-19 01:08:26,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4197970.0, ans=0.1 2024-08-19 01:08:32,646 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 4200, loss[loss=0.09376, beats_loss=0.01175, ecapa_loss=0.0001394, whisper_loss=0.08062, over 19208.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.0001433, whisper_loss=0.09036, over 3870802.30 frames. ], batch size: 78, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:08:48,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4198170.0, ans=0.0 2024-08-19 01:08:52,133 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.638e+01 2.233e+01 2.469e+01 2.829e+01 1.799e+02, threshold=4.938e+01, percent-clipped=1.0 2024-08-19 01:08:54,260 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.72 vs. limit=15.0 2024-08-19 01:09:01,333 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-19 01:09:13,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4198370.0, ans=0.2 2024-08-19 01:09:17,162 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
22 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-19 01:09:20,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4198370.0, ans=0.0 2024-08-19 01:09:22,286 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-19 01:09:27,830 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 16 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-19 01:09:29,201 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.751e-03 2024-08-19 01:09:31,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4198470.0, ans=0.1 2024-08-19 01:09:38,390 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 4250, loss[loss=0.09285, beats_loss=0.009307, ecapa_loss=0.0001783, whisper_loss=0.08176, over 16669.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01063, ecapa_loss=0.0001438, whisper_loss=0.09034, over 3863216.21 frames. ], batch size: 71, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:09:53,430 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.63 vs. limit=15.0 2024-08-19 01:09:56,360 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 01:09:57,830 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-19 01:10:28,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4198870.0, ans=0.125 2024-08-19 01:10:32,660 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
11 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-19 01:10:43,760 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 4300, loss[loss=0.09456, beats_loss=0.008965, ecapa_loss=0.0001555, whisper_loss=0.08404, over 18562.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01056, ecapa_loss=0.0001426, whisper_loss=0.09018, over 3849659.74 frames. ], batch size: 74, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:11:03,594 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.233e+01 2.491e+01 2.683e+01 4.196e+01, threshold=4.981e+01, percent-clipped=0.0 2024-08-19 01:11:17,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4199270.0, ans=0.2 2024-08-19 01:11:19,387 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 01:11:23,166 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 20 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-19 01:11:35,076 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
24 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-19 01:11:37,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4199470.0, ans=0.0 2024-08-19 01:11:41,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4199470.0, ans=0.0 2024-08-19 01:11:45,359 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 01:11:47,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4199470.0, ans=0.125 2024-08-19 01:11:48,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4199570.0, ans=0.09899494936611666 2024-08-19 01:11:48,356 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.49 vs. limit=12.0 2024-08-19 01:11:48,960 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 4350, loss[loss=0.1132, beats_loss=0.009038, ecapa_loss=0.0001553, whisper_loss=0.1026, over 22366.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001433, whisper_loss=0.08993, over 3832194.72 frames. ], batch size: 90, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:11:52,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4199570.0, ans=0.2 2024-08-19 01:12:09,930 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-19 01:12:25,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4199770.0, ans=0.1 2024-08-19 01:12:34,041 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
29 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-19 01:12:50,995 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 19 from LS+wenet, 24 from Vox, 48 fro AS 2024-08-19 01:12:52,180 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 32 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-19 01:12:56,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4200070.0, ans=0.125 2024-08-19 01:12:57,538 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 4400, loss[loss=0.1087, beats_loss=0.01121, ecapa_loss=0.0001578, whisper_loss=0.09596, over 20076.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01046, ecapa_loss=0.0001429, whisper_loss=0.09051, over 3839466.34 frames. ], batch size: 81, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:13:07,677 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 30 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-19 01:13:18,299 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.359e+01 2.700e+01 2.927e+01 4.297e+01, threshold=5.400e+01, percent-clipped=0.0 2024-08-19 01:13:35,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4200270.0, ans=0.125 2024-08-19 01:13:38,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4200370.0, ans=0.125 2024-08-19 01:13:43,052 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-19 01:14:05,798 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 4450, loss[loss=0.1071, beats_loss=0.01141, ecapa_loss=0.0001136, whisper_loss=0.09458, over 22644.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01043, ecapa_loss=0.000142, whisper_loss=0.09039, over 3847269.50 frames. 
], batch size: 89, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:14:13,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4200570.0, ans=0.0 2024-08-19 01:14:20,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4200670.0, ans=0.125 2024-08-19 01:14:40,640 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-19 01:15:02,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4200970.0, ans=0.125 2024-08-19 01:15:13,945 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 4500, loss[loss=0.0967, beats_loss=0.01038, ecapa_loss=0.0001347, whisper_loss=0.08497, over 23413.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.0001415, whisper_loss=0.08956, over 3853293.92 frames. ], batch size: 94, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:15:14,066 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 01:15:18,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4201070.0, ans=0.125 2024-08-19 01:15:22,138 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 16 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-19 01:15:34,077 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.278e+01 2.479e+01 2.905e+01 4.149e+01, threshold=4.957e+01, percent-clipped=0.0 2024-08-19 01:15:34,731 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. 
limit=6.0 2024-08-19 01:16:01,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4201370.0, ans=0.125 2024-08-19 01:16:15,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4201470.0, ans=0.125 2024-08-19 01:16:16,467 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.97 vs. limit=12.0 2024-08-19 01:16:22,402 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 4550, loss[loss=0.08835, beats_loss=0.01138, ecapa_loss=0.0001609, whisper_loss=0.07536, over 21266.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01053, ecapa_loss=0.0001426, whisper_loss=0.0895, over 3858685.94 frames. ], batch size: 90, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:16:31,045 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-19 01:16:51,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4201770.0, ans=0.0 2024-08-19 01:16:52,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4201770.0, ans=0.125 2024-08-19 01:17:07,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4201870.0, ans=0.1 2024-08-19 01:17:14,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4201870.0, ans=0.2 2024-08-19 01:17:15,385 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 14 from Vox, 48 fro AS 2024-08-19 01:17:31,563 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 4600, loss[loss=0.1023, beats_loss=0.008802, ecapa_loss=0.0001389, whisper_loss=0.09212, over 19268.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.0001441, whisper_loss=0.0897, over 3842458.60 frames. ], batch size: 75, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:17:41,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4202070.0, ans=0.125 2024-08-19 01:17:41,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4202070.0, ans=0.2 2024-08-19 01:17:52,501 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.339e+01 2.630e+01 2.939e+01 5.094e+01, threshold=5.261e+01, percent-clipped=1.0 2024-08-19 01:18:01,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4202270.0, ans=0.0 2024-08-19 01:18:10,658 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 20 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 01:18:13,898 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 24 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 01:18:14,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4202370.0, ans=0.0 2024-08-19 01:18:15,108 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-19 01:18:28,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4202470.0, ans=0.2 2024-08-19 01:18:28,999 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 37 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-19 01:18:31,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4202470.0, ans=0.0 2024-08-19 01:18:36,249 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
30 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-19 01:18:41,887 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 4650, loss[loss=0.1132, beats_loss=0.009498, ecapa_loss=0.00017, whisper_loss=0.102, over 16714.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01055, ecapa_loss=0.0001433, whisper_loss=0.08969, over 3846032.19 frames. ], batch size: 67, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:19:09,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=4202770.0, ans=0.05 2024-08-19 01:19:29,513 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-19 01:19:38,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4202970.0, ans=0.125 2024-08-19 01:19:41,065 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=15.0 2024-08-19 01:19:48,012 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.65 vs. limit=22.5 2024-08-19 01:19:53,410 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 4700, loss[loss=0.1004, beats_loss=0.01004, ecapa_loss=0.0001842, whisper_loss=0.08857, over 14042.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01054, ecapa_loss=0.0001431, whisper_loss=0.09025, over 3860356.12 frames. ], batch size: 58, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:19:55,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4203070.0, ans=0.0 2024-08-19 01:19:58,037 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
17 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-19 01:20:10,427 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.06 vs. limit=15.0 2024-08-19 01:20:13,707 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.364e+01 2.626e+01 2.952e+01 4.706e+01, threshold=5.252e+01, percent-clipped=0.0 2024-08-19 01:20:16,341 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 18 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-19 01:20:19,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4203270.0, ans=0.2 2024-08-19 01:20:20,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4203270.0, ans=0.2 2024-08-19 01:20:21,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4203270.0, ans=0.125 2024-08-19 01:20:33,074 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-19 01:20:34,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4203370.0, ans=0.0 2024-08-19 01:20:44,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4203370.0, ans=0.125 2024-08-19 01:20:45,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4203370.0, ans=0.2 2024-08-19 01:20:50,483 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
24 from LS+wenet, 34 from Vox, 35 fro AS 2024-08-19 01:20:56,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4203470.0, ans=0.0 2024-08-19 01:21:01,436 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 4750, loss[loss=0.1115, beats_loss=0.009863, ecapa_loss=0.0001436, whisper_loss=0.1002, over 17782.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01048, ecapa_loss=0.0001437, whisper_loss=0.0903, over 3870913.22 frames. ], batch size: 70, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:21:12,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=4203570.0, ans=0.2 2024-08-19 01:21:15,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4203670.0, ans=0.125 2024-08-19 01:21:15,727 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0 2024-08-19 01:21:26,365 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.65 vs. limit=12.0 2024-08-19 01:21:29,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4203770.0, ans=0.2 2024-08-19 01:21:35,112 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-19 01:21:35,536 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-19 01:21:36,793 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
18 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-19 01:21:50,284 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.23 vs. limit=22.5 2024-08-19 01:21:55,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4203970.0, ans=0.125 2024-08-19 01:22:08,376 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 4800, loss[loss=0.07721, beats_loss=0.01238, ecapa_loss=0.0001455, whisper_loss=0.06338, over 21528.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01057, ecapa_loss=0.0001436, whisper_loss=0.08944, over 3877793.78 frames. ], batch size: 90, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:22:26,904 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-19 01:22:27,888 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.297e+01 2.510e+01 2.780e+01 4.241e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-19 01:22:37,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4204270.0, ans=0.125 2024-08-19 01:22:44,963 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 28 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 01:22:55,280 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 29 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-19 01:23:16,832 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 4850, loss[loss=0.07754, beats_loss=0.01151, ecapa_loss=0.0001161, whisper_loss=0.06487, over 16124.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01062, ecapa_loss=0.0001431, whisper_loss=0.08947, over 3888957.13 frames. 
], batch size: 63, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:23:31,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=4204670.0, ans=0.1 2024-08-19 01:23:36,260 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0 2024-08-19 01:23:51,378 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 01:23:52,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4204770.0, ans=0.125 2024-08-19 01:23:56,539 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-19 01:24:05,493 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 28 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 01:24:09,611 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 28 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-19 01:24:10,064 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0 2024-08-19 01:24:24,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4204970.0, ans=0.125 2024-08-19 01:24:26,357 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 4900, loss[loss=0.07496, beats_loss=0.01092, ecapa_loss=0.0001233, whisper_loss=0.06281, over 18151.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0106, ecapa_loss=0.000143, whisper_loss=0.08952, over 3865429.45 frames. ], batch size: 74, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:24:44,808 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
29 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-19 01:24:45,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4205170.0, ans=0.1 2024-08-19 01:24:49,157 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.329e+01 2.525e+01 2.915e+01 4.394e+02, threshold=5.050e+01, percent-clipped=2.0 2024-08-19 01:24:53,642 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.14 vs. limit=12.0 2024-08-19 01:24:56,785 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-19 01:24:59,550 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-19 01:25:18,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4205370.0, ans=0.125 2024-08-19 01:25:27,029 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.25 vs. limit=22.5 2024-08-19 01:25:34,563 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-19 01:25:36,837 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 4950, loss[loss=0.09851, beats_loss=0.01262, ecapa_loss=0.0001252, whisper_loss=0.08463, over 20004.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0106, ecapa_loss=0.0001434, whisper_loss=0.08992, over 3874186.18 frames. 
], batch size: 81, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:25:46,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4205570.0, ans=0.1 2024-08-19 01:25:46,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4205570.0, ans=0.2 2024-08-19 01:26:05,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4205770.0, ans=0.125 2024-08-19 01:26:11,516 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.47 vs. limit=15.0 2024-08-19 01:26:13,566 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-19 01:26:20,115 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.74 vs. limit=22.5 2024-08-19 01:26:41,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=4205970.0, ans=6.0 2024-08-19 01:26:46,257 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 5000, loss[loss=0.1094, beats_loss=0.009266, ecapa_loss=0.0001253, whisper_loss=0.09891, over 17930.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01061, ecapa_loss=0.0001431, whisper_loss=0.08965, over 3863122.59 frames. 
], batch size: 69, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:26:48,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4206070.0, ans=0.125 2024-08-19 01:27:07,009 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.303e+01 2.548e+01 2.762e+01 6.852e+01, threshold=5.096e+01, percent-clipped=1.0 2024-08-19 01:27:28,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4206370.0, ans=0.125 2024-08-19 01:27:59,239 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 5050, loss[loss=0.1106, beats_loss=0.009643, ecapa_loss=0.0001601, whisper_loss=0.09937, over 20246.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0107, ecapa_loss=0.0001426, whisper_loss=0.08918, over 3903754.39 frames. ], batch size: 78, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:28:01,870 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-19 01:28:05,272 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 01:28:14,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4206670.0, ans=0.125 2024-08-19 01:28:34,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4206770.0, ans=0.2 2024-08-19 01:28:41,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4206770.0, ans=0.0 2024-08-19 01:28:52,802 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-19 01:28:55,648 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-19 01:28:59,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4206970.0, ans=0.125 2024-08-19 01:29:04,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4206970.0, ans=0.125 2024-08-19 01:29:07,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4206970.0, ans=0.125 2024-08-19 01:29:12,631 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 5100, loss[loss=0.09267, beats_loss=0.008768, ecapa_loss=0.0001198, whisper_loss=0.08271, over 14867.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01062, ecapa_loss=0.0001427, whisper_loss=0.08936, over 3915053.71 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:29:19,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4207070.0, ans=0.0 2024-08-19 01:29:33,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4207170.0, ans=0.0 2024-08-19 01:29:33,759 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.355e+01 2.571e+01 2.795e+01 7.505e+01, threshold=5.142e+01, percent-clipped=1.0 2024-08-19 01:29:34,663 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 32 from LS+wenet, 12 from Vox, 42 fro AS 2024-08-19 01:29:35,316 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.52 vs. 
limit=15.0 2024-08-19 01:29:40,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4207170.0, ans=0.2 2024-08-19 01:29:51,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4207270.0, ans=0.125 2024-08-19 01:30:00,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4207370.0, ans=10.0 2024-08-19 01:30:03,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4207370.0, ans=0.125 2024-08-19 01:30:07,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4207370.0, ans=0.125 2024-08-19 01:30:16,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4207470.0, ans=0.0 2024-08-19 01:30:25,296 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 5150, loss[loss=0.1111, beats_loss=0.01168, ecapa_loss=0.0001378, whisper_loss=0.09802, over 22397.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01074, ecapa_loss=0.000142, whisper_loss=0.08916, over 3938691.04 frames. ], batch size: 89, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:30:25,423 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 18 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-19 01:30:33,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4207570.0, ans=0.0 2024-08-19 01:30:38,763 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
24 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-19 01:30:47,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4207670.0, ans=0.1 2024-08-19 01:30:51,880 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 9 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-19 01:31:06,247 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-19 01:31:17,889 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.48 vs. limit=10.0 2024-08-19 01:31:21,684 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.79 vs. limit=10.0 2024-08-19 01:31:23,059 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0 2024-08-19 01:31:24,943 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.80 vs. limit=15.0 2024-08-19 01:31:26,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4207970.0, ans=0.1 2024-08-19 01:31:36,257 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-19 01:31:37,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4207970.0, ans=0.125 2024-08-19 01:31:40,246 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 5200, loss[loss=0.1172, beats_loss=0.01104, ecapa_loss=0.0001338, whisper_loss=0.1048, over 23454.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01065, ecapa_loss=0.0001425, whisper_loss=0.08906, over 3898503.95 frames. 
], batch size: 92, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:31:40,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4208070.0, ans=0.125 2024-08-19 01:31:50,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4208070.0, ans=0.125 2024-08-19 01:31:52,517 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 01:31:54,139 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-19 01:32:00,708 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.279e+01 2.485e+01 2.777e+01 3.905e+01, threshold=4.969e+01, percent-clipped=0.0 2024-08-19 01:32:40,339 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-19 01:32:40,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4208470.0, ans=0.0 2024-08-19 01:32:43,244 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-19 01:32:49,124 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-19 01:32:52,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4208570.0, ans=0.5 2024-08-19 01:32:53,285 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 5250, loss[loss=0.105, beats_loss=0.008646, ecapa_loss=0.0001411, whisper_loss=0.09493, over 22282.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01065, ecapa_loss=0.0001428, whisper_loss=0.0887, over 3927748.45 frames. ], batch size: 85, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:32:58,090 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-19 01:33:01,977 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 30 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-19 01:33:18,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4208670.0, ans=0.1 2024-08-19 01:33:22,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4208670.0, ans=0.125 2024-08-19 01:33:36,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4208770.0, ans=0.5 2024-08-19 01:33:49,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4208870.0, ans=10.0 2024-08-19 01:33:53,553 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 01:33:58,961 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.57 vs. limit=22.5 2024-08-19 01:34:02,837 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 17 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-19 01:34:06,237 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.94 vs. limit=15.0 2024-08-19 01:34:08,313 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 26 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-19 01:34:10,179 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 5300, loss[loss=0.1181, beats_loss=0.008661, ecapa_loss=0.0001642, whisper_loss=0.1078, over 17666.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01063, ecapa_loss=0.0001414, whisper_loss=0.08903, over 3930154.23 frames. 
], batch size: 68, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:34:32,570 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.347e+01 2.623e+01 3.004e+01 4.261e+01, threshold=5.246e+01, percent-clipped=0.0 2024-08-19 01:34:36,016 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-19 01:34:49,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4209270.0, ans=0.1 2024-08-19 01:34:53,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4209270.0, ans=0.2 2024-08-19 01:35:02,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4209370.0, ans=0.125 2024-08-19 01:35:03,732 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 01:35:04,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4209370.0, ans=0.125 2024-08-19 01:35:04,833 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.37 vs. limit=15.0 2024-08-19 01:35:07,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4209370.0, ans=0.0 2024-08-19 01:35:28,132 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 5350, loss[loss=0.09716, beats_loss=0.01067, ecapa_loss=0.000112, whisper_loss=0.08537, over 17480.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01054, ecapa_loss=0.0001418, whisper_loss=0.08971, over 3904356.07 frames. 
], batch size: 67, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:35:34,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4209570.0, ans=0.0 2024-08-19 01:35:49,478 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 01:35:57,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4209670.0, ans=0.2 2024-08-19 01:36:00,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4209770.0, ans=0.125 2024-08-19 01:36:00,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4209770.0, ans=0.0 2024-08-19 01:36:24,003 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-19 01:36:25,503 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-19 01:36:32,423 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 22 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-19 01:36:32,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4209970.0, ans=0.1 2024-08-19 01:36:33,818 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 01:36:34,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4209970.0, ans=0.07 2024-08-19 01:36:48,248 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 5400, loss[loss=0.1208, beats_loss=0.009704, ecapa_loss=0.0001316, whisper_loss=0.1098, over 24968.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.000142, whisper_loss=0.09002, over 3903643.88 frames. 
], batch size: 93, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:36:49,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4210070.0, ans=0.125 2024-08-19 01:36:49,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4210070.0, ans=0.0 2024-08-19 01:36:51,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4210070.0, ans=0.0 2024-08-19 01:36:53,649 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.53 vs. limit=22.5 2024-08-19 01:36:54,796 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 01:37:03,418 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 17 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-19 01:37:11,656 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.661e+01 2.258e+01 2.667e+01 2.927e+01 2.051e+02, threshold=5.334e+01, percent-clipped=3.0 2024-08-19 01:37:19,560 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-19 01:37:20,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4210270.0, ans=0.1 2024-08-19 01:37:20,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4210270.0, ans=0.0 2024-08-19 01:37:27,063 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.65 vs. 
limit=6.0 2024-08-19 01:37:44,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4210370.0, ans=0.04949747468305833 2024-08-19 01:37:54,296 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-19 01:38:05,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4210470.0, ans=0.125 2024-08-19 01:38:08,054 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 5450, loss[loss=0.1083, beats_loss=0.009114, ecapa_loss=0.0001697, whisper_loss=0.09749, over 18854.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01053, ecapa_loss=0.0001424, whisper_loss=0.09031, over 3911240.67 frames. ], batch size: 78, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:38:13,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4210570.0, ans=0.1 2024-08-19 01:38:13,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4210570.0, ans=0.125 2024-08-19 01:38:14,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4210570.0, ans=0.07 2024-08-19 01:38:17,686 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.82 vs. limit=22.5 2024-08-19 01:38:28,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4210670.0, ans=0.5 2024-08-19 01:38:50,657 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.97 vs. 
limit=10.0 2024-08-19 01:39:08,725 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.43 vs. limit=12.0 2024-08-19 01:39:16,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4210970.0, ans=0.07 2024-08-19 01:39:17,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4210970.0, ans=0.125 2024-08-19 01:39:21,325 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 5500, loss[loss=0.0949, beats_loss=0.01293, ecapa_loss=0.0001371, whisper_loss=0.08059, over 21991.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01052, ecapa_loss=0.0001428, whisper_loss=0.08975, over 3886968.79 frames. ], batch size: 93, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:39:29,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4211070.0, ans=0.125 2024-08-19 01:39:33,227 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-19 01:39:37,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4211170.0, ans=0.125 2024-08-19 01:39:43,073 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.383e+01 2.517e+01 2.787e+01 3.399e+01, threshold=5.035e+01, percent-clipped=0.0 2024-08-19 01:39:52,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4211270.0, ans=0.1 2024-08-19 01:39:53,240 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-19 01:39:58,146 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
19 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-19 01:40:02,674 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 01:40:06,407 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.63 vs. limit=22.5 2024-08-19 01:40:08,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4211370.0, ans=0.0 2024-08-19 01:40:10,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4211370.0, ans=0.025 2024-08-19 01:40:20,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4211470.0, ans=0.125 2024-08-19 01:40:30,230 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 5550, loss[loss=0.09937, beats_loss=0.01069, ecapa_loss=0.0001319, whisper_loss=0.08735, over 23355.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.0001436, whisper_loss=0.09012, over 3914516.33 frames. ], batch size: 94, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:40:34,694 INFO [train_multi_KD3.py:844] (2/4) A total of 97 cuts. 23 from LS+wenet, 26 from Vox, 48 fro AS 2024-08-19 01:40:43,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4211670.0, ans=0.1 2024-08-19 01:40:48,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4211670.0, ans=0.125 2024-08-19 01:40:54,183 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 01:41:19,153 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 30 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-19 01:41:22,603 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
26 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-19 01:41:26,534 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-19 01:41:30,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4211970.0, ans=0.125 2024-08-19 01:41:36,434 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 5600, loss[loss=0.0717, beats_loss=0.01239, ecapa_loss=0.0001206, whisper_loss=0.0581, over 13347.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01057, ecapa_loss=0.0001421, whisper_loss=0.09001, over 3910465.15 frames. ], batch size: 54, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:41:52,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4212170.0, ans=0.125 2024-08-19 01:41:56,851 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.373e+01 2.558e+01 2.737e+01 3.942e+01, threshold=5.116e+01, percent-clipped=0.0 2024-08-19 01:42:02,958 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.77 vs. limit=15.0 2024-08-19 01:42:03,048 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.41 vs. limit=10.0 2024-08-19 01:42:14,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4212270.0, ans=0.0 2024-08-19 01:42:43,357 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 5650, loss[loss=0.09815, beats_loss=0.01137, ecapa_loss=0.0001455, whisper_loss=0.08532, over 22989.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01058, ecapa_loss=0.0001415, whisper_loss=0.08997, over 3907336.05 frames. 
], batch size: 92, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:42:43,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4212570.0, ans=0.0 2024-08-19 01:42:46,380 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 01:42:53,157 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 01:42:58,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4212670.0, ans=0.0 2024-08-19 01:43:11,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4212770.0, ans=0.1 2024-08-19 01:43:13,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4212770.0, ans=0.2 2024-08-19 01:43:51,632 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 5700, loss[loss=0.09775, beats_loss=0.01142, ecapa_loss=0.0001286, whisper_loss=0.08505, over 22499.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01061, ecapa_loss=0.0001429, whisper_loss=0.09007, over 3953207.83 frames. ], batch size: 89, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:44:12,713 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.394e+01 2.632e+01 3.086e+01 9.254e+01, threshold=5.264e+01, percent-clipped=1.0 2024-08-19 01:44:13,402 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.66 vs. 
limit=22.5 2024-08-19 01:44:15,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4213170.0, ans=0.125 2024-08-19 01:44:18,595 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=15.0 2024-08-19 01:44:19,971 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.55 vs. limit=10.0 2024-08-19 01:44:23,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4213270.0, ans=0.125 2024-08-19 01:44:29,964 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 01:44:31,180 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-19 01:44:49,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4213470.0, ans=0.125 2024-08-19 01:44:54,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4213470.0, ans=0.125 2024-08-19 01:45:01,728 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 5750, loss[loss=0.1135, beats_loss=0.009573, ecapa_loss=0.0001557, whisper_loss=0.1024, over 22506.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001432, whisper_loss=0.09014, over 3953448.73 frames. ], batch size: 92, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:45:10,340 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
29 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-19 01:45:13,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4213570.0, ans=0.0 2024-08-19 01:45:28,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4213770.0, ans=0.0 2024-08-19 01:45:29,813 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-19 01:46:05,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4213970.0, ans=0.0 2024-08-19 01:46:13,291 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 5800, loss[loss=0.08141, beats_loss=0.0127, ecapa_loss=0.0001415, whisper_loss=0.0673, over 20554.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001446, whisper_loss=0.08995, over 3927830.04 frames. ], batch size: 90, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:46:22,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4214070.0, ans=0.0 2024-08-19 01:46:31,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4214170.0, ans=0.05 2024-08-19 01:46:35,942 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.316e+01 2.608e+01 2.916e+01 4.627e+01, threshold=5.217e+01, percent-clipped=0.0 2024-08-19 01:46:44,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4214270.0, ans=0.1 2024-08-19 01:46:53,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4214270.0, ans=0.1 2024-08-19 01:47:05,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, 
batch_count=4214370.0, ans=0.125 2024-08-19 01:47:10,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4214470.0, ans=0.125 2024-08-19 01:47:10,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4214470.0, ans=0.125 2024-08-19 01:47:14,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4214470.0, ans=0.125 2024-08-19 01:47:18,260 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 01:47:23,584 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 36 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-19 01:47:24,606 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 5850, loss[loss=0.1267, beats_loss=0.00991, ecapa_loss=0.0001087, whisper_loss=0.1157, over 24137.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01058, ecapa_loss=0.0001433, whisper_loss=0.08969, over 3952645.22 frames. ], batch size: 87, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:47:27,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4214570.0, ans=0.0 2024-08-19 01:47:46,277 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
26 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-19 01:47:47,776 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 01:47:49,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4214670.0, ans=0.0 2024-08-19 01:47:50,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4214770.0, ans=0.125 2024-08-19 01:47:55,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4214770.0, ans=0.0 2024-08-19 01:47:57,733 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-19 01:48:00,211 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 23 from LS+wenet, 10 from Vox, 23 fro AS 2024-08-19 01:48:02,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4214770.0, ans=0.1 2024-08-19 01:48:15,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4214870.0, ans=0.0 2024-08-19 01:48:16,309 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 15 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-19 01:48:17,527 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-19 01:48:25,839 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-19 01:48:26,068 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.243e-02 2024-08-19 01:48:28,137 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 01:48:31,397 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
16 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 01:48:37,179 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 5900, loss[loss=0.1069, beats_loss=0.009057, ecapa_loss=0.0001416, whisper_loss=0.09639, over 14950.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01064, ecapa_loss=0.0001435, whisper_loss=0.08863, over 3894695.54 frames. ], batch size: 60, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:48:40,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4215070.0, ans=0.125 2024-08-19 01:48:53,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4215170.0, ans=0.125 2024-08-19 01:48:55,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4215170.0, ans=0.0 2024-08-19 01:48:57,706 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.293e+01 2.484e+01 2.776e+01 5.070e+01, threshold=4.968e+01, percent-clipped=0.0 2024-08-19 01:49:06,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4215270.0, ans=0.125 2024-08-19 01:49:13,548 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 01:49:51,561 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 5950, loss[loss=0.08156, beats_loss=0.01016, ecapa_loss=0.0001759, whisper_loss=0.06964, over 13299.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01067, ecapa_loss=0.000143, whisper_loss=0.08847, over 3872733.27 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:50:11,741 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
24 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-19 01:50:17,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4215670.0, ans=0.125 2024-08-19 01:50:23,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4215670.0, ans=0.09899494936611666 2024-08-19 01:50:53,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4215870.0, ans=0.0 2024-08-19 01:50:55,164 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-19 01:51:06,136 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-19 01:51:13,370 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-19 01:51:16,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4215970.0, ans=0.125 2024-08-19 01:51:22,548 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.77 vs. limit=12.0 2024-08-19 01:51:23,363 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-19 01:51:24,388 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 6000, loss[loss=0.08629, beats_loss=0.009417, ecapa_loss=0.0001693, whisper_loss=0.07518, over 15909.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01062, ecapa_loss=0.0001428, whisper_loss=0.08904, over 3895195.79 frames. 
], batch size: 64, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:51:24,389 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-19 01:52:18,303 INFO [train_multi_KD3.py:1149] (2/4) Epoch 29, validation on ASR_libri: loss=0.2515, beats_loss=0, ecapa_loss=0.0005229, whisper_loss=0.2463, over 922467.00 frames. 2024-08-19 01:52:36,636 INFO [train_multi_KD3.py:1149] (2/4) Epoch 29, validation on SV_voxceleb1: loss=0.003944, beats_loss=0, ecapa_loss=0.0003944, whisper_loss=0, over 939242.00 frames. 2024-08-19 01:55:19,280 INFO [train_multi_KD3.py:1149] (2/4) Epoch 29, validation on AT_audioset: loss=0.02306, beats_loss=0.02306, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 01:55:19,284 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-19 01:55:20,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4216070.0, ans=0.125 2024-08-19 01:55:32,774 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 11 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-19 01:55:47,894 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.73 vs. limit=15.0 2024-08-19 01:55:48,134 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.345e+01 2.612e+01 2.923e+01 8.240e+01, threshold=5.224e+01, percent-clipped=1.0 2024-08-19 01:55:52,411 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
23 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-19 01:55:55,144 WARNING [optim.py:496] (2/4) Scaling gradients by 0.028258753940463066, model_norm_threshold=52.240760803222656 2024-08-19 01:55:55,317 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.066e+05, grad_sumsq=6.066e+05, orig_rms_sq=1.000e+00 2024-08-19 01:56:39,583 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 14 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-19 01:56:49,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4216470.0, ans=0.125 2024-08-19 01:56:56,758 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 6050, loss[loss=0.0983, beats_loss=0.01073, ecapa_loss=0.0001387, whisper_loss=0.08618, over 16536.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01065, ecapa_loss=0.000143, whisper_loss=0.08836, over 3871276.93 frames. ], batch size: 64, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:57:01,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4216570.0, ans=10.0 2024-08-19 01:57:24,453 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-19 01:57:31,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4216770.0, ans=0.125 2024-08-19 01:57:37,937 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-19 01:57:52,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4216870.0, ans=0.125 2024-08-19 01:58:08,849 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
25 from LS+wenet, 27 from Vox, 40 from AS 2024-08-19 01:58:09,300 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0 2024-08-19 01:58:11,249 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 6100, loss[loss=0.1109, beats_loss=0.008298, ecapa_loss=0.0001837, whisper_loss=0.1007, over 20995.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01063, ecapa_loss=0.0001433, whisper_loss=0.08859, over 3867806.54 frames. ], batch size: 89, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:58:13,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4217070.0, ans=0.125 2024-08-19 01:58:19,244 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 31 from LS+wenet, 30 from Vox, 35 from AS 2024-08-19 01:58:22,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4217070.0, ans=0.2 2024-08-19 01:58:31,664 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.328e+01 2.727e+01 2.996e+01 1.849e+03, threshold=5.454e+01, percent-clipped=1.0 2024-08-19 01:58:36,370 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.10 vs. limit=15.0 2024-08-19 01:58:43,843 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 from AS 2024-08-19 01:58:49,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4217270.0, ans=0.05 2024-08-19 01:58:56,389 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
19 from LS+wenet, 24 from Vox, 38 from AS 2024-08-19 01:59:09,406 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 01:59:10,466 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 12 from Vox, 29 from AS 2024-08-19 01:59:18,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4217570.0, ans=0.2 2024-08-19 01:59:19,532 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 6150, loss[loss=0.07632, beats_loss=0.01397, ecapa_loss=0.0001795, whisper_loss=0.06056, over 18382.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01059, ecapa_loss=0.0001434, whisper_loss=0.08884, over 3867303.41 frames. ], batch size: 81, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:59:31,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4217570.0, ans=0.125 2024-08-19 01:59:32,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4217670.0, ans=0.0 2024-08-19 01:59:34,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4217670.0, ans=0.125 2024-08-19 01:59:40,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4217670.0, ans=0.5 2024-08-19 01:59:50,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4217770.0, ans=0.2 2024-08-19 01:59:51,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4217770.0, ans=0.125 2024-08-19 02:00:00,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4217770.0, ans=0.125 
2024-08-19 02:00:04,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4217870.0, ans=0.0 2024-08-19 02:00:06,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4217870.0, ans=0.125 2024-08-19 02:00:13,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4217870.0, ans=0.0 2024-08-19 02:00:13,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4217870.0, ans=0.0 2024-08-19 02:00:15,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4217970.0, ans=0.125 2024-08-19 02:00:29,879 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 6200, loss[loss=0.1317, beats_loss=0.008686, ecapa_loss=0.0001519, whisper_loss=0.1214, over 17011.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01062, ecapa_loss=0.0001424, whisper_loss=0.08892, over 3886579.85 frames. ], batch size: 69, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:00:42,860 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.42 vs. limit=15.0 2024-08-19 02:00:43,423 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 23 from Vox, 38 from AS 2024-08-19 02:00:45,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4218170.0, ans=0.07 2024-08-19 02:00:45,772 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.93 vs. 
limit=10.0 2024-08-19 02:00:50,575 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.664e+01 2.249e+01 2.459e+01 2.825e+01 3.741e+01, threshold=4.918e+01, percent-clipped=0.0 2024-08-19 02:00:55,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4218170.0, ans=0.0 2024-08-19 02:00:59,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4218270.0, ans=0.0 2024-08-19 02:01:34,775 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=15.0 2024-08-19 02:01:40,525 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0 2024-08-19 02:01:40,952 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 6250, loss[loss=0.117, beats_loss=0.01154, ecapa_loss=0.000145, whisper_loss=0.104, over 23083.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01062, ecapa_loss=0.0001432, whisper_loss=0.08938, over 3902587.49 frames. ], batch size: 91, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:01:52,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4218570.0, ans=0.1 2024-08-19 02:02:00,397 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 from AS 2024-08-19 02:02:23,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4218870.0, ans=0.2 2024-08-19 02:02:26,031 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.93 vs. 
limit=15.0 2024-08-19 02:02:29,212 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.74 vs. limit=10.0 2024-08-19 02:02:33,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=4218870.0, ans=0.1 2024-08-19 02:02:40,005 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.15 vs. limit=15.0 2024-08-19 02:02:44,943 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.70 vs. limit=15.0 2024-08-19 02:02:50,741 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 6300, loss[loss=0.1069, beats_loss=0.01317, ecapa_loss=0.0001268, whisper_loss=0.09248, over 14303.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01061, ecapa_loss=0.000144, whisper_loss=0.08927, over 3898063.67 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:02:55,371 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 15 from Vox, 24 from AS 2024-08-19 02:02:59,242 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
20 from LS+wenet, 18 from Vox, 31 from AS 2024-08-19 02:03:05,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4219170.0, ans=0.125 2024-08-19 02:03:05,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4219170.0, ans=0.125 2024-08-19 02:03:06,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4219170.0, ans=0.125 2024-08-19 02:03:11,745 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.391e+01 2.571e+01 3.038e+01 7.293e+01, threshold=5.142e+01, percent-clipped=2.0 2024-08-19 02:03:59,783 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 6350, loss[loss=0.09516, beats_loss=0.01007, ecapa_loss=0.000134, whisper_loss=0.08376, over 16751.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01057, ecapa_loss=0.0001433, whisper_loss=0.0891, over 3877040.29 frames. ], batch size: 66, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:04:18,401 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.26 vs. limit=15.0 2024-08-19 02:04:47,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4219870.0, ans=0.125 2024-08-19 02:05:09,551 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 6400, loss[loss=0.1021, beats_loss=0.012, ecapa_loss=0.0001374, whisper_loss=0.08877, over 22388.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01056, ecapa_loss=0.000143, whisper_loss=0.08944, over 3883785.99 frames. 
], batch size: 92, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:05:21,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4220070.0, ans=0.1 2024-08-19 02:05:22,955 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 from AS 2024-08-19 02:05:23,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4220170.0, ans=0.2 2024-08-19 02:05:24,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4220170.0, ans=0.0 2024-08-19 02:05:31,206 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.582e+01 2.283e+01 2.522e+01 2.730e+01 4.061e+01, threshold=5.044e+01, percent-clipped=0.0 2024-08-19 02:05:40,440 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=15.0 2024-08-19 02:05:44,261 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.87 vs. limit=22.5 2024-08-19 02:05:47,229 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 20 from Vox, 37 from AS 2024-08-19 02:05:55,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4220370.0, ans=0.1 2024-08-19 02:06:00,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4220370.0, ans=0.1 2024-08-19 02:06:01,600 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 from AS 2024-08-19 02:06:19,672 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 6450, loss[loss=0.0763, beats_loss=0.01184, ecapa_loss=0.0001278, whisper_loss=0.06317, over 15039.00 frames. 
], tot_loss[loss=0.1008, beats_loss=0.01065, ecapa_loss=0.0001437, whisper_loss=0.08873, over 3889811.31 frames. ], batch size: 58, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:06:25,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4220570.0, ans=0.125 2024-08-19 02:06:36,814 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2024-08-19 02:06:57,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4220770.0, ans=0.0 2024-08-19 02:06:58,730 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 32 from LS+wenet, 14 from Vox, 33 from AS 2024-08-19 02:07:03,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4220870.0, ans=0.1 2024-08-19 02:07:08,901 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.284e-01 2024-08-19 02:07:15,689 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 24 from LS+wenet, 9 from Vox, 29 from AS 2024-08-19 02:07:15,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4220970.0, ans=0.0 2024-08-19 02:07:25,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4220970.0, ans=0.0 2024-08-19 02:07:29,961 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 6500, loss[loss=0.07879, beats_loss=0.01117, ecapa_loss=0.0001351, whisper_loss=0.06628, over 18644.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01059, ecapa_loss=0.0001426, whisper_loss=0.08979, over 3915222.71 frames. 
], batch size: 78, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:07:33,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4221070.0, ans=0.125 2024-08-19 02:07:50,817 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.411e+01 2.588e+01 2.957e+01 3.943e+01, threshold=5.175e+01, percent-clipped=0.0 2024-08-19 02:07:53,829 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 25 from Vox, 40 from AS 2024-08-19 02:07:55,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4221170.0, ans=0.125 2024-08-19 02:07:59,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4221270.0, ans=0.0 2024-08-19 02:08:15,430 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.10 vs. limit=15.0 2024-08-19 02:08:15,481 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.89 vs. limit=15.0 2024-08-19 02:08:35,569 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.61 vs. limit=15.0 2024-08-19 02:08:38,492 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 6550, loss[loss=0.1088, beats_loss=0.01092, ecapa_loss=0.0001772, whisper_loss=0.0961, over 13161.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01056, ecapa_loss=0.0001439, whisper_loss=0.09002, over 3882196.41 frames. ], batch size: 54, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:09:00,491 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
30 from LS+wenet, 23 from Vox, 35 from AS 2024-08-19 02:09:13,194 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 17 from Vox, 39 from AS 2024-08-19 02:09:14,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4221770.0, ans=0.0 2024-08-19 02:09:27,737 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 18 from Vox, 24 from AS 2024-08-19 02:09:33,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4221870.0, ans=0.125 2024-08-19 02:09:46,995 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 from AS 2024-08-19 02:09:49,333 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 6600, loss[loss=0.1237, beats_loss=0.009869, ecapa_loss=0.0001416, whisper_loss=0.1124, over 23248.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001438, whisper_loss=0.09016, over 3920958.19 frames. ], batch size: 92, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:10:01,917 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
21 from LS+wenet, 22 from Vox, 44 from AS 2024-08-19 02:10:03,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4222170.0, ans=0.125 2024-08-19 02:10:07,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=4222170.0, ans=15.0 2024-08-19 02:10:09,811 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.434e+01 2.631e+01 2.972e+01 4.626e+01, threshold=5.261e+01, percent-clipped=0.0 2024-08-19 02:10:11,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4222170.0, ans=0.09899494936611666 2024-08-19 02:10:16,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4222270.0, ans=0.125 2024-08-19 02:10:26,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4222270.0, ans=0.0 2024-08-19 02:10:41,826 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4222370.0, ans=0.0 2024-08-19 02:10:50,278 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 29 from Vox, 29 from AS 2024-08-19 02:10:53,512 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 11 from Vox, 29 from AS 2024-08-19 02:10:58,593 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.61 vs. limit=15.0 2024-08-19 02:10:58,866 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 6650, loss[loss=0.1143, beats_loss=0.01012, ecapa_loss=0.0001487, whisper_loss=0.1027, over 17942.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0105, ecapa_loss=0.0001436, whisper_loss=0.09068, over 3925109.24 frames. 
], batch size: 73, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:10:59,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4222570.0, ans=0.1 2024-08-19 02:11:10,286 WARNING [optim.py:496] (2/4) Scaling gradients by 0.04424963891506195, model_norm_threshold=52.611846923828125 2024-08-19 02:11:10,456 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.224e+05, grad_sumsq=2.136e+07, orig_rms_sq=1.041e-02 2024-08-19 02:11:23,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4222670.0, ans=0.2 2024-08-19 02:11:39,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4222770.0, ans=0.0 2024-08-19 02:11:42,939 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 18 from Vox, 42 from AS 2024-08-19 02:12:09,166 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 6700, loss[loss=0.0724, beats_loss=0.01294, ecapa_loss=0.0001569, whisper_loss=0.05789, over 16134.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01053, ecapa_loss=0.0001441, whisper_loss=0.09035, over 3894560.13 frames. ], batch size: 70, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:12:26,962 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.33 vs. limit=22.5 2024-08-19 02:12:31,459 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.376e+01 2.691e+01 2.989e+01 1.189e+03, threshold=5.381e+01, percent-clipped=5.0 2024-08-19 02:12:35,978 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
26 from LS+wenet, 28 from Vox, 39 from AS 2024-08-19 02:12:47,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4223270.0, ans=0.1 2024-08-19 02:13:06,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4223470.0, ans=0.125 2024-08-19 02:13:10,405 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.42 vs. limit=15.0 2024-08-19 02:13:15,567 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 16 from LS+wenet, 20 from Vox, 32 from AS 2024-08-19 02:13:20,623 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 6750, loss[loss=0.106, beats_loss=0.009987, ecapa_loss=0.000125, whisper_loss=0.09473, over 16834.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01052, ecapa_loss=0.000143, whisper_loss=0.09007, over 3906993.95 frames. ], batch size: 67, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:13:27,016 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-19 02:13:34,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4223670.0, ans=0.015 2024-08-19 02:13:38,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4223670.0, ans=0.0 2024-08-19 02:13:50,339 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
27 from LS+wenet, 22 from Vox, 43 from AS 2024-08-19 02:13:59,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4223770.0, ans=0.0 2024-08-19 02:14:03,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=4223870.0, ans=15.0 2024-08-19 02:14:28,907 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 6800, loss[loss=0.09493, beats_loss=0.0123, ecapa_loss=0.0001473, whisper_loss=0.08116, over 15895.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001436, whisper_loss=0.09033, over 3924976.49 frames. ], batch size: 66, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:14:40,531 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 14 from Vox, 35 from AS 2024-08-19 02:14:42,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4224170.0, ans=0.0 2024-08-19 02:14:48,986 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 22 from Vox, 33 from AS 2024-08-19 02:14:49,986 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.425e+01 2.571e+01 2.858e+01 3.712e+02, threshold=5.143e+01, percent-clipped=3.0 2024-08-19 02:15:00,821 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 24 from Vox, 40 from AS 2024-08-19 02:15:13,484 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 17 from Vox, 31 from AS 2024-08-19 02:15:15,067 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.127e-01 2024-08-19 02:15:17,309 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
17 from LS+wenet, 20 from Vox, 33 from AS 2024-08-19 02:15:19,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4224370.0, ans=0.2 2024-08-19 02:15:37,308 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 6850, loss[loss=0.09562, beats_loss=0.01224, ecapa_loss=0.0001062, whisper_loss=0.08231, over 14623.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01056, ecapa_loss=0.0001437, whisper_loss=0.08928, over 3899964.13 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:15:47,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4224570.0, ans=0.0 2024-08-19 02:16:06,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4224770.0, ans=0.125 2024-08-19 02:16:12,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4224770.0, ans=0.125 2024-08-19 02:16:17,843 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 35 from LS+wenet, 19 from Vox, 34 from AS 2024-08-19 02:16:27,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4224870.0, ans=0.125 2024-08-19 02:16:30,507 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.51 vs. 
limit=15.0 2024-08-19 02:16:32,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4224970.0, ans=0.125 2024-08-19 02:16:34,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4224970.0, ans=0.1 2024-08-19 02:16:36,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4224970.0, ans=0.125 2024-08-19 02:16:45,896 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.49 vs. limit=22.5 2024-08-19 02:16:46,202 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 6900, loss[loss=0.1173, beats_loss=0.01022, ecapa_loss=0.0001523, whisper_loss=0.1056, over 21662.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0105, ecapa_loss=0.0001442, whisper_loss=0.09007, over 3893774.41 frames. ], batch size: 90, lr: 2.11e-03, grad_scale: 1.152921504606847e+18 2024-08-19 02:16:50,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4225070.0, ans=0.0 2024-08-19 02:17:02,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4225170.0, ans=0.0 2024-08-19 02:17:06,155 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.295e+01 2.515e+01 2.694e+01 1.143e+02, threshold=5.030e+01, percent-clipped=2.0 2024-08-19 02:17:06,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4225170.0, ans=0.125 2024-08-19 02:17:09,153 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
25 from LS+wenet, 21 from Vox, 27 from AS 2024-08-19 02:17:23,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4225270.0, ans=0.0 2024-08-19 02:17:52,723 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 6950, loss[loss=0.1012, beats_loss=0.009495, ecapa_loss=0.0001812, whisper_loss=0.08992, over 13951.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01046, ecapa_loss=0.0001442, whisper_loss=0.09071, over 3863651.68 frames. ], batch size: 59, lr: 2.11e-03, grad_scale: 1.152921504606847e+18 2024-08-19 02:18:12,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4225670.0, ans=0.2 2024-08-19 02:18:58,851 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 7000, loss[loss=0.09626, beats_loss=0.01245, ecapa_loss=0.0001376, whisper_loss=0.08243, over 22101.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01046, ecapa_loss=0.0001429, whisper_loss=0.09079, over 3871456.19 frames. ], batch size: 92, lr: 2.11e-03, grad_scale: 1.152921504606847e+18 2024-08-19 02:19:05,823 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.68 vs. limit=15.0 2024-08-19 02:19:10,987 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.91 vs. limit=15.0 2024-08-19 02:19:13,424 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.35 vs. limit=15.0 2024-08-19 02:19:15,524 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
34 from LS+wenet, 23 from Vox, 34 from AS 2024-08-19 02:19:18,044 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.286e+01 2.533e+01 2.808e+01 4.798e+01, threshold=5.066e+01, percent-clipped=0.0 2024-08-19 02:19:21,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4226170.0, ans=0.09899494936611666 2024-08-19 02:19:23,023 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 14 from LS+wenet, 20 from Vox, 27 from AS 2024-08-19 02:19:37,456 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 20 from Vox, 24 from AS 2024-08-19 02:19:49,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4226470.0, ans=0.1 2024-08-19 02:19:53,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4226470.0, ans=0.2 2024-08-19 02:19:55,203 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 14 from Vox, 31 from AS 2024-08-19 02:19:57,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=4226470.0, ans=0.5 2024-08-19 02:20:02,712 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 7050, loss[loss=0.1133, beats_loss=0.009788, ecapa_loss=0.0001163, whisper_loss=0.1023, over 20532.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.000143, whisper_loss=0.09011, over 3849520.73 frames. ], batch size: 74, lr: 2.11e-03, grad_scale: 1.152921504606847e+18 2024-08-19 02:20:06,523 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 from AS 2024-08-19 02:20:07,761 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
25 from LS+wenet, 27 from Vox, 41 from AS 2024-08-19 02:20:29,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4226770.0, ans=0.0 2024-08-19 02:20:33,933 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0 2024-08-19 02:20:38,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4226770.0, ans=0.125 2024-08-19 02:20:39,759 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 18 from Vox, 48 from AS 2024-08-19 02:20:48,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4226870.0, ans=0.125 2024-08-19 02:20:51,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4226870.0, ans=0.125 2024-08-19 02:21:05,135 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2024-08-19 02:21:05,478 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 7100, loss[loss=0.1037, beats_loss=0.007917, ecapa_loss=0.0001421, whisper_loss=0.09436, over 18372.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01058, ecapa_loss=0.0001424, whisper_loss=0.08953, over 3871409.98 frames. ], batch size: 71, lr: 2.11e-03, grad_scale: 1.152921504606847e+18 2024-08-19 02:21:07,904 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 20 from Vox, 26 from AS 2024-08-19 02:21:23,813 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.314e+01 2.618e+01 2.953e+01 5.824e+01, threshold=5.237e+01, percent-clipped=1.0 2024-08-19 02:21:31,632 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
19 from LS+wenet, 14 from Vox, 47 fro AS 2024-08-19 02:21:57,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4227470.0, ans=0.0 2024-08-19 02:22:08,098 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 7150, loss[loss=0.08648, beats_loss=0.01104, ecapa_loss=0.0001668, whisper_loss=0.07377, over 22030.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01052, ecapa_loss=0.0001427, whisper_loss=0.09007, over 3866935.92 frames. ], batch size: 93, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:22:28,613 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2024-08-19 02:22:33,782 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 20 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-19 02:22:36,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4227770.0, ans=0.07 2024-08-19 02:22:43,141 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 02:22:45,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4227870.0, ans=0.0 2024-08-19 02:22:46,101 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-19 02:23:02,358 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
23 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-19 02:23:04,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4227970.0, ans=0.0 2024-08-19 02:23:04,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4227970.0, ans=0.125 2024-08-19 02:23:04,606 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.00 vs. limit=15.0 2024-08-19 02:23:10,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4228070.0, ans=0.1 2024-08-19 02:23:11,470 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 7200, loss[loss=0.1057, beats_loss=0.01068, ecapa_loss=0.0001542, whisper_loss=0.0935, over 21779.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01053, ecapa_loss=0.0001428, whisper_loss=0.0896, over 3877445.31 frames. ], batch size: 92, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:23:11,588 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 02:23:13,443 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.52 vs. limit=15.0 2024-08-19 02:23:21,995 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.06 vs. limit=6.0 2024-08-19 02:23:25,781 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.64 vs. 
limit=22.5 2024-08-19 02:23:31,508 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.350e+01 2.585e+01 2.924e+01 4.669e+01, threshold=5.169e+01, percent-clipped=0.0 2024-08-19 02:23:33,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4228170.0, ans=0.125 2024-08-19 02:23:35,414 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 31 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 02:24:04,410 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.35 vs. limit=22.5 2024-08-19 02:24:13,879 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 7250, loss[loss=0.1013, beats_loss=0.009192, ecapa_loss=0.000165, whisper_loss=0.09044, over 18840.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.0001434, whisper_loss=0.08969, over 3885461.67 frames. ], batch size: 76, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:24:20,518 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-19 02:24:28,210 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 24 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-19 02:24:37,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4228670.0, ans=0.125 2024-08-19 02:24:48,037 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. 
limit=6.0 2024-08-19 02:24:58,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4228870.0, ans=0.0 2024-08-19 02:25:02,162 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0587669275701046, model_norm_threshold=51.69389343261719 2024-08-19 02:25:02,330 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.3.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.897e+04, grad_sumsq=8.897e+04, orig_rms_sq=1.000e+00 2024-08-19 02:25:09,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4228970.0, ans=0.2 2024-08-19 02:25:18,055 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 7300, loss[loss=0.08537, beats_loss=0.012, ecapa_loss=0.0001177, whisper_loss=0.07219, over 22651.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01045, ecapa_loss=0.0001449, whisper_loss=0.08986, over 3861182.32 frames. ], batch size: 92, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:25:34,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4229170.0, ans=0.0 2024-08-19 02:25:35,715 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-19 02:25:38,109 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.307e+01 2.519e+01 2.780e+01 8.796e+02, threshold=5.038e+01, percent-clipped=1.0 2024-08-19 02:26:02,908 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.96 vs. 
limit=15.0 2024-08-19 02:26:12,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4229470.0, ans=0.125 2024-08-19 02:26:20,634 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 7350, loss[loss=0.1127, beats_loss=0.01017, ecapa_loss=0.0001319, whisper_loss=0.1012, over 22611.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01048, ecapa_loss=0.0001449, whisper_loss=0.09043, over 3881956.46 frames. ], batch size: 89, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:26:24,320 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=15.0 2024-08-19 02:26:25,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4229570.0, ans=10.0 2024-08-19 02:26:30,048 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-19 02:26:36,843 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-19 02:26:39,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4229670.0, ans=0.125 2024-08-19 02:26:45,941 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-19 02:27:18,506 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=12.0 2024-08-19 02:27:25,390 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 7400, loss[loss=0.09765, beats_loss=0.009961, ecapa_loss=0.0001562, whisper_loss=0.08612, over 19480.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01049, ecapa_loss=0.0001435, whisper_loss=0.09022, over 3892074.78 frames. 
], batch size: 81, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:27:27,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4230070.0, ans=0.2 2024-08-19 02:27:29,067 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.33 vs. limit=15.0 2024-08-19 02:27:39,549 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.10 vs. limit=22.5 2024-08-19 02:27:44,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4230170.0, ans=0.1 2024-08-19 02:27:46,179 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.315e+01 2.515e+01 2.740e+01 4.360e+01, threshold=5.029e+01, percent-clipped=0.0 2024-08-19 02:28:04,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4230370.0, ans=0.1 2024-08-19 02:28:17,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4230470.0, ans=0.0 2024-08-19 02:28:20,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4230470.0, ans=0.035 2024-08-19 02:28:22,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4230470.0, ans=0.04949747468305833 2024-08-19 02:28:27,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4230470.0, ans=0.1 2024-08-19 02:28:29,076 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 7450, loss[loss=0.1133, beats_loss=0.01027, ecapa_loss=0.0001267, whisper_loss=0.1017, over 22888.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01051, ecapa_loss=0.0001443, whisper_loss=0.09013, over 3916122.57 frames. ], batch size: 90, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:28:32,382 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2024-08-19 02:28:42,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4230670.0, ans=0.1 2024-08-19 02:28:43,955 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 02:28:52,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4230670.0, ans=0.0 2024-08-19 02:28:59,041 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 02:29:07,565 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 02:29:13,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4230870.0, ans=0.125 2024-08-19 02:29:33,729 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 7500, loss[loss=0.1021, beats_loss=0.007909, ecapa_loss=0.0001362, whisper_loss=0.09288, over 15759.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01054, ecapa_loss=0.0001435, whisper_loss=0.09005, over 3882600.80 frames. ], batch size: 62, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:29:34,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4231070.0, ans=0.125 2024-08-19 02:29:36,547 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-19 02:29:50,376 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 02:29:54,018 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.202e+01 2.431e+01 2.767e+01 3.373e+01, threshold=4.863e+01, percent-clipped=0.0 2024-08-19 02:29:57,062 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 30 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-19 02:30:04,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4231270.0, ans=0.0 2024-08-19 02:30:14,952 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 14 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-19 02:30:30,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4231470.0, ans=0.1 2024-08-19 02:30:32,660 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-19 02:30:32,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4231470.0, ans=0.125 2024-08-19 02:30:37,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4231570.0, ans=0.125 2024-08-19 02:30:37,882 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 7550, loss[loss=0.0815, beats_loss=0.01175, ecapa_loss=0.0001194, whisper_loss=0.06855, over 16352.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01049, ecapa_loss=0.0001435, whisper_loss=0.08993, over 3854726.40 frames. 
], batch size: 64, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:30:48,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4231570.0, ans=0.5 2024-08-19 02:30:50,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4231670.0, ans=0.125 2024-08-19 02:30:50,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4231670.0, ans=0.125 2024-08-19 02:31:16,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4231870.0, ans=0.0 2024-08-19 02:31:21,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4231870.0, ans=0.125 2024-08-19 02:31:34,088 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-19 02:31:39,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4231970.0, ans=0.125 2024-08-19 02:31:41,788 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 7600, loss[loss=0.09324, beats_loss=0.01359, ecapa_loss=0.0001199, whisper_loss=0.07845, over 23198.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.0001427, whisper_loss=0.08962, over 3812363.28 frames. ], batch size: 93, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:32:01,696 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.345e+01 2.567e+01 2.795e+01 4.774e+01, threshold=5.135e+01, percent-clipped=0.0 2024-08-19 02:32:02,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4232170.0, ans=0.2 2024-08-19 02:32:18,961 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 02:32:21,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4232370.0, ans=0.0 2024-08-19 02:32:23,551 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-19 02:32:35,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4232470.0, ans=0.04949747468305833 2024-08-19 02:32:37,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4232470.0, ans=0.125 2024-08-19 02:32:45,041 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 7650, loss[loss=0.09305, beats_loss=0.01382, ecapa_loss=0.000127, whisper_loss=0.07796, over 21577.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01056, ecapa_loss=0.0001429, whisper_loss=0.08947, over 3843814.35 frames. ], batch size: 89, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:32:50,433 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 02:33:03,811 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.42 vs. limit=15.0 2024-08-19 02:33:29,618 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-19 02:33:32,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4232870.0, ans=0.2 2024-08-19 02:33:38,192 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
24 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-19 02:33:44,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4232970.0, ans=10.0 2024-08-19 02:33:48,024 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 7700, loss[loss=0.104, beats_loss=0.009156, ecapa_loss=0.0001766, whisper_loss=0.09305, over 20590.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01053, ecapa_loss=0.000143, whisper_loss=0.08999, over 3874965.81 frames. ], batch size: 85, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:33:48,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4233070.0, ans=0.125 2024-08-19 02:33:53,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4233070.0, ans=0.0 2024-08-19 02:34:02,774 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.66 vs. limit=8.0 2024-08-19 02:34:03,578 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.22 vs. limit=12.0 2024-08-19 02:34:05,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4233170.0, ans=0.125 2024-08-19 02:34:07,806 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.265e+01 2.506e+01 2.925e+01 4.294e+01, threshold=5.013e+01, percent-clipped=0.0 2024-08-19 02:34:22,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4233270.0, ans=0.2 2024-08-19 02:34:28,546 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
16 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-19 02:34:30,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4233370.0, ans=0.1 2024-08-19 02:34:35,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4233370.0, ans=0.0 2024-08-19 02:34:38,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4233470.0, ans=0.0 2024-08-19 02:34:39,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4233470.0, ans=0.0 2024-08-19 02:34:50,542 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-19 02:34:51,664 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 7750, loss[loss=0.09164, beats_loss=0.009655, ecapa_loss=0.0001623, whisper_loss=0.08037, over 19971.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01058, ecapa_loss=0.0001428, whisper_loss=0.09004, over 3906214.57 frames. ], batch size: 80, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:35:15,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4233770.0, ans=0.2 2024-08-19 02:35:32,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4233870.0, ans=0.125 2024-08-19 02:35:38,157 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 02:35:40,614 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
21 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-19 02:35:44,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4233970.0, ans=0.1 2024-08-19 02:35:50,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4233970.0, ans=0.1 2024-08-19 02:35:54,851 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 7800, loss[loss=0.1228, beats_loss=0.007052, ecapa_loss=0.0001562, whisper_loss=0.1142, over 17232.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001424, whisper_loss=0.09015, over 3901795.75 frames. ], batch size: 67, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:35:59,836 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 02:36:00,400 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.58 vs. limit=15.0 2024-08-19 02:36:05,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4234070.0, ans=10.0 2024-08-19 02:36:15,043 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.304e+01 2.560e+01 2.905e+01 1.988e+02, threshold=5.119e+01, percent-clipped=2.0 2024-08-19 02:36:19,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4234270.0, ans=0.07 2024-08-19 02:36:31,828 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
14 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-19 02:36:34,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4234370.0, ans=0.125 2024-08-19 02:36:45,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4234470.0, ans=0.1 2024-08-19 02:36:50,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4234470.0, ans=0.125 2024-08-19 02:36:54,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4234470.0, ans=0.125 2024-08-19 02:36:55,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4234470.0, ans=0.125 2024-08-19 02:36:57,827 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 7850, loss[loss=0.1026, beats_loss=0.01164, ecapa_loss=0.0001188, whisper_loss=0.08974, over 17579.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01053, ecapa_loss=0.0001422, whisper_loss=0.09043, over 3896429.04 frames. ], batch size: 65, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:37:00,317 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-19 02:37:10,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4234670.0, ans=0.0 2024-08-19 02:37:39,426 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 26 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-19 02:38:00,072 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
17 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 02:38:00,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4235070.0, ans=0.0 2024-08-19 02:38:01,013 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 7900, loss[loss=0.08372, beats_loss=0.01147, ecapa_loss=0.000155, whisper_loss=0.0707, over 15439.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01057, ecapa_loss=0.000142, whisper_loss=0.09037, over 3919169.72 frames. ], batch size: 64, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:38:05,293 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 19 from LS+wenet, 29 from Vox, 46 fro AS 2024-08-19 02:38:11,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4235070.0, ans=0.0 2024-08-19 02:38:20,850 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.393e+01 2.663e+01 2.999e+01 6.865e+01, threshold=5.327e+01, percent-clipped=3.0 2024-08-19 02:38:25,049 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-19 02:38:26,945 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.38 vs. limit=10.0 2024-08-19 02:38:35,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4235270.0, ans=0.125 2024-08-19 02:38:40,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4235370.0, ans=0.1 2024-08-19 02:38:43,753 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-19 02:38:50,693 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
18 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-19 02:38:53,199 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-19 02:39:03,382 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 7950, loss[loss=0.09755, beats_loss=0.009889, ecapa_loss=0.0001638, whisper_loss=0.08602, over 20222.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01062, ecapa_loss=0.0001416, whisper_loss=0.08991, over 3902319.05 frames. ], batch size: 84, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:39:05,995 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 15 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-19 02:39:06,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4235570.0, ans=0.07 2024-08-19 02:39:14,166 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.54 vs. limit=12.0 2024-08-19 02:39:22,105 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-19 02:39:41,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4235870.0, ans=0.0 2024-08-19 02:39:43,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4235870.0, ans=0.125 2024-08-19 02:39:57,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4235970.0, ans=0.125 2024-08-19 02:40:04,545 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 8000, loss[loss=0.1073, beats_loss=0.01006, ecapa_loss=0.0001276, whisper_loss=0.09592, over 15851.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01051, ecapa_loss=0.0001422, whisper_loss=0.09074, over 3889119.49 frames. 
], batch size: 62, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:40:05,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4236070.0, ans=0.1 2024-08-19 02:40:14,465 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-19 02:40:19,607 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 02:40:21,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4236170.0, ans=0.0 2024-08-19 02:40:24,236 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.236e+01 2.508e+01 2.783e+01 4.268e+01, threshold=5.017e+01, percent-clipped=0.0 2024-08-19 02:40:24,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4236170.0, ans=0.0 2024-08-19 02:40:30,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4236270.0, ans=0.1 2024-08-19 02:40:32,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4236270.0, ans=0.2 2024-08-19 02:40:32,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4236270.0, ans=0.2 2024-08-19 02:41:00,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4236470.0, ans=0.125 2024-08-19 02:41:02,410 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 02:41:05,888 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 8050, loss[loss=0.1166, beats_loss=0.01017, ecapa_loss=0.0001111, whisper_loss=0.1053, over 22717.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01047, ecapa_loss=0.0001415, whisper_loss=0.09114, over 3892104.49 frames. ], batch size: 88, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:41:18,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4236670.0, ans=0.125 2024-08-19 02:41:21,475 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0 2024-08-19 02:41:33,524 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 36 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 02:41:38,286 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-19 02:41:39,546 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 20 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-19 02:41:43,252 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 22 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-19 02:41:49,212 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-19 02:41:54,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4236970.0, ans=0.1 2024-08-19 02:41:59,463 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-19 02:42:04,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4236970.0, ans=0.1 2024-08-19 02:42:07,602 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 8100, loss[loss=0.1201, beats_loss=0.008559, ecapa_loss=0.0001531, whisper_loss=0.1101, over 22653.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01047, ecapa_loss=0.0001424, whisper_loss=0.09061, over 3873339.69 frames. 
], batch size: 88, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:42:13,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4237070.0, ans=0.1 2024-08-19 02:42:15,709 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.90 vs. limit=10.0 2024-08-19 02:42:16,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4237070.0, ans=10.0 2024-08-19 02:42:27,406 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.244e+01 2.525e+01 2.786e+01 3.995e+01, threshold=5.049e+01, percent-clipped=0.0 2024-08-19 02:42:32,239 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-19 02:43:06,281 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-19 02:43:08,503 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 8150, loss[loss=0.1058, beats_loss=0.009799, ecapa_loss=0.0001406, whisper_loss=0.09459, over 17373.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01046, ecapa_loss=0.0001424, whisper_loss=0.09077, over 3883837.23 frames. ], batch size: 69, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:43:20,747 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-19 02:43:23,075 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
24 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-19 02:43:41,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4237770.0, ans=0.125 2024-08-19 02:43:45,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4237870.0, ans=0.125 2024-08-19 02:43:47,409 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-19 02:43:52,752 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.104e+01 2024-08-19 02:43:58,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4237970.0, ans=0.2 2024-08-19 02:44:06,118 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 02:44:09,556 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 8200, loss[loss=0.09246, beats_loss=0.009206, ecapa_loss=0.0001876, whisper_loss=0.08138, over 15657.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01044, ecapa_loss=0.0001419, whisper_loss=0.09087, over 3919033.73 frames. ], batch size: 66, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:44:16,744 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-19 02:44:28,895 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.281e+01 2.470e+01 2.775e+01 3.796e+01, threshold=4.940e+01, percent-clipped=0.0 2024-08-19 02:44:32,728 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 18 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-19 02:44:33,825 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-19 02:44:36,328 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
35 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-19 02:44:36,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4238270.0, ans=0.125 2024-08-19 02:44:48,674 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-19 02:44:50,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4238370.0, ans=0.1 2024-08-19 02:44:50,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4238370.0, ans=0.125 2024-08-19 02:45:07,642 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.80 vs. limit=15.0 2024-08-19 02:45:10,507 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 8250, loss[loss=0.1012, beats_loss=0.009773, ecapa_loss=0.0001562, whisper_loss=0.08985, over 15394.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01048, ecapa_loss=0.0001428, whisper_loss=0.09023, over 3928020.92 frames. ], batch size: 59, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:45:17,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4238570.0, ans=0.125 2024-08-19 02:45:26,206 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.798e-03 2024-08-19 02:45:40,548 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 02:46:12,818 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 8300, loss[loss=0.1043, beats_loss=0.009016, ecapa_loss=0.0001342, whisper_loss=0.09396, over 15935.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001427, whisper_loss=0.09039, over 3919202.66 frames. 
], batch size: 63, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:46:23,077 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 02:46:32,637 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.439e+01 2.594e+01 2.906e+01 5.754e+01, threshold=5.188e+01, percent-clipped=1.0 2024-08-19 02:46:45,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4239270.0, ans=0.0 2024-08-19 02:46:46,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4239270.0, ans=0.09899494936611666 2024-08-19 02:46:56,178 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 02:47:00,855 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-19 02:47:13,979 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 8350, loss[loss=0.1152, beats_loss=0.007376, ecapa_loss=0.0001893, whisper_loss=0.1059, over 20958.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01049, ecapa_loss=0.0001425, whisper_loss=0.09027, over 3901703.46 frames. ], batch size: 89, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:47:37,216 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 02:47:46,626 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 31 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-19 02:47:49,128 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 31 from Vox, 39 fro AS 2024-08-19 02:47:52,194 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.21 vs. limit=22.5 2024-08-19 02:47:52,745 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
23 from LS+wenet, 26 from Vox, 22 fro AS 2024-08-19 02:47:56,430 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 14 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-19 02:47:57,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4239870.0, ans=0.125 2024-08-19 02:48:01,168 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 02:48:02,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4239970.0, ans=0.2 2024-08-19 02:48:12,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4239970.0, ans=0.1 2024-08-19 02:48:15,077 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-19 02:48:17,200 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 8400, loss[loss=0.112, beats_loss=0.01089, ecapa_loss=0.000135, whisper_loss=0.09975, over 16559.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001432, whisper_loss=0.09065, over 3897860.08 frames. ], batch size: 68, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:48:27,757 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.23 vs. limit=6.0 2024-08-19 02:48:29,581 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-19 02:48:30,136 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.69 vs. 
limit=22.5 2024-08-19 02:48:36,821 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.697e+01 2.296e+01 2.515e+01 2.691e+01 3.893e+01, threshold=5.031e+01, percent-clipped=0.0 2024-08-19 02:49:18,308 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 8450, loss[loss=0.1081, beats_loss=0.01064, ecapa_loss=0.0001619, whisper_loss=0.09588, over 20547.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01043, ecapa_loss=0.0001426, whisper_loss=0.09029, over 3887418.83 frames. ], batch size: 81, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:49:27,746 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0 2024-08-19 02:49:36,760 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-19 02:49:48,525 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.77 vs. limit=15.0 2024-08-19 02:50:06,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4240970.0, ans=0.125 2024-08-19 02:50:08,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4240970.0, ans=0.1 2024-08-19 02:50:15,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4240970.0, ans=0.0 2024-08-19 02:50:18,814 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 8500, loss[loss=0.1044, beats_loss=0.008831, ecapa_loss=0.0001865, whisper_loss=0.09366, over 21704.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01052, ecapa_loss=0.0001415, whisper_loss=0.08987, over 3912017.30 frames. 
], batch size: 92, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:50:26,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4241070.0, ans=0.0 2024-08-19 02:50:31,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4241170.0, ans=0.0 2024-08-19 02:50:32,272 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 02:50:37,937 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.287e+01 2.464e+01 2.780e+01 3.780e+01, threshold=4.928e+01, percent-clipped=0.0 2024-08-19 02:50:38,784 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2024-08-19 02:50:48,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4241270.0, ans=0.125 2024-08-19 02:50:58,853 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-19 02:51:00,064 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
24 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 02:51:02,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4241370.0, ans=0.0 2024-08-19 02:51:08,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4241470.0, ans=0.0 2024-08-19 02:51:10,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4241470.0, ans=0.125 2024-08-19 02:51:16,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4241470.0, ans=0.0 2024-08-19 02:51:19,386 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 8550, loss[loss=0.1112, beats_loss=0.01009, ecapa_loss=0.0001534, whisper_loss=0.09959, over 19242.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01055, ecapa_loss=0.0001409, whisper_loss=0.09002, over 3878520.41 frames. ], batch size: 77, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:51:22,080 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 22 from LS+wenet, 8 from Vox, 26 fro AS 2024-08-19 02:51:27,292 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-19 02:51:43,468 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
17 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 02:51:45,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4241770.0, ans=0.1 2024-08-19 02:51:55,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4241870.0, ans=0.0 2024-08-19 02:52:04,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4241870.0, ans=0.125 2024-08-19 02:52:07,286 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=15.0 2024-08-19 02:52:09,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4241970.0, ans=0.125 2024-08-19 02:52:21,322 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 8600, loss[loss=0.08623, beats_loss=0.01153, ecapa_loss=0.0001762, whisper_loss=0.07294, over 22034.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01055, ecapa_loss=0.0001417, whisper_loss=0.09022, over 3854202.27 frames. ], batch size: 93, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:52:29,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4242070.0, ans=0.1 2024-08-19 02:52:42,034 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.302e+01 2.568e+01 2.849e+01 4.123e+01, threshold=5.135e+01, percent-clipped=0.0 2024-08-19 02:52:42,151 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
17 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-19 02:52:43,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4242170.0, ans=0.1 2024-08-19 02:52:49,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4242270.0, ans=0.1 2024-08-19 02:52:56,142 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-19 02:53:05,841 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-19 02:53:19,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=4242470.0, ans=0.2 2024-08-19 02:53:23,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4242470.0, ans=0.0 2024-08-19 02:53:30,359 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 8650, loss[loss=0.1247, beats_loss=0.006386, ecapa_loss=0.0002025, whisper_loss=0.1163, over 17043.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01052, ecapa_loss=0.0001428, whisper_loss=0.08946, over 3851123.33 frames. ], batch size: 67, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:54:04,690 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-19 02:54:12,073 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. 
limit=6.0 2024-08-19 02:54:19,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4242870.0, ans=0.0 2024-08-19 02:54:21,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4242870.0, ans=0.0 2024-08-19 02:54:21,185 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.54 vs. limit=22.5 2024-08-19 02:54:22,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4242870.0, ans=0.125 2024-08-19 02:54:22,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4242870.0, ans=0.125 2024-08-19 02:54:44,485 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 8700, loss[loss=0.09103, beats_loss=0.01177, ecapa_loss=0.0001117, whisper_loss=0.07814, over 16136.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01047, ecapa_loss=0.0001434, whisper_loss=0.08975, over 3854556.38 frames. ], batch size: 62, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:54:48,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4243070.0, ans=0.025 2024-08-19 02:55:01,569 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-08-19 02:55:04,108 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.365e+01 2.539e+01 2.840e+01 3.770e+01, threshold=5.079e+01, percent-clipped=0.0 2024-08-19 02:55:10,214 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-19 02:55:11,574 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
33 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-19 02:55:15,526 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.89 vs. limit=15.0 2024-08-19 02:55:36,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4243470.0, ans=0.1 2024-08-19 02:55:45,118 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 8750, loss[loss=0.1223, beats_loss=0.008001, ecapa_loss=0.0001335, whisper_loss=0.113, over 15316.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01037, ecapa_loss=0.0001433, whisper_loss=0.09058, over 3878579.16 frames. ], batch size: 55, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:55:51,647 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-19 02:55:55,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4243570.0, ans=0.0 2024-08-19 02:56:17,077 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 30 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-19 02:56:23,206 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-19 02:56:37,711 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-19 02:56:46,334 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 8800, loss[loss=0.1131, beats_loss=0.008963, ecapa_loss=0.0001006, whisper_loss=0.1031, over 14863.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01044, ecapa_loss=0.0001428, whisper_loss=0.09051, over 3889067.67 frames. 
], batch size: 53, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:56:47,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4244070.0, ans=10.0 2024-08-19 02:56:49,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4244070.0, ans=0.07 2024-08-19 02:56:56,261 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 02:57:00,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4244170.0, ans=0.0 2024-08-19 02:57:04,870 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-19 02:57:05,907 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.300e+01 2.475e+01 2.796e+01 3.899e+01, threshold=4.950e+01, percent-clipped=0.0 2024-08-19 02:57:08,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4244170.0, ans=0.125 2024-08-19 02:57:35,252 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-19 02:57:39,195 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 02:57:44,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4244470.0, ans=0.1 2024-08-19 02:57:47,292 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.71 vs. limit=15.0 2024-08-19 02:57:47,820 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 8850, loss[loss=0.08982, beats_loss=0.01158, ecapa_loss=0.0001313, whisper_loss=0.07694, over 18792.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.01058, ecapa_loss=0.0001421, whisper_loss=0.08943, over 3879123.81 frames. ], batch size: 73, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:57:54,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4244570.0, ans=0.125 2024-08-19 02:58:02,116 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.08 vs. limit=22.5 2024-08-19 02:58:06,330 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 23 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-19 02:58:09,989 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-19 02:58:11,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4244770.0, ans=0.2 2024-08-19 02:58:36,803 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.92 vs. limit=15.0 2024-08-19 02:58:41,387 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.79 vs. limit=22.5 2024-08-19 02:58:46,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4244970.0, ans=0.0 2024-08-19 02:58:46,495 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.95 vs. limit=15.0 2024-08-19 02:58:49,451 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 8900, loss[loss=0.09874, beats_loss=0.008994, ecapa_loss=0.0001603, whisper_loss=0.08814, over 21645.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01059, ecapa_loss=0.0001419, whisper_loss=0.08947, over 3874397.63 frames. 
], batch size: 89, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:59:00,137 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.08 vs. limit=15.0 2024-08-19 02:59:08,990 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.326e+01 2.543e+01 2.750e+01 4.033e+01, threshold=5.087e+01, percent-clipped=0.0 2024-08-19 02:59:23,254 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.324e+00 2024-08-19 02:59:26,065 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.87 vs. limit=15.0 2024-08-19 02:59:35,442 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 15 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-19 02:59:40,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4245470.0, ans=0.125 2024-08-19 02:59:41,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4245470.0, ans=0.125 2024-08-19 02:59:47,254 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.93 vs. limit=22.5 2024-08-19 02:59:47,932 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 02:59:50,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4245570.0, ans=0.1 2024-08-19 02:59:51,576 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 8950, loss[loss=0.1139, beats_loss=0.01033, ecapa_loss=0.0001315, whisper_loss=0.1022, over 18827.00 frames. 
], tot_loss[loss=0.1011, beats_loss=0.01063, ecapa_loss=0.0001416, whisper_loss=0.08904, over 3862713.30 frames. ], batch size: 74, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:59:56,771 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-19 02:59:58,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4245570.0, ans=0.0 2024-08-19 03:00:13,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4245670.0, ans=0.125 2024-08-19 03:00:24,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4245770.0, ans=0.125 2024-08-19 03:00:41,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4245970.0, ans=0.05 2024-08-19 03:00:45,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4245970.0, ans=0.04949747468305833 2024-08-19 03:00:53,891 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 9000, loss[loss=0.1012, beats_loss=0.0111, ecapa_loss=0.0001335, whisper_loss=0.0888, over 16959.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01061, ecapa_loss=0.000141, whisper_loss=0.08972, over 3884827.02 frames. 
], batch size: 68, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 03:00:53,891 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-19 03:01:28,706 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.2738, 0.5956, 2.2655, 1.5222, 1.3647, 1.5890, 2.1789, 2.1249], device='cuda:2') 2024-08-19 03:01:30,308 INFO [train_multi_KD3.py:1149] (2/4) Epoch 29, validation on ASR_libri: loss=0.2527, beats_loss=0, ecapa_loss=0.0005203, whisper_loss=0.2475, over 922467.00 frames. 2024-08-19 03:01:46,101 INFO [train_multi_KD3.py:1149] (2/4) Epoch 29, validation on SV_voxceleb1: loss=0.004041, beats_loss=0, ecapa_loss=0.0004041, whisper_loss=0, over 939242.00 frames. 2024-08-19 03:03:34,147 INFO [train_multi_KD3.py:1149] (2/4) Epoch 29, validation on AT_audioset: loss=0.02316, beats_loss=0.02316, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 03:03:34,151 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-19 03:03:38,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4246070.0, ans=0.125 2024-08-19 03:03:53,422 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.336e+01 2.586e+01 2.879e+01 3.784e+01, threshold=5.173e+01, percent-clipped=0.0 2024-08-19 03:04:10,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4246370.0, ans=0.1 2024-08-19 03:04:18,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4246370.0, ans=0.0 2024-08-19 03:04:26,999 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
15 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 03:04:34,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4246570.0, ans=0.2 2024-08-19 03:04:35,618 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 9050, loss[loss=0.08502, beats_loss=0.01285, ecapa_loss=0.0001456, whisper_loss=0.07071, over 22561.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001428, whisper_loss=0.08997, over 3866593.04 frames. ], batch size: 95, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 03:04:39,491 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 31 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 03:04:44,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4246570.0, ans=0.0 2024-08-19 03:04:47,952 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 25 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-19 03:04:48,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4246670.0, ans=0.1 2024-08-19 03:05:01,827 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-19 03:05:05,313 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 03:05:20,265 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 16 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-19 03:05:21,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4246870.0, ans=0.0 2024-08-19 03:05:22,785 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 24 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-19 03:05:37,340 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 9100, loss[loss=0.1374, beats_loss=0.007195, ecapa_loss=0.0001289, whisper_loss=0.1289, over 15944.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01043, ecapa_loss=0.0001436, whisper_loss=0.09094, over 3856196.51 frames. ], batch size: 59, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:05:48,587 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-19 03:05:54,737 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-19 03:05:58,188 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.412e+01 2.655e+01 2.932e+01 4.507e+01, threshold=5.309e+01, percent-clipped=0.0 2024-08-19 03:05:59,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4247170.0, ans=0.125 2024-08-19 03:06:14,383 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-19 03:06:14,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4247370.0, ans=0.125 2024-08-19 03:06:34,859 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-19 03:06:38,268 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 9150, loss[loss=0.1054, beats_loss=0.01011, ecapa_loss=0.0001026, whisper_loss=0.09424, over 23331.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01039, ecapa_loss=0.0001422, whisper_loss=0.09119, over 3883249.90 frames. ], batch size: 90, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:06:41,044 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-19 03:06:48,494 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-19 03:06:51,628 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.04 vs. 
limit=6.0 2024-08-19 03:06:52,451 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 03:06:55,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4247670.0, ans=0.125 2024-08-19 03:07:12,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4247770.0, ans=0.2 2024-08-19 03:07:42,270 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 9200, loss[loss=0.08371, beats_loss=0.01058, ecapa_loss=0.0001589, whisper_loss=0.07155, over 13188.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01037, ecapa_loss=0.0001423, whisper_loss=0.09089, over 3844254.45 frames. ], batch size: 55, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:07:42,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4248070.0, ans=0.1 2024-08-19 03:07:45,040 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.037e+05 2024-08-19 03:07:47,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4248070.0, ans=0.0 2024-08-19 03:07:50,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4248070.0, ans=0.0 2024-08-19 03:07:56,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4248170.0, ans=0.0 2024-08-19 03:08:03,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4248170.0, ans=0.2 2024-08-19 03:08:04,113 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.284e+01 2.554e+01 2.841e+01 4.515e+02, threshold=5.108e+01, percent-clipped=1.0 2024-08-19 03:08:17,413 INFO 
[train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-19 03:08:29,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4248370.0, ans=0.1 2024-08-19 03:08:34,603 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2024-08-19 03:08:46,920 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.84 vs. limit=5.0 2024-08-19 03:08:47,108 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 9250, loss[loss=0.1002, beats_loss=0.008636, ecapa_loss=0.0001644, whisper_loss=0.08988, over 21390.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01037, ecapa_loss=0.0001426, whisper_loss=0.09105, over 3868315.77 frames. ], batch size: 90, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:09:07,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4248670.0, ans=0.125 2024-08-19 03:09:16,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4248770.0, ans=0.0 2024-08-19 03:09:18,701 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-19 03:09:27,588 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.961e+01 2024-08-19 03:09:33,676 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 12 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-19 03:09:35,526 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.99 vs. limit=15.0 2024-08-19 03:09:45,671 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
26 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 03:09:54,404 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 9300, loss[loss=0.09958, beats_loss=0.009896, ecapa_loss=0.0001273, whisper_loss=0.08841, over 15973.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01036, ecapa_loss=0.0001417, whisper_loss=0.09144, over 3848261.79 frames. ], batch size: 60, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:09:57,104 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-19 03:10:15,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=4249170.0, ans=0.02 2024-08-19 03:10:17,512 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.468e+01 2.687e+01 3.069e+01 9.721e+01, threshold=5.373e+01, percent-clipped=1.0 2024-08-19 03:10:21,413 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 03:10:32,651 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-19 03:10:55,036 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 03:10:57,401 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-19 03:11:01,529 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 9350, loss[loss=0.09711, beats_loss=0.01082, ecapa_loss=0.0001916, whisper_loss=0.08438, over 20488.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0104, ecapa_loss=0.0001423, whisper_loss=0.0913, over 3873146.54 frames. ], batch size: 89, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:11:11,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=4249570.0, ans=0.1 2024-08-19 03:11:13,686 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
28 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 03:11:25,334 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 03:11:29,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4249770.0, ans=0.0 2024-08-19 03:11:31,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4249770.0, ans=0.125 2024-08-19 03:11:35,662 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-19 03:11:37,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4249770.0, ans=0.125 2024-08-19 03:11:41,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4249870.0, ans=0.04949747468305833 2024-08-19 03:11:43,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4249870.0, ans=0.125 2024-08-19 03:11:47,873 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-19 03:11:54,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4249970.0, ans=0.2 2024-08-19 03:12:08,081 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 9400, loss[loss=0.1071, beats_loss=0.009965, ecapa_loss=0.0001518, whisper_loss=0.09558, over 19425.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01041, ecapa_loss=0.0001425, whisper_loss=0.09114, over 3874730.32 frames. 
], batch size: 77, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:12:16,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4250070.0, ans=0.1 2024-08-19 03:12:32,927 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.288e+01 2.511e+01 2.753e+01 4.434e+01, threshold=5.022e+01, percent-clipped=0.0 2024-08-19 03:12:47,019 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-19 03:12:47,636 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.94 vs. limit=15.0 2024-08-19 03:12:58,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4250370.0, ans=0.125 2024-08-19 03:12:59,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4250370.0, ans=0.125 2024-08-19 03:13:01,885 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-19 03:13:12,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4250470.0, ans=0.125 2024-08-19 03:13:17,887 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 9450, loss[loss=0.09396, beats_loss=0.009749, ecapa_loss=0.000159, whisper_loss=0.08262, over 21373.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.0001418, whisper_loss=0.09054, over 3874778.53 frames. ], batch size: 89, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:13:19,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4250570.0, ans=0.125 2024-08-19 03:13:25,086 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
17 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-19 03:13:37,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4250670.0, ans=0.0 2024-08-19 03:13:48,149 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 03:14:04,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4250870.0, ans=0.125 2024-08-19 03:14:04,577 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 03:14:23,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4250970.0, ans=0.125 2024-08-19 03:14:26,766 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 9500, loss[loss=0.1032, beats_loss=0.008245, ecapa_loss=0.0001718, whisper_loss=0.09319, over 14712.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01045, ecapa_loss=0.0001426, whisper_loss=0.09039, over 3904481.16 frames. ], batch size: 61, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:14:34,739 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.67 vs. 
limit=15.0 2024-08-19 03:14:44,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4251170.0, ans=0.125 2024-08-19 03:14:50,643 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.262e+01 2.501e+01 2.810e+01 4.302e+01, threshold=5.003e+01, percent-clipped=0.0 2024-08-19 03:14:58,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4251270.0, ans=0.1 2024-08-19 03:14:59,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=4251270.0, ans=6.0 2024-08-19 03:15:00,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4251270.0, ans=0.125 2024-08-19 03:15:07,257 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 03:15:17,894 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-19 03:15:36,375 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 9550, loss[loss=0.1108, beats_loss=0.01026, ecapa_loss=0.0001182, whisper_loss=0.09938, over 22857.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.000143, whisper_loss=0.09029, over 3925335.82 frames. ], batch size: 90, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:15:36,615 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-19 03:15:53,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4251670.0, ans=0.1 2024-08-19 03:15:54,714 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
22 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-19 03:16:00,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4251670.0, ans=0.04949747468305833 2024-08-19 03:16:08,233 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 03:16:10,472 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=6.84 vs. limit=12.0 2024-08-19 03:16:16,542 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-19 03:16:27,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4251870.0, ans=0.125 2024-08-19 03:16:27,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4251870.0, ans=0.2 2024-08-19 03:16:31,227 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-19 03:16:41,282 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.70 vs. limit=22.5 2024-08-19 03:16:44,659 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 9600, loss[loss=0.09673, beats_loss=0.01164, ecapa_loss=0.0001206, whisper_loss=0.08388, over 18657.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001434, whisper_loss=0.09002, over 3887165.03 frames. 
], batch size: 72, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:16:49,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4252070.0, ans=0.125 2024-08-19 03:16:51,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4252070.0, ans=0.125 2024-08-19 03:17:08,140 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.320e+01 2.551e+01 2.874e+01 5.589e+01, threshold=5.101e+01, percent-clipped=1.0 2024-08-19 03:17:12,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4252270.0, ans=0.125 2024-08-19 03:17:14,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4252270.0, ans=0.125 2024-08-19 03:17:29,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4252370.0, ans=0.125 2024-08-19 03:17:38,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4252470.0, ans=0.0 2024-08-19 03:17:45,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4252470.0, ans=0.0 2024-08-19 03:17:52,969 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 9650, loss[loss=0.08212, beats_loss=0.01272, ecapa_loss=0.0001382, whisper_loss=0.06801, over 14960.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0104, ecapa_loss=0.0001437, whisper_loss=0.08986, over 3838579.38 frames. 
], batch size: 65, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:17:53,734 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.640e-03 2024-08-19 03:18:01,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4252570.0, ans=0.09899494936611666 2024-08-19 03:18:09,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4252670.0, ans=0.125 2024-08-19 03:18:12,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4252670.0, ans=0.0 2024-08-19 03:18:33,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4252870.0, ans=0.1 2024-08-19 03:18:45,269 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 18 from LS+wenet, 23 from Vox, 50 fro AS 2024-08-19 03:18:54,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4252970.0, ans=0.125 2024-08-19 03:18:57,226 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 03:19:02,190 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 9700, loss[loss=0.09616, beats_loss=0.00937, ecapa_loss=0.0001409, whisper_loss=0.08538, over 19598.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.0001437, whisper_loss=0.0899, over 3867385.48 frames. ], batch size: 77, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:19:03,679 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 03:19:10,623 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.12 vs. 
limit=6.0 2024-08-19 03:19:11,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4253070.0, ans=0.125 2024-08-19 03:19:24,448 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.350e+01 2.550e+01 2.854e+01 4.797e+01, threshold=5.099e+01, percent-clipped=0.0 2024-08-19 03:19:25,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4253170.0, ans=0.0 2024-08-19 03:19:37,482 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-19 03:19:41,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4253370.0, ans=0.125 2024-08-19 03:19:43,248 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.91 vs. limit=12.0 2024-08-19 03:19:56,269 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-19 03:20:09,344 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 9750, loss[loss=0.1102, beats_loss=0.00973, ecapa_loss=0.000125, whisper_loss=0.09924, over 18675.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01046, ecapa_loss=0.0001423, whisper_loss=0.08923, over 3843390.70 frames. ], batch size: 69, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:20:17,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4253570.0, ans=0.0 2024-08-19 03:20:37,203 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
22 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-19 03:20:41,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4253770.0, ans=0.025 2024-08-19 03:20:47,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4253770.0, ans=0.1 2024-08-19 03:21:04,731 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 27 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-19 03:21:11,785 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 28 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-19 03:21:16,994 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 9800, loss[loss=0.1072, beats_loss=0.01096, ecapa_loss=0.0001247, whisper_loss=0.09495, over 17425.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001421, whisper_loss=0.08972, over 3841624.07 frames. ], batch size: 67, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:21:17,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4254070.0, ans=0.0 2024-08-19 03:21:33,005 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
21 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-19 03:21:38,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4254170.0, ans=0.0 2024-08-19 03:21:40,857 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.265e+01 2.575e+01 2.940e+01 5.043e+01, threshold=5.150e+01, percent-clipped=0.0 2024-08-19 03:21:47,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4254270.0, ans=0.0 2024-08-19 03:21:50,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4254270.0, ans=0.125 2024-08-19 03:21:57,407 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.76 vs. limit=12.0 2024-08-19 03:22:02,668 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-19 03:22:13,971 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-19 03:22:14,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4254470.0, ans=0.035 2024-08-19 03:22:16,674 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 30 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-19 03:22:21,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4254470.0, ans=0.125 2024-08-19 03:22:22,009 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-19 03:22:27,194 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 9850, loss[loss=0.1019, beats_loss=0.01147, ecapa_loss=0.0001047, whisper_loss=0.08936, over 18763.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01048, ecapa_loss=0.0001419, whisper_loss=0.09026, over 3874176.24 frames. 
], batch size: 74, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:22:34,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4254570.0, ans=0.1 2024-08-19 03:22:40,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4254670.0, ans=0.125 2024-08-19 03:22:45,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4254670.0, ans=0.04949747468305833 2024-08-19 03:22:50,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4254670.0, ans=0.125 2024-08-19 03:22:50,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4254670.0, ans=0.2 2024-08-19 03:22:54,195 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 03:23:04,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4254770.0, ans=0.125 2024-08-19 03:23:07,779 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.27 vs. limit=10.0 2024-08-19 03:23:12,586 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-19 03:23:12,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4254870.0, ans=0.0 2024-08-19 03:23:18,064 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.15 vs. 
limit=15.0 2024-08-19 03:23:36,019 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.73 vs. limit=10.0 2024-08-19 03:23:38,059 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 9900, loss[loss=0.08244, beats_loss=0.01061, ecapa_loss=0.000144, whisper_loss=0.07039, over 21516.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.0001428, whisper_loss=0.09031, over 3896599.35 frames. ], batch size: 90, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:23:38,154 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-19 03:23:46,100 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.33 vs. limit=10.0 2024-08-19 03:23:50,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4255070.0, ans=0.0 2024-08-19 03:23:56,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4255170.0, ans=0.05 2024-08-19 03:24:01,888 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.258e+01 2.526e+01 2.831e+01 1.628e+02, threshold=5.053e+01, percent-clipped=0.0 2024-08-19 03:24:08,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4255270.0, ans=10.0 2024-08-19 03:24:18,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4255270.0, ans=0.09899494936611666 2024-08-19 03:24:24,702 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
22 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-19 03:24:29,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4255370.0, ans=0.0 2024-08-19 03:24:37,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4255470.0, ans=0.125 2024-08-19 03:24:39,938 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 03:24:45,413 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2024-08-19 03:24:49,067 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 9950, loss[loss=0.1234, beats_loss=0.009683, ecapa_loss=0.0001338, whisper_loss=0.1124, over 17923.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01047, ecapa_loss=0.0001435, whisper_loss=0.0906, over 3897631.56 frames. ], batch size: 71, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:25:00,850 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.91 vs. limit=6.0 2024-08-19 03:25:13,589 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-19 03:25:23,180 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
22 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-19 03:25:33,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4255870.0, ans=0.1 2024-08-19 03:25:44,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4255870.0, ans=0.125 2024-08-19 03:25:47,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4255970.0, ans=0.2 2024-08-19 03:26:01,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4256070.0, ans=0.0 2024-08-19 03:26:02,734 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 10000, loss[loss=0.1026, beats_loss=0.01064, ecapa_loss=0.000125, whisper_loss=0.09066, over 21590.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01053, ecapa_loss=0.0001436, whisper_loss=0.09011, over 3904375.17 frames. ], batch size: 85, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:26:05,964 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 03:26:26,308 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-19 03:26:30,545 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.221e+01 2.512e+01 2.759e+01 2.738e+02, threshold=5.023e+01, percent-clipped=3.0 2024-08-19 03:26:34,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4256270.0, ans=0.1 2024-08-19 03:26:41,132 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-19 03:26:46,103 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 17 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-19 03:26:47,413 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
13 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-19 03:26:47,835 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.83 vs. limit=22.5 2024-08-19 03:26:59,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4256370.0, ans=0.125 2024-08-19 03:27:00,827 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-19 03:27:19,981 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 10050, loss[loss=0.08569, beats_loss=0.01187, ecapa_loss=0.000135, whisper_loss=0.07247, over 18894.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01056, ecapa_loss=0.0001427, whisper_loss=0.08956, over 3911177.31 frames. ], batch size: 79, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:27:20,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4256570.0, ans=0.125 2024-08-19 03:27:20,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4256570.0, ans=0.0 2024-08-19 03:27:28,893 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-19 03:27:34,019 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
16 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-19 03:27:36,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4256670.0, ans=0.1 2024-08-19 03:27:41,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4256670.0, ans=0.1 2024-08-19 03:27:47,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4256670.0, ans=0.125 2024-08-19 03:27:51,798 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 29 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-19 03:27:57,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4256770.0, ans=0.0 2024-08-19 03:28:01,998 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 03:28:06,861 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.662e-01 2024-08-19 03:28:15,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4256870.0, ans=0.0 2024-08-19 03:28:20,113 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-19 03:28:28,597 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.02 vs. limit=15.0 2024-08-19 03:28:34,487 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.52 vs. limit=22.5 2024-08-19 03:28:36,297 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 10100, loss[loss=0.0974, beats_loss=0.01101, ecapa_loss=0.0001372, whisper_loss=0.08502, over 17297.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01055, ecapa_loss=0.000143, whisper_loss=0.08952, over 3886033.10 frames. ], batch size: 71, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:28:39,138 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-19 03:28:58,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4257170.0, ans=0.125 2024-08-19 03:29:02,274 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.419e+01 2.709e+01 3.030e+01 4.080e+01, threshold=5.418e+01, percent-clipped=0.0 2024-08-19 03:29:03,270 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.54 vs. limit=22.5 2024-08-19 03:29:04,286 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.51 vs. limit=15.0 2024-08-19 03:29:16,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4257270.0, ans=0.0 2024-08-19 03:29:18,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4257270.0, ans=0.0 2024-08-19 03:29:21,799 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-19 03:29:41,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4257470.0, ans=0.2 2024-08-19 03:29:56,146 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 10150, loss[loss=0.1087, beats_loss=0.008782, ecapa_loss=0.0001708, whisper_loss=0.09822, over 15744.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01052, ecapa_loss=0.0001429, whisper_loss=0.09009, over 3923481.23 frames. 
], batch size: 65, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:30:01,575 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-19 03:30:13,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4257670.0, ans=0.125 2024-08-19 03:30:22,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4257670.0, ans=0.125 2024-08-19 03:30:23,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4257670.0, ans=0.2 2024-08-19 03:30:30,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4257770.0, ans=0.0 2024-08-19 03:30:39,290 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.70 vs. limit=22.5 2024-08-19 03:31:01,055 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2024-08-19 03:31:08,673 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 10200, loss[loss=0.07574, beats_loss=0.01216, ecapa_loss=0.0001125, whisper_loss=0.06246, over 16788.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01052, ecapa_loss=0.0001424, whisper_loss=0.08979, over 3924085.97 frames. 
], batch size: 65, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:31:15,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4258070.0, ans=0.125 2024-08-19 03:31:18,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4258070.0, ans=0.125 2024-08-19 03:31:32,156 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.333e+01 2.551e+01 2.847e+01 4.838e+01, threshold=5.102e+01, percent-clipped=0.0 2024-08-19 03:31:36,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4258270.0, ans=0.125 2024-08-19 03:31:36,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4258270.0, ans=0.125 2024-08-19 03:32:17,785 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 10250, loss[loss=0.1082, beats_loss=0.009759, ecapa_loss=0.0001269, whisper_loss=0.09714, over 20795.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01052, ecapa_loss=0.0001426, whisper_loss=0.08971, over 3910376.53 frames. ], batch size: 80, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:32:24,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4258570.0, ans=0.04949747468305833 2024-08-19 03:32:36,545 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
12 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-19 03:32:49,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4258770.0, ans=0.09899494936611666 2024-08-19 03:33:00,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4258870.0, ans=0.0 2024-08-19 03:33:09,077 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.35 vs. limit=15.0 2024-08-19 03:33:27,150 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 10300, loss[loss=0.09957, beats_loss=0.01186, ecapa_loss=0.0001073, whisper_loss=0.08663, over 21097.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01057, ecapa_loss=0.0001423, whisper_loss=0.08912, over 3913078.02 frames. ], batch size: 83, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:33:46,087 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-19 03:33:47,186 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 03:33:49,511 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.334e+01 2.546e+01 2.816e+01 4.072e+01, threshold=5.092e+01, percent-clipped=0.0 2024-08-19 03:34:02,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4259270.0, ans=0.125 2024-08-19 03:34:04,450 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
24 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-19 03:34:08,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4259370.0, ans=0.04949747468305833 2024-08-19 03:34:11,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4259370.0, ans=0.2 2024-08-19 03:34:13,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4259370.0, ans=0.1 2024-08-19 03:34:17,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4259370.0, ans=0.2 2024-08-19 03:34:35,653 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 10350, loss[loss=0.1016, beats_loss=0.01072, ecapa_loss=0.0001074, whisper_loss=0.08978, over 24201.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01058, ecapa_loss=0.0001427, whisper_loss=0.08873, over 3904292.64 frames. ], batch size: 92, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:34:36,102 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-19 03:34:38,668 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.47 vs. limit=22.5 2024-08-19 03:35:01,741 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.108e+01 2024-08-19 03:35:14,427 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 17 from LS+wenet, 28 from Vox, 48 fro AS 2024-08-19 03:35:17,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4259770.0, ans=0.125 2024-08-19 03:35:25,342 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
17 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-19 03:35:27,113 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-19 03:35:43,633 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.32 vs. limit=15.0 2024-08-19 03:35:48,027 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 10400, loss[loss=0.09695, beats_loss=0.009973, ecapa_loss=0.0001378, whisper_loss=0.0856, over 22868.00 frames. ], tot_loss[loss=0.09996, beats_loss=0.01067, ecapa_loss=0.0001415, whisper_loss=0.08788, over 3920169.70 frames. ], batch size: 90, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:36:02,818 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=19.94 vs. limit=15.0 2024-08-19 03:36:05,822 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0 2024-08-19 03:36:11,406 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.301e+01 2.515e+01 2.779e+01 4.056e+01, threshold=5.030e+01, percent-clipped=0.0 2024-08-19 03:36:23,489 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 18 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-19 03:36:28,643 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 19 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-19 03:36:33,224 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2024-08-19 03:36:35,959 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.27 vs. limit=15.0 2024-08-19 03:36:40,655 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
16 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-19 03:36:42,042 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-19 03:36:54,932 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.61 vs. limit=15.0 2024-08-19 03:36:55,418 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 10450, loss[loss=0.101, beats_loss=0.009374, ecapa_loss=0.0001643, whisper_loss=0.08997, over 17267.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01057, ecapa_loss=0.0001416, whisper_loss=0.08817, over 3861946.42 frames. ], batch size: 69, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:37:05,745 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.21 vs. limit=22.5 2024-08-19 03:37:13,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4260670.0, ans=0.0 2024-08-19 03:37:16,985 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.87 vs. 
limit=22.5 2024-08-19 03:37:26,991 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 03:37:43,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4260870.0, ans=0.125 2024-08-19 03:37:50,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4260870.0, ans=0.2 2024-08-19 03:37:54,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=4260970.0, ans=0.025 2024-08-19 03:37:54,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=4260970.0, ans=22.5 2024-08-19 03:38:00,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4260970.0, ans=0.125 2024-08-19 03:38:02,416 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-19 03:38:07,855 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 10500, loss[loss=0.09281, beats_loss=0.01248, ecapa_loss=0.0001405, whisper_loss=0.07892, over 15030.00 frames. ], tot_loss[loss=0.09988, beats_loss=0.01058, ecapa_loss=0.000142, whisper_loss=0.08789, over 3867395.53 frames. ], batch size: 63, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:38:09,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4261070.0, ans=0.0 2024-08-19 03:38:12,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4261070.0, ans=0.125 2024-08-19 03:38:22,750 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.23 vs. 
limit=15.0 2024-08-19 03:38:24,678 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-19 03:38:31,475 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.267e+01 2.482e+01 2.765e+01 1.931e+02, threshold=4.963e+01, percent-clipped=1.0 2024-08-19 03:38:37,456 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-19 03:39:06,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4261470.0, ans=0.0 2024-08-19 03:39:07,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4261470.0, ans=0.125 2024-08-19 03:39:15,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4261470.0, ans=0.2 2024-08-19 03:39:18,193 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 10550, loss[loss=0.1076, beats_loss=0.009561, ecapa_loss=0.0001561, whisper_loss=0.09644, over 20084.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01051, ecapa_loss=0.0001421, whisper_loss=0.08843, over 3836253.29 frames. 
], batch size: 81, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:39:19,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4261570.0, ans=0.125 2024-08-19 03:39:29,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4261570.0, ans=0.125 2024-08-19 03:39:29,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4261570.0, ans=0.125 2024-08-19 03:39:43,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4261670.0, ans=0.125 2024-08-19 03:40:09,207 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-19 03:40:11,174 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 03:40:28,612 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 10600, loss[loss=0.1172, beats_loss=0.007289, ecapa_loss=0.0001689, whisper_loss=0.1082, over 23159.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01045, ecapa_loss=0.0001428, whisper_loss=0.08876, over 3857188.09 frames. ], batch size: 94, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:40:52,818 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.423e+01 2.630e+01 2.907e+01 3.949e+01, threshold=5.261e+01, percent-clipped=0.0 2024-08-19 03:40:54,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4262170.0, ans=0.0 2024-08-19 03:40:58,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4262270.0, ans=0.125 2024-08-19 03:41:03,446 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
31 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-19 03:41:23,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4262470.0, ans=0.125 2024-08-19 03:41:24,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4262470.0, ans=0.0 2024-08-19 03:41:37,482 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 10650, loss[loss=0.08443, beats_loss=0.009272, ecapa_loss=0.0001792, whisper_loss=0.07337, over 15431.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01046, ecapa_loss=0.0001419, whisper_loss=0.08834, over 3829355.01 frames. ], batch size: 68, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:41:41,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4262570.0, ans=0.125 2024-08-19 03:42:03,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4262770.0, ans=0.125 2024-08-19 03:42:04,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4262770.0, ans=0.125 2024-08-19 03:42:06,835 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-19 03:42:11,541 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-19 03:42:13,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4262770.0, ans=0.125 2024-08-19 03:42:20,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4262870.0, ans=0.125 2024-08-19 03:42:36,700 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 03:42:42,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4262970.0, ans=0.1 2024-08-19 03:42:46,293 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 10700, loss[loss=0.08706, beats_loss=0.01136, ecapa_loss=0.0001394, whisper_loss=0.0743, over 17721.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01044, ecapa_loss=0.0001417, whisper_loss=0.0892, over 3838351.29 frames. ], batch size: 74, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:43:00,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4263170.0, ans=0.125 2024-08-19 03:43:07,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4263170.0, ans=0.125 2024-08-19 03:43:09,190 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.702e+01 2.321e+01 2.471e+01 2.734e+01 8.130e+01, threshold=4.942e+01, percent-clipped=1.0 2024-08-19 03:43:18,399 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.72 vs. limit=15.0 2024-08-19 03:43:22,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4263270.0, ans=0.1 2024-08-19 03:43:29,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4263370.0, ans=0.07 2024-08-19 03:43:32,377 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
19 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-19 03:43:39,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4263470.0, ans=0.1 2024-08-19 03:43:48,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4263470.0, ans=0.125 2024-08-19 03:43:53,777 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 10750, loss[loss=0.1022, beats_loss=0.01001, ecapa_loss=0.0001453, whisper_loss=0.09069, over 24281.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01045, ecapa_loss=0.0001422, whisper_loss=0.08964, over 3899433.75 frames. ], batch size: 95, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:44:11,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4263670.0, ans=0.125 2024-08-19 03:44:13,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4263670.0, ans=0.125 2024-08-19 03:44:18,724 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 27 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-19 03:44:25,221 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-19 03:44:30,971 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.61 vs. limit=15.0 2024-08-19 03:44:34,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4263870.0, ans=0.2 2024-08-19 03:44:37,335 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.93 vs. 
limit=22.5 2024-08-19 03:44:40,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4263870.0, ans=0.0 2024-08-19 03:44:46,899 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-19 03:44:55,027 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-19 03:45:00,426 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 10800, loss[loss=0.1067, beats_loss=0.01177, ecapa_loss=0.0001351, whisper_loss=0.09358, over 20808.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.0001424, whisper_loss=0.09016, over 3890847.23 frames. ], batch size: 84, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:45:01,483 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.29 vs. limit=15.0 2024-08-19 03:45:02,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4264070.0, ans=0.2 2024-08-19 03:45:23,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4264170.0, ans=0.2 2024-08-19 03:45:25,262 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.325e+01 2.616e+01 2.924e+01 8.173e+01, threshold=5.233e+01, percent-clipped=1.0 2024-08-19 03:45:34,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4264270.0, ans=0.125 2024-08-19 03:46:03,237 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-19 03:46:10,166 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 10850, loss[loss=0.1014, beats_loss=0.01133, ecapa_loss=0.0001331, whisper_loss=0.08869, over 22688.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01039, ecapa_loss=0.0001429, whisper_loss=0.09064, over 3891132.83 frames. ], batch size: 92, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:46:20,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4264570.0, ans=0.0 2024-08-19 03:46:24,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4264670.0, ans=0.2 2024-08-19 03:46:27,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4264670.0, ans=0.125 2024-08-19 03:46:35,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4264670.0, ans=0.2 2024-08-19 03:46:44,140 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 03:47:17,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4264870.0, ans=0.125 2024-08-19 03:47:18,952 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-19 03:47:51,074 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 10900, loss[loss=0.09023, beats_loss=0.01027, ecapa_loss=0.000129, whisper_loss=0.07866, over 20010.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01038, ecapa_loss=0.0001422, whisper_loss=0.09086, over 3912888.65 frames. 
], batch size: 76, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:48:19,165 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.332e+01 2.581e+01 2.915e+01 5.254e+01, threshold=5.161e+01, percent-clipped=1.0 2024-08-19 03:49:01,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4265470.0, ans=0.125 2024-08-19 03:49:05,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4265470.0, ans=0.2 2024-08-19 03:49:08,515 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 10950, loss[loss=0.1028, beats_loss=0.01133, ecapa_loss=0.0001345, whisper_loss=0.09012, over 14402.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01039, ecapa_loss=0.0001427, whisper_loss=0.09058, over 3896486.79 frames. ], batch size: 57, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:49:08,677 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 15 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-19 03:49:17,520 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 12 from Vox, 40 fro AS 2024-08-19 03:49:20,752 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 03:49:26,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4265670.0, ans=0.2 2024-08-19 03:49:29,380 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
34 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-19 03:49:30,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4265670.0, ans=0.125 2024-08-19 03:49:34,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4265670.0, ans=0.125 2024-08-19 03:49:52,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4265870.0, ans=0.0 2024-08-19 03:49:57,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4265870.0, ans=0.125 2024-08-19 03:49:59,327 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.06 vs. limit=15.0 2024-08-19 03:50:16,831 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.07 vs. limit=22.5 2024-08-19 03:50:22,920 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 11000, loss[loss=0.08439, beats_loss=0.0127, ecapa_loss=0.0001495, whisper_loss=0.07019, over 21638.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01038, ecapa_loss=0.0001438, whisper_loss=0.09065, over 3901767.43 frames. ], batch size: 92, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:50:31,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4266070.0, ans=0.125 2024-08-19 03:50:33,085 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2024-08-19 03:50:35,594 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
26 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 03:50:40,242 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 22 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 03:50:48,749 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.345e+01 2.567e+01 2.803e+01 3.050e+02, threshold=5.135e+01, percent-clipped=1.0 2024-08-19 03:50:59,822 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 17 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-19 03:51:35,600 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 11050, loss[loss=0.08938, beats_loss=0.01006, ecapa_loss=0.0001402, whisper_loss=0.07792, over 13822.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01039, ecapa_loss=0.0001435, whisper_loss=0.09055, over 3903821.62 frames. ], batch size: 55, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:51:43,776 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=15.0 2024-08-19 03:51:48,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4266570.0, ans=0.125 2024-08-19 03:51:57,195 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. 
limit=15.0 2024-08-19 03:52:04,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4266770.0, ans=0.125 2024-08-19 03:52:32,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4266870.0, ans=0.0 2024-08-19 03:52:44,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4266970.0, ans=0.125 2024-08-19 03:52:49,136 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 11100, loss[loss=0.07193, beats_loss=0.01269, ecapa_loss=0.0001182, whisper_loss=0.05806, over 18926.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0105, ecapa_loss=0.0001426, whisper_loss=0.08949, over 3910207.33 frames. ], batch size: 73, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:52:56,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4267070.0, ans=0.125 2024-08-19 03:52:58,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4267070.0, ans=0.2 2024-08-19 03:53:07,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4267170.0, ans=0.0 2024-08-19 03:53:14,772 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.398e+01 2.605e+01 2.848e+01 4.368e+01, threshold=5.210e+01, percent-clipped=0.0 2024-08-19 03:53:16,591 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 20 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-19 03:53:19,370 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
29 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-19 03:53:45,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4267470.0, ans=0.0 2024-08-19 03:53:59,358 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 34 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-19 03:54:00,469 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 11150, loss[loss=0.1316, beats_loss=0.009319, ecapa_loss=0.0001079, whisper_loss=0.1212, over 23401.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01041, ecapa_loss=0.0001415, whisper_loss=0.09044, over 3885771.97 frames. ], batch size: 85, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:54:04,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4267570.0, ans=0.035 2024-08-19 03:54:12,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4267570.0, ans=0.2 2024-08-19 03:54:21,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4267670.0, ans=0.1 2024-08-19 03:54:32,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4267770.0, ans=0.125 2024-08-19 03:54:40,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4267770.0, ans=0.125 2024-08-19 03:54:52,104 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.13 vs. limit=15.0 2024-08-19 03:55:11,492 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 11200, loss[loss=0.1198, beats_loss=0.008407, ecapa_loss=0.0001634, whisper_loss=0.1098, over 16850.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001417, whisper_loss=0.09031, over 3881540.89 frames. ], batch size: 65, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:55:27,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4268170.0, ans=0.125 2024-08-19 03:55:37,721 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.683e+01 2.383e+01 2.571e+01 2.965e+01 4.836e+01, threshold=5.143e+01, percent-clipped=0.0 2024-08-19 03:55:41,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=4268270.0, ans=15.0 2024-08-19 03:55:42,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4268270.0, ans=0.2 2024-08-19 03:56:05,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4268370.0, ans=0.125 2024-08-19 03:56:16,114 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 34 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-19 03:56:26,213 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 11250, loss[loss=0.1216, beats_loss=0.01064, ecapa_loss=0.000131, whisper_loss=0.1097, over 22958.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.0001425, whisper_loss=0.09008, over 3926624.18 frames. ], batch size: 90, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:56:34,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4268570.0, ans=0.0 2024-08-19 03:56:39,566 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
22 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 03:56:47,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4268670.0, ans=0.5 2024-08-19 03:56:57,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4268770.0, ans=0.125 2024-08-19 03:56:59,336 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-19 03:57:13,205 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.39 vs. limit=22.5 2024-08-19 03:57:40,750 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 11300, loss[loss=0.1015, beats_loss=0.01043, ecapa_loss=0.00014, whisper_loss=0.08963, over 21932.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01045, ecapa_loss=0.0001419, whisper_loss=0.08977, over 3897645.16 frames. ], batch size: 91, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:57:51,710 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
15 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-19 03:58:05,688 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.405e+01 2.659e+01 2.967e+01 4.406e+01, threshold=5.318e+01, percent-clipped=0.0 2024-08-19 03:58:09,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4269270.0, ans=0.0 2024-08-19 03:58:09,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4269270.0, ans=0.125 2024-08-19 03:58:11,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4269270.0, ans=0.125 2024-08-19 03:58:28,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4269370.0, ans=0.2 2024-08-19 03:58:36,328 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0 2024-08-19 03:58:49,084 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-19 03:58:50,319 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 11350, loss[loss=0.1088, beats_loss=0.01071, ecapa_loss=0.0001203, whisper_loss=0.09685, over 18870.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.0001419, whisper_loss=0.08992, over 3882008.90 frames. ], batch size: 71, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:59:03,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4269570.0, ans=0.0 2024-08-19 03:59:23,269 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
14 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-19 03:59:32,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4269770.0, ans=0.125 2024-08-19 04:00:04,462 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 11400, loss[loss=0.09527, beats_loss=0.01284, ecapa_loss=0.0001083, whisper_loss=0.08135, over 23489.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01039, ecapa_loss=0.0001419, whisper_loss=0.08972, over 3861015.77 frames. ], batch size: 91, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:00:06,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4270070.0, ans=0.125 2024-08-19 04:00:07,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4270070.0, ans=0.0 2024-08-19 04:00:08,084 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2024-08-19 04:00:09,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4270070.0, ans=0.125 2024-08-19 04:00:30,695 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.261e+01 2.453e+01 2.815e+01 3.733e+01, threshold=4.905e+01, percent-clipped=0.0 2024-08-19 04:00:37,667 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
25 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-19 04:00:45,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4270270.0, ans=0.0 2024-08-19 04:00:52,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4270370.0, ans=0.125 2024-08-19 04:00:54,251 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 04:01:11,052 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 12 from Vox, 47 fro AS 2024-08-19 04:01:14,132 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-19 04:01:15,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4270570.0, ans=0.0 2024-08-19 04:01:16,511 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 11450, loss[loss=0.09671, beats_loss=0.01009, ecapa_loss=0.0001572, whisper_loss=0.08505, over 22907.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01037, ecapa_loss=0.0001418, whisper_loss=0.09031, over 3849559.94 frames. ], batch size: 93, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:01:20,982 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 27 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-19 04:01:24,463 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 18 from Vox, 49 fro AS 2024-08-19 04:01:27,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4270570.0, ans=0.0 2024-08-19 04:01:34,530 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.50 vs. 
limit=15.0 2024-08-19 04:01:41,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4270670.0, ans=0.035 2024-08-19 04:01:43,655 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-19 04:02:00,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4270870.0, ans=0.1 2024-08-19 04:02:13,643 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.29 vs. limit=22.5 2024-08-19 04:02:19,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4270970.0, ans=0.07 2024-08-19 04:02:22,715 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. limit=15.0 2024-08-19 04:02:23,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4270970.0, ans=0.0 2024-08-19 04:02:29,408 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 11500, loss[loss=0.1259, beats_loss=0.007363, ecapa_loss=0.0001534, whisper_loss=0.117, over 15060.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001416, whisper_loss=0.09039, over 3855106.10 frames. 
], batch size: 57, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:02:41,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4271070.0, ans=0.2 2024-08-19 04:02:41,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4271070.0, ans=0.04949747468305833 2024-08-19 04:02:49,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4271170.0, ans=0.1 2024-08-19 04:02:56,952 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.93 vs. limit=15.0 2024-08-19 04:02:57,420 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.365e+01 2.674e+01 3.024e+01 4.760e+02, threshold=5.347e+01, percent-clipped=3.0 2024-08-19 04:02:59,444 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-19 04:03:30,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4271370.0, ans=0.0 2024-08-19 04:03:36,469 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 32 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-19 04:03:40,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4271470.0, ans=0.0 2024-08-19 04:03:44,064 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-19 04:03:48,905 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 11550, loss[loss=0.09925, beats_loss=0.01093, ecapa_loss=0.000142, whisper_loss=0.0869, over 16593.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.0001419, whisper_loss=0.09025, over 3866699.78 frames. 
], batch size: 66, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:03:49,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4271570.0, ans=0.0 2024-08-19 04:03:52,163 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-19 04:04:07,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=4271670.0, ans=0.05 2024-08-19 04:04:11,521 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2024-08-19 04:04:15,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4271670.0, ans=0.0 2024-08-19 04:04:30,292 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-19 04:04:35,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4271870.0, ans=0.04949747468305833 2024-08-19 04:04:41,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4271870.0, ans=0.125 2024-08-19 04:04:43,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4271870.0, ans=0.125 2024-08-19 04:04:45,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4271870.0, ans=0.0 2024-08-19 04:04:46,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4271870.0, ans=0.0 2024-08-19 04:04:57,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4271970.0, ans=0.0 2024-08-19 04:05:07,664 INFO 
[train_multi_KD3.py:1116] (2/4) Epoch 29, batch 11600, loss[loss=0.1077, beats_loss=0.01053, ecapa_loss=0.0001519, whisper_loss=0.09562, over 21974.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0104, ecapa_loss=0.0001415, whisper_loss=0.09057, over 3914737.67 frames. ], batch size: 91, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:05:09,892 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-19 04:05:10,475 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.90 vs. limit=15.0 2024-08-19 04:05:12,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4272070.0, ans=0.125 2024-08-19 04:05:22,464 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-19 04:05:22,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4272170.0, ans=0.125 2024-08-19 04:05:34,371 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 12 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-19 04:05:35,356 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.347e+01 2.564e+01 2.905e+01 5.911e+01, threshold=5.128e+01, percent-clipped=1.0 2024-08-19 04:05:45,078 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-19 04:06:23,641 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-19 04:06:23,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4272470.0, ans=0.125 2024-08-19 04:06:25,982 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.50 vs. 
limit=15.0 2024-08-19 04:06:29,689 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 11650, loss[loss=0.1039, beats_loss=0.01043, ecapa_loss=0.0001461, whisper_loss=0.09201, over 21906.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01041, ecapa_loss=0.0001423, whisper_loss=0.09059, over 3921583.17 frames. ], batch size: 87, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:06:32,783 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.77 vs. limit=15.0 2024-08-19 04:07:00,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4272670.0, ans=0.125 2024-08-19 04:07:18,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4272770.0, ans=0.125 2024-08-19 04:07:20,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=4272870.0, ans=0.05 2024-08-19 04:07:26,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4272870.0, ans=0.125 2024-08-19 04:07:30,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4272870.0, ans=0.125 2024-08-19 04:07:41,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4272970.0, ans=0.1 2024-08-19 04:07:54,683 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 11700, loss[loss=0.09563, beats_loss=0.01173, ecapa_loss=0.0001504, whisper_loss=0.08239, over 19874.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01052, ecapa_loss=0.0001419, whisper_loss=0.09028, over 3934900.33 frames. 
], batch size: 83, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:08:05,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4273070.0, ans=0.0 2024-08-19 04:08:17,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4273170.0, ans=0.125 2024-08-19 04:08:23,706 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.305e+01 2.645e+01 2.900e+01 9.382e+01, threshold=5.291e+01, percent-clipped=2.0 2024-08-19 04:08:40,765 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.71 vs. limit=15.0 2024-08-19 04:08:47,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4273370.0, ans=0.07 2024-08-19 04:08:47,716 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.79 vs. limit=22.5 2024-08-19 04:08:51,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4273370.0, ans=0.1 2024-08-19 04:08:55,462 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.71 vs. limit=15.0 2024-08-19 04:09:00,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4273470.0, ans=0.09899494936611666 2024-08-19 04:09:14,954 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 11750, loss[loss=0.1201, beats_loss=0.008965, ecapa_loss=0.0001313, whisper_loss=0.1098, over 17063.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01053, ecapa_loss=0.0001412, whisper_loss=0.09061, over 3912365.09 frames. 
], batch size: 65, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:09:18,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4273570.0, ans=0.125 2024-08-19 04:09:42,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4273670.0, ans=0.125 2024-08-19 04:09:44,271 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.27 vs. limit=10.0 2024-08-19 04:09:49,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4273770.0, ans=0.125 2024-08-19 04:09:57,727 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-19 04:10:30,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4273970.0, ans=0.125 2024-08-19 04:10:31,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4274070.0, ans=0.0 2024-08-19 04:10:32,832 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 11800, loss[loss=0.08991, beats_loss=0.01085, ecapa_loss=0.0001197, whisper_loss=0.07786, over 14935.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01049, ecapa_loss=0.000141, whisper_loss=0.09119, over 3925105.84 frames. ], batch size: 55, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:10:38,514 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
24 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-19 04:10:47,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=4274170.0, ans=10.0 2024-08-19 04:10:58,705 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=15.0 2024-08-19 04:11:02,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4274170.0, ans=0.125 2024-08-19 04:11:03,174 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.316e+01 2.542e+01 2.977e+01 1.357e+02, threshold=5.084e+01, percent-clipped=2.0 2024-08-19 04:11:12,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4274270.0, ans=0.0 2024-08-19 04:11:54,537 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 11850, loss[loss=0.104, beats_loss=0.01262, ecapa_loss=0.0001016, whisper_loss=0.09041, over 23167.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01052, ecapa_loss=0.000142, whisper_loss=0.09073, over 3955249.65 frames. ], batch size: 89, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:12:01,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4274570.0, ans=0.0 2024-08-19 04:12:03,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4274570.0, ans=0.125 2024-08-19 04:12:26,307 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2024-08-19 04:12:26,408 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.85 vs. 
limit=10.0 2024-08-19 04:12:29,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=4274770.0, ans=0.2 2024-08-19 04:12:33,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4274770.0, ans=0.125 2024-08-19 04:12:41,043 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 39 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-19 04:13:11,775 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 11900, loss[loss=0.1011, beats_loss=0.01058, ecapa_loss=0.0001328, whisper_loss=0.08915, over 17605.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01046, ecapa_loss=0.0001428, whisper_loss=0.09166, over 3979238.64 frames. ], batch size: 69, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:13:33,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4275170.0, ans=0.0 2024-08-19 04:13:35,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4275170.0, ans=0.125 2024-08-19 04:13:39,084 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.362e+01 2.683e+01 3.004e+01 4.414e+01, threshold=5.366e+01, percent-clipped=0.0 2024-08-19 04:13:50,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4275270.0, ans=0.125 2024-08-19 04:14:26,844 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 11950, loss[loss=0.1115, beats_loss=0.009632, ecapa_loss=0.0001332, whisper_loss=0.1005, over 22480.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01039, ecapa_loss=0.0001425, whisper_loss=0.09154, over 3941253.99 frames. ], batch size: 87, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:14:37,514 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
24 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-19 04:14:43,707 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 32 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-19 04:14:45,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4275670.0, ans=0.125 2024-08-19 04:14:51,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4275670.0, ans=0.0 2024-08-19 04:15:17,082 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 14 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-19 04:15:19,960 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-19 04:15:37,462 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 12000, loss[loss=0.1069, beats_loss=0.01135, ecapa_loss=0.0001372, whisper_loss=0.09419, over 16851.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0104, ecapa_loss=0.000143, whisper_loss=0.09115, over 3918106.12 frames. ], batch size: 68, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:15:37,462 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-19 04:16:19,267 INFO [train_multi_KD3.py:1149] (2/4) Epoch 29, validation on ASR_libri: loss=0.2542, beats_loss=0, ecapa_loss=0.0005284, whisper_loss=0.2489, over 922467.00 frames. 2024-08-19 04:16:36,951 INFO [train_multi_KD3.py:1149] (2/4) Epoch 29, validation on SV_voxceleb1: loss=0.004097, beats_loss=0, ecapa_loss=0.0004097, whisper_loss=0, over 939242.00 frames. 2024-08-19 04:17:21,140 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4670, 2.5787, 3.2412, 1.3781], device='cuda:2') 2024-08-19 04:18:29,300 INFO [train_multi_KD3.py:1149] (2/4) Epoch 29, validation on AT_audioset: loss=0.02313, beats_loss=0.02313, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-19 04:18:29,306 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-19 04:18:33,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4276070.0, ans=0.0 2024-08-19 04:18:37,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4276070.0, ans=0.125 2024-08-19 04:18:38,822 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 04:18:54,344 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.02 vs. limit=15.0 2024-08-19 04:18:54,620 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.268e+01 2.505e+01 2.757e+01 3.965e+01, threshold=5.011e+01, percent-clipped=0.0 2024-08-19 04:19:03,277 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-19 04:19:03,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=4276270.0, ans=0.5 2024-08-19 04:19:19,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4276370.0, ans=0.0 2024-08-19 04:19:25,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4276470.0, ans=0.125 2024-08-19 04:19:34,088 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
30 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-19 04:19:36,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4276470.0, ans=0.125 2024-08-19 04:19:38,966 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 12050, loss[loss=0.1094, beats_loss=0.01367, ecapa_loss=0.0001243, whisper_loss=0.09447, over 17880.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001425, whisper_loss=0.09011, over 3897310.43 frames. ], batch size: 72, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:20:10,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4276770.0, ans=0.125 2024-08-19 04:20:11,284 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-19 04:20:13,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4276770.0, ans=0.125 2024-08-19 04:20:16,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4276770.0, ans=0.125 2024-08-19 04:20:18,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4276770.0, ans=0.0 2024-08-19 04:20:19,149 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 14 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-19 04:20:19,891 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.53 vs. 
limit=10.0 2024-08-19 04:20:23,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4276870.0, ans=0.1 2024-08-19 04:20:45,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4276970.0, ans=0.0 2024-08-19 04:20:48,512 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 12100, loss[loss=0.09124, beats_loss=0.012, ecapa_loss=0.0001078, whisper_loss=0.07816, over 18929.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01047, ecapa_loss=0.0001421, whisper_loss=0.0898, over 3884602.93 frames. ], batch size: 72, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:20:52,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4277070.0, ans=0.1 2024-08-19 04:20:54,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=4277070.0, ans=0.2 2024-08-19 04:20:58,348 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-19 04:21:11,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4277170.0, ans=0.1 2024-08-19 04:21:13,376 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.256e+01 2.611e+01 2.868e+01 1.471e+02, threshold=5.223e+01, percent-clipped=2.0 2024-08-19 04:21:14,071 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-19 04:21:14,718 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
33 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 04:21:18,502 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2024-08-19 04:21:22,189 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 04:21:28,271 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 04:21:58,648 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 12150, loss[loss=0.1193, beats_loss=0.008623, ecapa_loss=0.0001569, whisper_loss=0.1091, over 22964.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01053, ecapa_loss=0.0001425, whisper_loss=0.08913, over 3858286.80 frames. ], batch size: 91, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:22:14,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4277670.0, ans=0.2 2024-08-19 04:22:15,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4277670.0, ans=0.1 2024-08-19 04:22:16,841 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.188e+00 2024-08-19 04:22:37,934 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-19 04:22:42,805 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 04:22:45,259 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-19 04:22:51,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4277870.0, ans=0.0 2024-08-19 04:23:08,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4278070.0, ans=0.0 2024-08-19 04:23:08,952 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 12200, loss[loss=0.1092, beats_loss=0.01292, ecapa_loss=0.0001216, whisper_loss=0.09509, over 22572.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01058, ecapa_loss=0.0001416, whisper_loss=0.08856, over 3844395.08 frames. ], batch size: 88, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:23:16,182 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 20 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 04:23:38,850 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.338e+01 2.537e+01 2.829e+01 3.837e+02, threshold=5.074e+01, percent-clipped=2.0 2024-08-19 04:23:39,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4278170.0, ans=0.0 2024-08-19 04:23:41,485 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 15 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-19 04:23:52,034 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 04:23:57,162 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 40 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-19 04:24:10,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4278370.0, ans=0.0 2024-08-19 04:24:11,944 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-19 04:24:17,512 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
25 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-19 04:24:23,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4278470.0, ans=0.1 2024-08-19 04:24:31,624 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 12250, loss[loss=0.0933, beats_loss=0.009963, ecapa_loss=0.0001353, whisper_loss=0.08198, over 22306.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01047, ecapa_loss=0.0001425, whisper_loss=0.08954, over 3889442.75 frames. ], batch size: 90, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:24:33,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=4278570.0, ans=22.5 2024-08-19 04:24:39,714 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 28 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-19 04:24:44,573 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-19 04:25:00,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4278670.0, ans=0.125 2024-08-19 04:25:02,999 WARNING [optim.py:496] (2/4) Scaling gradients by 0.05575420707464218, model_norm_threshold=50.743568420410156 2024-08-19 04:25:03,163 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.0.self_attn_weights.in_proj.bias with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.164e+05, grad_sumsq=1.291e+04, orig_rms_sq=9.017e+00 2024-08-19 04:25:03,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4278670.0, ans=0.0 2024-08-19 04:25:09,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4278770.0, ans=0.0 2024-08-19 04:25:20,478 INFO [scaling.py:214] (2/4) ScheduledFloat: 
name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4278770.0, ans=0.1 2024-08-19 04:25:28,285 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-19 04:25:43,473 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.82 vs. limit=22.5 2024-08-19 04:25:47,831 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 22 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-19 04:25:51,760 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.80 vs. limit=10.0 2024-08-19 04:25:59,455 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 12300, loss[loss=0.1235, beats_loss=0.007133, ecapa_loss=0.0001589, whisper_loss=0.1148, over 19291.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.0001423, whisper_loss=0.09033, over 3889012.42 frames. ], batch size: 76, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:26:08,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4279070.0, ans=0.125 2024-08-19 04:26:10,855 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0 2024-08-19 04:26:33,444 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.490e+01 2.668e+01 2.990e+01 9.101e+02, threshold=5.335e+01, percent-clipped=3.0 2024-08-19 04:26:44,872 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 33 from Vox, 32 fro AS 2024-08-19 04:26:47,390 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
22 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-19 04:26:49,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4279270.0, ans=0.2 2024-08-19 04:26:50,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4279270.0, ans=0.125 2024-08-19 04:26:53,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4279270.0, ans=0.0 2024-08-19 04:27:00,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4279370.0, ans=0.1 2024-08-19 04:27:00,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4279370.0, ans=0.0 2024-08-19 04:27:05,890 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-19 04:27:14,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4279470.0, ans=0.0 2024-08-19 04:27:18,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4279470.0, ans=0.125 2024-08-19 04:27:20,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4279470.0, ans=0.125 2024-08-19 04:27:30,248 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 12350, loss[loss=0.08359, beats_loss=0.0134, ecapa_loss=0.0001104, whisper_loss=0.06908, over 17197.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0105, ecapa_loss=0.0001427, whisper_loss=0.08974, over 3895579.95 frames. ], batch size: 67, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:27:30,917 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-19 04:27:34,009 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-19 04:27:41,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=4279570.0, ans=0.05 2024-08-19 04:27:50,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4279670.0, ans=0.125 2024-08-19 04:28:13,444 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 21 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-19 04:28:26,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4279870.0, ans=0.0 2024-08-19 04:28:30,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4279870.0, ans=0.125 2024-08-19 04:28:49,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4279970.0, ans=0.125 2024-08-19 04:28:51,338 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 12400, loss[loss=0.1056, beats_loss=0.008038, ecapa_loss=0.0001844, whisper_loss=0.09573, over 15654.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01051, ecapa_loss=0.0001428, whisper_loss=0.08938, over 3871849.31 frames. ], batch size: 60, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:28:58,534 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
22 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 04:29:02,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4280070.0, ans=0.125 2024-08-19 04:29:13,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4280170.0, ans=0.125 2024-08-19 04:29:15,113 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.385e+01 2.588e+01 2.896e+01 2.116e+02, threshold=5.177e+01, percent-clipped=1.0 2024-08-19 04:29:18,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4280270.0, ans=0.125 2024-08-19 04:29:18,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4280270.0, ans=0.125 2024-08-19 04:29:33,341 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.67 vs. limit=15.0 2024-08-19 04:29:36,299 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-19 04:29:37,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4280370.0, ans=0.025 2024-08-19 04:29:39,779 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.72 vs. limit=12.0 2024-08-19 04:29:40,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4280370.0, ans=0.125 2024-08-19 04:29:41,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4280370.0, ans=0.0 2024-08-19 04:29:46,433 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
36 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-19 04:29:48,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4280470.0, ans=0.125 2024-08-19 04:29:57,091 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 12450, loss[loss=0.1046, beats_loss=0.01002, ecapa_loss=0.000141, whisper_loss=0.0932, over 17989.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01055, ecapa_loss=0.0001418, whisper_loss=0.08893, over 3898789.53 frames. ], batch size: 71, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:30:02,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4280570.0, ans=0.125 2024-08-19 04:30:12,419 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.72 vs. limit=22.5 2024-08-19 04:30:20,817 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.98 vs. limit=15.0 2024-08-19 04:30:30,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4280770.0, ans=0.125 2024-08-19 04:30:32,007 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-19 04:30:41,113 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
21 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-19 04:30:47,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4280870.0, ans=0.2 2024-08-19 04:30:47,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4280870.0, ans=0.125 2024-08-19 04:31:02,550 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 12500, loss[loss=0.0929, beats_loss=0.01211, ecapa_loss=0.0001469, whisper_loss=0.07932, over 21538.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01048, ecapa_loss=0.0001416, whisper_loss=0.08895, over 3885776.04 frames. ], batch size: 90, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:31:09,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4281070.0, ans=0.1 2024-08-19 04:31:25,715 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.257e+01 2.522e+01 2.778e+01 4.051e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-19 04:31:29,788 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-19 04:31:33,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4281270.0, ans=0.0 2024-08-19 04:31:35,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4281270.0, ans=0.1 2024-08-19 04:31:50,795 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
26 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-19 04:31:57,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4281470.0, ans=0.0 2024-08-19 04:31:59,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4281470.0, ans=0.05 2024-08-19 04:32:06,724 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 12550, loss[loss=0.1102, beats_loss=0.008945, ecapa_loss=0.0001556, whisper_loss=0.09973, over 16430.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0105, ecapa_loss=0.0001418, whisper_loss=0.08907, over 3880225.66 frames. ], batch size: 65, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:32:08,016 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-19 04:32:16,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4281570.0, ans=0.2 2024-08-19 04:32:19,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=4281670.0, ans=0.025 2024-08-19 04:32:21,509 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-19 04:32:22,086 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.32 vs. limit=15.0 2024-08-19 04:32:54,717 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-19 04:33:03,392 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.51 vs. limit=22.5 2024-08-19 04:33:11,136 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 12600, loss[loss=0.08575, beats_loss=0.01145, ecapa_loss=0.0001247, whisper_loss=0.07306, over 21789.00 frames. 
], tot_loss[loss=0.1005, beats_loss=0.01059, ecapa_loss=0.0001415, whisper_loss=0.08852, over 3875413.44 frames. ], batch size: 88, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:33:28,106 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.83 vs. limit=12.0 2024-08-19 04:33:34,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4282170.0, ans=0.125 2024-08-19 04:33:34,966 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.025e+01 2.310e+01 2.550e+01 2.894e+01 3.916e+01, threshold=5.099e+01, percent-clipped=0.0 2024-08-19 04:33:37,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4282270.0, ans=0.0 2024-08-19 04:33:49,337 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 32 from Vox, 37 fro AS 2024-08-19 04:34:16,142 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 12650, loss[loss=0.1235, beats_loss=0.01046, ecapa_loss=9.319e-05, whisper_loss=0.1121, over 17899.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01056, ecapa_loss=0.0001427, whisper_loss=0.08886, over 3846170.81 frames. ], batch size: 63, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:34:21,695 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 33 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-19 04:34:36,327 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
27 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 04:34:46,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4282770.0, ans=0.125 2024-08-19 04:34:46,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4282770.0, ans=0.1 2024-08-19 04:34:49,660 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.12 vs. limit=22.5 2024-08-19 04:34:52,190 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=15.0 2024-08-19 04:35:02,099 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-19 04:35:08,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4282970.0, ans=0.0 2024-08-19 04:35:18,792 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-19 04:35:21,218 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 12700, loss[loss=0.1285, beats_loss=0.008936, ecapa_loss=0.0001356, whisper_loss=0.1182, over 20783.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01056, ecapa_loss=0.0001423, whisper_loss=0.08929, over 3869522.57 frames. ], batch size: 80, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:35:21,385 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
20 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-19 04:35:43,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4283170.0, ans=0.0 2024-08-19 04:35:44,699 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.234e+01 2.542e+01 2.819e+01 3.569e+01, threshold=5.084e+01, percent-clipped=0.0 2024-08-19 04:35:55,245 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-19 04:36:12,647 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-19 04:36:27,299 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 12750, loss[loss=0.09361, beats_loss=0.01004, ecapa_loss=0.0001253, whisper_loss=0.08232, over 16567.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01057, ecapa_loss=0.0001425, whisper_loss=0.09009, over 3870823.83 frames. ], batch size: 63, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:36:50,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4283670.0, ans=0.125 2024-08-19 04:36:51,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4283670.0, ans=0.125 2024-08-19 04:36:52,084 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.71 vs. 
limit=15.0 2024-08-19 04:36:55,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4283770.0, ans=0.125 2024-08-19 04:36:55,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4283770.0, ans=0.125 2024-08-19 04:37:01,250 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.99 vs. limit=15.0 2024-08-19 04:37:24,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4283970.0, ans=0.125 2024-08-19 04:37:33,403 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 12800, loss[loss=0.1265, beats_loss=0.008331, ecapa_loss=0.0001504, whisper_loss=0.1167, over 23803.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01059, ecapa_loss=0.0001431, whisper_loss=0.09005, over 3863768.33 frames. ], batch size: 93, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:37:44,483 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.48 vs. limit=15.0 2024-08-19 04:37:56,530 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.197e+01 2.426e+01 2.664e+01 3.787e+01, threshold=4.851e+01, percent-clipped=0.0 2024-08-19 04:38:05,476 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-19 04:38:07,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4284270.0, ans=0.05 2024-08-19 04:38:16,600 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.83 vs. 
limit=15.0 2024-08-19 04:38:19,644 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-19 04:38:27,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4284470.0, ans=0.125 2024-08-19 04:38:30,474 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.01 vs. limit=15.0 2024-08-19 04:38:37,225 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 12850, loss[loss=0.09661, beats_loss=0.01199, ecapa_loss=0.0001434, whisper_loss=0.08319, over 19122.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001429, whisper_loss=0.09062, over 3865593.76 frames. ], batch size: 79, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:38:41,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4284570.0, ans=0.125 2024-08-19 04:38:46,159 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-19 04:38:50,047 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-19 04:38:57,781 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-19 04:39:01,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4284770.0, ans=0.0 2024-08-19 04:39:15,793 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
29 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 04:39:21,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4284870.0, ans=0.0 2024-08-19 04:39:24,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4284870.0, ans=0.125 2024-08-19 04:39:33,730 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.46 vs. limit=10.0 2024-08-19 04:39:40,130 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 12900, loss[loss=0.09731, beats_loss=0.01059, ecapa_loss=0.0001353, whisper_loss=0.08537, over 22339.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01062, ecapa_loss=0.0001418, whisper_loss=0.08987, over 3846383.42 frames. ], batch size: 89, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:39:40,231 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-19 04:39:53,233 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-19 04:40:02,817 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.358e+01 2.618e+01 2.908e+01 5.283e+01, threshold=5.236e+01, percent-clipped=1.0 2024-08-19 04:40:07,701 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.86 vs. limit=15.0 2024-08-19 04:40:08,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4285270.0, ans=0.125 2024-08-19 04:40:09,486 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-19 04:40:29,124 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
21 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 04:40:30,326 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-19 04:40:37,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4285470.0, ans=0.04949747468305833 2024-08-19 04:40:42,501 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 12950, loss[loss=0.1014, beats_loss=0.009668, ecapa_loss=0.0001573, whisper_loss=0.09011, over 20946.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.000142, whisper_loss=0.09016, over 3860360.84 frames. ], batch size: 89, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:40:47,821 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-19 04:40:53,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4285570.0, ans=0.125 2024-08-19 04:41:02,726 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-19 04:41:06,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4285770.0, ans=0.1 2024-08-19 04:41:18,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4285870.0, ans=0.125 2024-08-19 04:41:34,460 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-19 04:41:44,616 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 13000, loss[loss=0.109, beats_loss=0.01054, ecapa_loss=0.0001768, whisper_loss=0.09665, over 22311.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01053, ecapa_loss=0.0001435, whisper_loss=0.08995, over 3869462.13 frames. 
], batch size: 92, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:41:46,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4286070.0, ans=10.0 2024-08-19 04:42:02,605 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.09 vs. limit=22.5 2024-08-19 04:42:03,438 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.788e+05 2024-08-19 04:42:06,228 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.47 vs. limit=12.0 2024-08-19 04:42:06,516 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.266e+01 2.592e+01 2.874e+01 4.367e+01, threshold=5.183e+01, percent-clipped=0.0 2024-08-19 04:42:06,635 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-19 04:42:15,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4286270.0, ans=0.0 2024-08-19 04:42:19,359 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 9 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-19 04:42:36,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4286470.0, ans=0.125 2024-08-19 04:42:43,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4286470.0, ans=0.0 2024-08-19 04:42:46,820 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 13050, loss[loss=0.1187, beats_loss=0.009325, ecapa_loss=0.0001632, whisper_loss=0.1078, over 16244.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01053, ecapa_loss=0.0001422, whisper_loss=0.08971, over 3853929.38 frames. 
], batch size: 62, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:42:51,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4286570.0, ans=0.0 2024-08-19 04:43:12,800 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-19 04:43:19,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4286770.0, ans=0.125 2024-08-19 04:43:30,222 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 25 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-19 04:43:42,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4286970.0, ans=0.125 2024-08-19 04:43:42,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4286970.0, ans=0.0 2024-08-19 04:43:56,471 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2024-08-19 04:43:57,079 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 13100, loss[loss=0.102, beats_loss=0.009464, ecapa_loss=0.0001624, whisper_loss=0.09087, over 18985.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001421, whisper_loss=0.08996, over 3862259.90 frames. ], batch size: 78, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:43:57,226 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-19 04:44:04,298 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 04:44:07,483 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-19 04:44:10,462 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
29 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-19 04:44:16,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4287170.0, ans=0.1 2024-08-19 04:44:23,734 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.776e+01 2.315e+01 2.592e+01 2.960e+01 1.126e+02, threshold=5.185e+01, percent-clipped=1.0 2024-08-19 04:44:26,658 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-19 04:44:35,846 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-19 04:44:37,736 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 26 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-19 04:44:42,058 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.86 vs. limit=6.0 2024-08-19 04:44:49,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4287370.0, ans=0.0 2024-08-19 04:44:55,946 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-19 04:45:01,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4287470.0, ans=0.1 2024-08-19 04:45:02,723 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.00 vs. limit=10.0 2024-08-19 04:45:14,376 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 13150, loss[loss=0.09835, beats_loss=0.00791, ecapa_loss=0.000119, whisper_loss=0.08926, over 14945.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001408, whisper_loss=0.08999, over 3865885.03 frames. 
], batch size: 56, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:45:14,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4287570.0, ans=0.05 2024-08-19 04:45:29,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4287670.0, ans=0.125 2024-08-19 04:45:38,656 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-19 04:45:44,449 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 04:45:53,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4287770.0, ans=0.2 2024-08-19 04:46:16,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4287970.0, ans=0.0 2024-08-19 04:46:21,638 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-19 04:46:30,675 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 13200, loss[loss=0.101, beats_loss=0.009339, ecapa_loss=0.0001532, whisper_loss=0.09011, over 18040.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01049, ecapa_loss=0.0001418, whisper_loss=0.08993, over 3855567.97 frames. ], batch size: 71, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:46:32,490 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 16 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-19 04:46:45,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4288170.0, ans=0.125 2024-08-19 04:46:46,809 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-19 04:46:50,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4288170.0, ans=0.2 2024-08-19 04:46:51,168 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.74 vs. limit=15.0 2024-08-19 04:46:59,768 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.394e+01 2.675e+01 2.983e+01 8.603e+01, threshold=5.350e+01, percent-clipped=2.0 2024-08-19 04:47:02,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4288270.0, ans=0.125 2024-08-19 04:47:07,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4288270.0, ans=0.1 2024-08-19 04:47:12,231 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 18 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-19 04:47:13,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4288270.0, ans=0.125 2024-08-19 04:47:17,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4288370.0, ans=0.125 2024-08-19 04:47:22,920 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-19 04:47:24,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4288370.0, ans=0.0 2024-08-19 04:47:33,446 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
29 from LS+wenet, 14 from Vox, 49 fro AS 2024-08-19 04:47:39,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=4288470.0, ans=0.2 2024-08-19 04:47:40,116 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 29 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-19 04:47:49,118 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 13250, loss[loss=0.1235, beats_loss=0.00814, ecapa_loss=0.0001309, whisper_loss=0.1141, over 22165.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001412, whisper_loss=0.08988, over 3869667.22 frames. ], batch size: 82, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:48:08,704 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-19 04:48:18,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4288670.0, ans=0.125 2024-08-19 04:48:19,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4288770.0, ans=0.035 2024-08-19 04:48:24,475 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-19 04:48:33,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4288770.0, ans=0.0 2024-08-19 04:48:49,883 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-19 04:49:06,080 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 31 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 04:49:07,105 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 13300, loss[loss=0.1209, beats_loss=0.008382, ecapa_loss=0.0001468, whisper_loss=0.1111, over 19582.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0104, ecapa_loss=0.0001423, whisper_loss=0.09008, over 3859833.61 frames. 
], batch size: 77, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:49:19,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4289070.0, ans=0.125 2024-08-19 04:49:31,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4289170.0, ans=0.0 2024-08-19 04:49:34,964 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.945e+01 2.357e+01 2.503e+01 2.761e+01 3.724e+01, threshold=5.005e+01, percent-clipped=0.0 2024-08-19 04:49:38,962 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.31 vs. limit=12.0 2024-08-19 04:49:46,162 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-19 04:49:58,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4289370.0, ans=0.125 2024-08-19 04:50:09,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4289470.0, ans=0.0 2024-08-19 04:50:15,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4289470.0, ans=0.125 2024-08-19 04:50:18,160 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.19 vs. limit=15.0 2024-08-19 04:50:21,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4289470.0, ans=0.125 2024-08-19 04:50:24,057 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 13350, loss[loss=0.1331, beats_loss=0.007235, ecapa_loss=0.0001776, whisper_loss=0.1241, over 20979.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.0001435, whisper_loss=0.09015, over 3904023.13 frames. ], batch size: 80, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:50:24,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4289570.0, ans=0.125 2024-08-19 04:50:31,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4289570.0, ans=0.07 2024-08-19 04:50:42,195 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.15 vs. limit=15.0 2024-08-19 04:50:45,952 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-19 04:50:51,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4289670.0, ans=0.0 2024-08-19 04:50:52,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4289670.0, ans=0.2 2024-08-19 04:50:58,866 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 24 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-19 04:51:06,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4289770.0, ans=0.0 2024-08-19 04:51:15,319 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-19 04:51:40,236 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 13400, loss[loss=0.1165, beats_loss=0.01088, ecapa_loss=0.0001275, whisper_loss=0.1044, over 16250.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01049, ecapa_loss=0.0001426, whisper_loss=0.08926, over 3896103.02 frames. 
], batch size: 62, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:52:02,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4290170.0, ans=0.0 2024-08-19 04:52:05,741 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.283e+01 2.575e+01 2.794e+01 4.211e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-19 04:52:22,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4290370.0, ans=0.1 2024-08-19 04:52:30,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4290370.0, ans=0.0 2024-08-19 04:52:32,930 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2024-08-19 04:52:40,983 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-19 04:52:52,714 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 13450, loss[loss=0.1126, beats_loss=0.009616, ecapa_loss=0.0001377, whisper_loss=0.1016, over 22154.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01055, ecapa_loss=0.0001421, whisper_loss=0.08919, over 3891584.05 frames. ], batch size: 89, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:53:01,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4290570.0, ans=0.125 2024-08-19 04:53:06,519 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 13 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-19 04:53:14,697 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.87 vs. limit=15.0 2024-08-19 04:53:15,121 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
20 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-19 04:54:01,655 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 32 from Vox, 36 fro AS 2024-08-19 04:54:05,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4290970.0, ans=0.125 2024-08-19 04:54:10,686 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 13500, loss[loss=0.09937, beats_loss=0.0116, ecapa_loss=0.0001406, whisper_loss=0.08636, over 22059.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0105, ecapa_loss=0.0001416, whisper_loss=0.09009, over 3899014.21 frames. ], batch size: 93, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:54:22,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4291070.0, ans=0.07 2024-08-19 04:54:26,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4291170.0, ans=0.0 2024-08-19 04:54:27,146 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.02 vs. limit=22.5 2024-08-19 04:54:37,459 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.892e+01 2.302e+01 2.521e+01 2.822e+01 3.950e+01, threshold=5.042e+01, percent-clipped=0.0 2024-08-19 04:54:45,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4291270.0, ans=0.125 2024-08-19 04:54:48,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=4291270.0, ans=15.0 2024-08-19 04:54:56,079 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
22 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-19 04:55:06,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4291370.0, ans=0.1 2024-08-19 04:55:11,321 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=15.0 2024-08-19 04:55:11,800 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 04:55:12,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4291470.0, ans=0.0 2024-08-19 04:55:15,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4291470.0, ans=0.1 2024-08-19 04:55:23,448 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 13550, loss[loss=0.1052, beats_loss=0.01034, ecapa_loss=0.0001631, whisper_loss=0.09324, over 22873.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001416, whisper_loss=0.09016, over 3907243.64 frames. ], batch size: 94, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:55:29,219 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-19 04:55:29,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4291570.0, ans=0.2 2024-08-19 04:55:39,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4291670.0, ans=0.1 2024-08-19 04:55:47,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4291670.0, ans=0.2 2024-08-19 04:55:52,260 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
25 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-19 04:56:07,518 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-19 04:56:11,144 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-08-19 04:56:19,435 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 04:56:26,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4291970.0, ans=0.05 2024-08-19 04:56:29,155 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-19 04:56:36,270 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 13600, loss[loss=0.1102, beats_loss=0.01108, ecapa_loss=0.0001365, whisper_loss=0.09778, over 23122.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01049, ecapa_loss=0.0001417, whisper_loss=0.09064, over 3919343.81 frames. ], batch size: 93, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:56:48,352 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.71 vs. limit=15.0 2024-08-19 04:57:02,140 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.298e+01 2.598e+01 3.003e+01 1.611e+02, threshold=5.196e+01, percent-clipped=4.0 2024-08-19 04:57:02,300 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-19 04:57:06,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4292270.0, ans=0.125 2024-08-19 04:57:11,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4292270.0, ans=0.1 2024-08-19 04:57:13,710 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2024-08-19 04:57:35,507 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2024-08-19 04:57:40,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4292470.0, ans=0.125 2024-08-19 04:57:50,875 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 13650, loss[loss=0.1102, beats_loss=0.01016, ecapa_loss=0.0001575, whisper_loss=0.09846, over 22015.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01055, ecapa_loss=0.0001422, whisper_loss=0.09058, over 3933567.88 frames. ], batch size: 87, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:58:09,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4292670.0, ans=0.0 2024-08-19 04:58:31,732 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-19 04:58:46,846 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-19 04:59:08,231 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 13700, loss[loss=0.1097, beats_loss=0.01114, ecapa_loss=0.0001155, whisper_loss=0.09737, over 15734.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0106, ecapa_loss=0.0001411, whisper_loss=0.0898, over 3928174.18 frames. 
], batch size: 60, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:59:09,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4293070.0, ans=0.2 2024-08-19 04:59:12,262 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-19 04:59:19,402 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.80 vs. limit=15.0 2024-08-19 04:59:27,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4293170.0, ans=0.125 2024-08-19 04:59:33,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4293170.0, ans=0.125 2024-08-19 04:59:35,451 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.12 vs. limit=10.0 2024-08-19 04:59:38,032 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.235e+01 2.503e+01 2.717e+01 3.786e+01, threshold=5.006e+01, percent-clipped=0.0 2024-08-19 04:59:45,675 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 04:59:56,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4293370.0, ans=0.125 2024-08-19 05:00:08,720 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 29 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-19 05:00:16,594 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.65 vs. 
limit=10.0 2024-08-19 05:00:18,250 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.70 vs. limit=22.5 2024-08-19 05:00:28,127 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 13750, loss[loss=0.1078, beats_loss=0.009769, ecapa_loss=0.0001497, whisper_loss=0.0965, over 19898.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01059, ecapa_loss=0.0001416, whisper_loss=0.09028, over 3937517.44 frames. ], batch size: 79, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:00:35,375 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.78 vs. limit=8.0 2024-08-19 05:00:35,759 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-19 05:00:36,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4293570.0, ans=0.0 2024-08-19 05:00:40,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4293570.0, ans=0.125 2024-08-19 05:00:54,850 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 18 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-19 05:01:03,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4293770.0, ans=0.125 2024-08-19 05:01:10,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4293870.0, ans=10.0 2024-08-19 05:01:25,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4293970.0, ans=0.2 2024-08-19 05:01:28,998 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.83 vs. 
limit=6.0 2024-08-19 05:01:37,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4293970.0, ans=0.125 2024-08-19 05:01:38,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4294070.0, ans=0.0 2024-08-19 05:01:39,179 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 13800, loss[loss=0.1133, beats_loss=0.01118, ecapa_loss=0.0001274, whisper_loss=0.1008, over 21369.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0106, ecapa_loss=0.0001417, whisper_loss=0.09002, over 3899648.23 frames. ], batch size: 82, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:01:47,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4294070.0, ans=0.125 2024-08-19 05:01:50,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4294070.0, ans=0.125 2024-08-19 05:02:01,779 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 18 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-19 05:02:02,832 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.336e+01 2.495e+01 2.875e+01 4.670e+01, threshold=4.991e+01, percent-clipped=0.0 2024-08-19 05:02:06,811 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 14 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-19 05:02:10,846 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-19 05:02:29,925 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.20 vs. 
limit=15.0 2024-08-19 05:02:32,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4294470.0, ans=0.07 2024-08-19 05:02:35,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4294470.0, ans=0.125 2024-08-19 05:02:45,065 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 13850, loss[loss=0.08381, beats_loss=0.01375, ecapa_loss=0.0001108, whisper_loss=0.06895, over 18102.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01054, ecapa_loss=0.0001421, whisper_loss=0.09046, over 3903611.97 frames. ], batch size: 72, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:02:50,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4294570.0, ans=0.125 2024-08-19 05:02:54,271 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-19 05:03:18,225 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-19 05:03:27,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4294870.0, ans=0.125 2024-08-19 05:03:28,300 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-19 05:03:29,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4294870.0, ans=0.125 2024-08-19 05:03:30,266 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.92 vs. limit=6.0 2024-08-19 05:03:48,306 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. 
limit=15.0 2024-08-19 05:03:50,093 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 13900, loss[loss=0.1056, beats_loss=0.007509, ecapa_loss=0.0001544, whisper_loss=0.09659, over 14987.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01048, ecapa_loss=0.0001424, whisper_loss=0.09053, over 3900361.43 frames. ], batch size: 57, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:03:52,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4295070.0, ans=0.125 2024-08-19 05:04:08,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4295170.0, ans=0.0 2024-08-19 05:04:13,639 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.343e+01 2.547e+01 2.776e+01 4.991e+01, threshold=5.095e+01, percent-clipped=1.0 2024-08-19 05:04:22,383 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 05:04:30,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4295370.0, ans=0.125 2024-08-19 05:04:31,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4295370.0, ans=0.0 2024-08-19 05:04:43,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4295470.0, ans=0.0 2024-08-19 05:04:46,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4295470.0, ans=0.125 2024-08-19 05:04:50,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=4295470.0, ans=0.1 2024-08-19 05:04:56,500 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 13950, loss[loss=0.1177, beats_loss=0.008825, ecapa_loss=0.000149, 
whisper_loss=0.1073, over 22857.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0104, ecapa_loss=0.0001428, whisper_loss=0.09127, over 3909665.25 frames. ], batch size: 90, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:04:57,566 WARNING [optim.py:496] (2/4) Scaling gradients by 0.029894206672906876, model_norm_threshold=50.94768524169922 2024-08-19 05:04:57,729 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.708e+05, grad_sumsq=1.429e+05, orig_rms_sq=3.294e+00 2024-08-19 05:05:01,839 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 16 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-19 05:05:17,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4295670.0, ans=0.04949747468305833 2024-08-19 05:05:37,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4295870.0, ans=0.2 2024-08-19 05:05:48,809 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-19 05:06:02,414 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 14000, loss[loss=0.1017, beats_loss=0.009909, ecapa_loss=0.0001461, whisper_loss=0.09032, over 17454.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01044, ecapa_loss=0.0001414, whisper_loss=0.09083, over 3904977.49 frames. ], batch size: 71, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:06:14,140 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-19 05:06:20,745 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.79 vs. limit=15.0 2024-08-19 05:06:22,360 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
23 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-19 05:06:26,046 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.382e+01 2.706e+01 3.107e+01 1.704e+03, threshold=5.412e+01, percent-clipped=4.0 2024-08-19 05:06:46,926 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-19 05:06:51,519 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.795e-01 2024-08-19 05:06:52,415 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-19 05:06:59,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4296470.0, ans=0.0 2024-08-19 05:07:07,727 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 14050, loss[loss=0.09457, beats_loss=0.01188, ecapa_loss=0.0001289, whisper_loss=0.0814, over 23160.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01052, ecapa_loss=0.0001399, whisper_loss=0.09031, over 3918532.04 frames. ], batch size: 95, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:07:09,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4296570.0, ans=0.1 2024-08-19 05:07:10,617 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 24 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-19 05:07:12,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4296570.0, ans=0.125 2024-08-19 05:07:29,020 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 14 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-19 05:07:35,278 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.81 vs. limit=15.0 2024-08-19 05:07:49,435 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
14 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-19 05:07:55,563 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.17 vs. limit=15.0 2024-08-19 05:08:14,742 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 14100, loss[loss=0.09626, beats_loss=0.00955, ecapa_loss=0.0002188, whisper_loss=0.08452, over 20994.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01047, ecapa_loss=0.0001411, whisper_loss=0.09036, over 3882864.42 frames. ], batch size: 92, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:08:14,870 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 20 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 05:08:15,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4297070.0, ans=0.0 2024-08-19 05:08:15,409 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.09 vs. limit=15.0 2024-08-19 05:08:20,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4297070.0, ans=0.1 2024-08-19 05:08:25,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4297070.0, ans=0.07 2024-08-19 05:08:36,354 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.75 vs. limit=22.5 2024-08-19 05:08:38,007 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.319e+01 2.520e+01 2.882e+01 1.476e+02, threshold=5.041e+01, percent-clipped=1.0 2024-08-19 05:08:54,834 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
29 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 05:08:58,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4297370.0, ans=0.125 2024-08-19 05:08:59,930 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.42 vs. limit=6.0 2024-08-19 05:09:18,937 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.58 vs. limit=15.0 2024-08-19 05:09:19,363 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 14150, loss[loss=0.08617, beats_loss=0.01204, ecapa_loss=0.0001361, whisper_loss=0.07277, over 21964.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01065, ecapa_loss=0.0001405, whisper_loss=0.08909, over 3883102.71 frames. ], batch size: 91, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:09:22,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4297570.0, ans=0.1 2024-08-19 05:09:33,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4297670.0, ans=0.125 2024-08-19 05:09:37,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4297670.0, ans=0.125 2024-08-19 05:09:53,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4297770.0, ans=0.0 2024-08-19 05:10:01,002 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 17 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-19 05:10:03,558 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-19 05:10:12,584 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
21 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-19 05:10:15,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4297970.0, ans=0.2 2024-08-19 05:10:19,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4297970.0, ans=0.125 2024-08-19 05:10:21,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4297970.0, ans=0.0 2024-08-19 05:10:24,143 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 14200, loss[loss=0.09293, beats_loss=0.01201, ecapa_loss=0.0001202, whisper_loss=0.07972, over 20984.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01059, ecapa_loss=0.0001404, whisper_loss=0.08926, over 3880206.12 frames. ], batch size: 86, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:10:24,412 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-19 05:10:26,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4298070.0, ans=0.125 2024-08-19 05:10:32,476 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-19 05:10:40,718 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 05:10:47,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4298170.0, ans=0.07 2024-08-19 05:10:48,040 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.276e+01 2.492e+01 2.801e+01 5.821e+01, threshold=4.984e+01, percent-clipped=1.0 2024-08-19 05:11:01,756 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
15 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 05:11:15,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4298370.0, ans=0.05 2024-08-19 05:11:27,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4298470.0, ans=0.125 2024-08-19 05:11:29,945 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 14250, loss[loss=0.1014, beats_loss=0.008227, ecapa_loss=0.0001222, whisper_loss=0.09191, over 18280.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01055, ecapa_loss=0.0001403, whisper_loss=0.08889, over 3863506.90 frames. ], batch size: 68, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:11:35,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4298570.0, ans=0.1 2024-08-19 05:11:36,784 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-19 05:11:40,258 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.55 vs. limit=12.0 2024-08-19 05:11:41,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4298570.0, ans=0.125 2024-08-19 05:11:56,893 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.38 vs. limit=22.5 2024-08-19 05:12:22,921 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-19 05:12:34,230 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 14300, loss[loss=0.09273, beats_loss=0.008616, ecapa_loss=0.0001935, whisper_loss=0.08218, over 21568.00 frames. 
], tot_loss[loss=0.1008, beats_loss=0.01058, ecapa_loss=0.0001394, whisper_loss=0.08886, over 3881723.17 frames. ], batch size: 92, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:12:37,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4299070.0, ans=0.0 2024-08-19 05:12:38,156 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.66 vs. limit=15.0 2024-08-19 05:12:38,202 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=15.0 2024-08-19 05:12:46,339 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.82 vs. limit=22.5 2024-08-19 05:12:52,988 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-19 05:12:57,972 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.328e+01 2.560e+01 2.862e+01 1.139e+02, threshold=5.121e+01, percent-clipped=2.0 2024-08-19 05:12:58,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4299170.0, ans=0.125 2024-08-19 05:13:03,970 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.90 vs. 
limit=6.0 2024-08-19 05:13:14,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4299370.0, ans=0.125 2024-08-19 05:13:26,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4299470.0, ans=0.125 2024-08-19 05:13:35,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4299470.0, ans=0.1 2024-08-19 05:13:40,120 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 14350, loss[loss=0.1078, beats_loss=0.01052, ecapa_loss=0.0001191, whisper_loss=0.09609, over 17647.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01059, ecapa_loss=0.0001382, whisper_loss=0.08889, over 3851591.15 frames. ], batch size: 68, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:14:00,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4299670.0, ans=0.125 2024-08-19 05:14:01,776 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-19 05:14:03,042 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-19 05:14:06,917 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
32 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 05:14:20,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4299870.0, ans=0.125 2024-08-19 05:14:23,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4299870.0, ans=0.125 2024-08-19 05:14:25,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4299870.0, ans=0.125 2024-08-19 05:14:25,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4299870.0, ans=0.0 2024-08-19 05:14:31,396 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-19 05:14:36,321 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 19 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-19 05:14:40,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4299970.0, ans=0.125 2024-08-19 05:14:43,944 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 14400, loss[loss=0.1189, beats_loss=0.01007, ecapa_loss=0.0001337, whisper_loss=0.1075, over 23084.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01054, ecapa_loss=0.0001395, whisper_loss=0.08911, over 3855455.58 frames. 
], batch size: 89, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:14:50,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4300070.0, ans=0.1 2024-08-19 05:14:51,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4300070.0, ans=0.125 2024-08-19 05:14:55,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4300170.0, ans=0.0 2024-08-19 05:14:57,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4300170.0, ans=0.125 2024-08-19 05:14:59,446 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 14 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-19 05:15:06,839 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.240e+01 2.498e+01 2.789e+01 4.237e+01, threshold=4.997e+01, percent-clipped=0.0 2024-08-19 05:15:10,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4300270.0, ans=0.0 2024-08-19 05:15:11,557 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=15.0 2024-08-19 05:15:21,591 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-19 05:15:24,079 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-19 05:15:41,163 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 19 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 05:15:43,183 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.96 vs. 
limit=15.0 2024-08-19 05:15:44,300 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.48 vs. limit=15.0 2024-08-19 05:15:46,349 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-19 05:15:48,994 INFO [train_multi_KD3.py:1116] (2/4) Epoch 29, batch 14450, loss[loss=0.1009, beats_loss=0.0103, ecapa_loss=0.0001628, whisper_loss=0.08901, over 15977.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01052, ecapa_loss=0.0001404, whisper_loss=0.08981, over 3870443.59 frames. ], batch size: 67, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:15:55,144 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2024-08-19 05:16:02,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4300670.0, ans=0.09899494936611666 2024-08-19 05:16:12,446 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.49 vs. limit=15.0 2024-08-19 05:16:20,285 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-19 05:16:26,592 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 19 from LS+wenet, 26 from Vox, 46 fro AS 2024-08-19 05:17:20,840 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 0, loss[loss=0.08932, beats_loss=0.008209, ecapa_loss=0.0001473, whisper_loss=0.07964, over 15147.00 frames. ], tot_loss[loss=0.08932, beats_loss=0.008209, ecapa_loss=0.0001473, whisper_loss=0.07964, over 15147.00 frames. 
], batch size: 59, lr: 2.06e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:17:20,840 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-19 05:17:59,436 INFO [train_multi_KD3.py:1149] (2/4) Epoch 30, validation on ASR_libri: loss=0.2538, beats_loss=0, ecapa_loss=0.0005174, whisper_loss=0.2486, over 922467.00 frames. 2024-08-19 05:18:14,971 INFO [train_multi_KD3.py:1149] (2/4) Epoch 30, validation on SV_voxceleb1: loss=0.003909, beats_loss=0, ecapa_loss=0.0003909, whisper_loss=0, over 939242.00 frames. 2024-08-19 05:20:09,436 INFO [train_multi_KD3.py:1149] (2/4) Epoch 30, validation on AT_audioset: loss=0.02304, beats_loss=0.02304, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 05:20:09,439 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-19 05:20:10,891 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-19 05:20:30,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4300990.0, ans=0.0 2024-08-19 05:20:38,085 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-19 05:21:16,716 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.422e+01 2.650e+01 2.982e+01 4.420e+02, threshold=5.300e+01, percent-clipped=2.0 2024-08-19 05:21:17,153 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 05:21:24,669 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.364e+01 2024-08-19 05:21:28,786 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-19 05:21:56,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4301390.0, ans=0.125 2024-08-19 05:21:59,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4301390.0, ans=0.0 2024-08-19 05:22:12,616 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 50, loss[loss=0.1135, beats_loss=0.009078, ecapa_loss=0.0001388, whisper_loss=0.103, over 18471.00 frames. ], tot_loss[loss=0.09818, beats_loss=0.009412, ecapa_loss=0.0001466, whisper_loss=0.0873, over 862753.84 frames. ], batch size: 74, lr: 2.06e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:22:29,465 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-19 05:22:40,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4301590.0, ans=0.2 2024-08-19 05:23:08,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4301690.0, ans=0.125 2024-08-19 05:23:20,805 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.280e-02 2024-08-19 05:23:45,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=4301890.0, ans=15.0 2024-08-19 05:24:07,355 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 100, loss[loss=0.07628, beats_loss=0.01136, ecapa_loss=0.000144, whisper_loss=0.06347, over 14117.00 frames. ], tot_loss[loss=0.09789, beats_loss=0.00941, ecapa_loss=0.0001458, whisper_loss=0.08702, over 1515768.55 frames. ], batch size: 58, lr: 2.06e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:24:19,121 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
29 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-19 05:24:27,960 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 05:24:48,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4302090.0, ans=0.0 2024-08-19 05:24:57,229 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=15.0 2024-08-19 05:25:05,389 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.646e+01 2.905e+01 3.235e+01 6.271e+01, threshold=5.810e+01, percent-clipped=1.0 2024-08-19 05:25:05,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4302190.0, ans=0.0 2024-08-19 05:25:30,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4302290.0, ans=0.0 2024-08-19 05:25:38,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4302390.0, ans=0.1 2024-08-19 05:25:51,600 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 150, loss[loss=0.1038, beats_loss=0.008811, ecapa_loss=0.0001807, whisper_loss=0.0932, over 21020.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.009328, ecapa_loss=0.0001448, whisper_loss=0.09043, over 2045709.80 frames. 
], batch size: 83, lr: 2.06e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:25:56,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4302490.0, ans=0.0 2024-08-19 05:26:00,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4302490.0, ans=0.1 2024-08-19 05:26:06,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4302490.0, ans=0.125 2024-08-19 05:26:09,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4302590.0, ans=0.125 2024-08-19 05:26:09,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4302590.0, ans=0.5 2024-08-19 05:26:09,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4302590.0, ans=0.0 2024-08-19 05:26:19,101 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
12 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-19 05:26:22,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4302690.0, ans=0.1 2024-08-19 05:26:37,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4302790.0, ans=0.0 2024-08-19 05:26:41,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4302790.0, ans=10.0 2024-08-19 05:27:00,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4302890.0, ans=0.125 2024-08-19 05:27:05,907 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 200, loss[loss=0.09172, beats_loss=0.01324, ecapa_loss=0.0001547, whisper_loss=0.07694, over 23138.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.009671, ecapa_loss=0.0001437, whisper_loss=0.08978, over 2424127.82 frames. ], batch size: 95, lr: 2.06e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:27:13,638 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.94 vs. limit=6.0 2024-08-19 05:27:35,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4303190.0, ans=10.0 2024-08-19 05:27:42,099 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.381e+01 2.550e+01 2.834e+01 4.054e+01, threshold=5.100e+01, percent-clipped=0.0 2024-08-19 05:27:46,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=4303290.0, ans=15.0 2024-08-19 05:28:12,964 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 250, loss[loss=0.1144, beats_loss=0.01097, ecapa_loss=0.0001118, whisper_loss=0.1024, over 23741.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.009936, ecapa_loss=0.0001438, whisper_loss=0.0904, over 2758457.47 frames. ], batch size: 90, lr: 2.06e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:28:19,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4303490.0, ans=0.125 2024-08-19 05:28:30,490 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 31 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-19 05:28:37,784 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 13 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-19 05:28:45,824 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 29 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-19 05:28:46,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4303690.0, ans=0.0 2024-08-19 05:28:49,384 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 19 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-19 05:29:03,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4303890.0, ans=0.2 2024-08-19 05:29:04,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4303890.0, ans=0.125 2024-08-19 05:29:04,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4303890.0, ans=0.2 2024-08-19 05:29:09,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4303890.0, ans=0.1 2024-08-19 05:29:13,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4303890.0, ans=0.1 2024-08-19 05:29:14,989 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 300, loss[loss=0.08012, beats_loss=0.01262, ecapa_loss=0.0001504, whisper_loss=0.066, over 23418.00 
frames. ], tot_loss[loss=0.1008, beats_loss=0.01014, ecapa_loss=0.0001422, whisper_loss=0.08926, over 3010045.22 frames. ], batch size: 94, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:29:15,087 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-19 05:29:22,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4303990.0, ans=0.0 2024-08-19 05:29:27,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4304090.0, ans=0.0 2024-08-19 05:29:47,831 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.657e+01 2.190e+01 2.363e+01 2.558e+01 4.313e+01, threshold=4.727e+01, percent-clipped=0.0 2024-08-19 05:29:49,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4304190.0, ans=0.2 2024-08-19 05:29:52,867 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 26 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 05:30:14,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4304390.0, ans=0.0 2024-08-19 05:30:17,658 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 350, loss[loss=0.1022, beats_loss=0.01059, ecapa_loss=0.0001515, whisper_loss=0.09012, over 21840.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01023, ecapa_loss=0.0001414, whisper_loss=0.0889, over 3170081.46 frames. ], batch size: 91, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:30:20,540 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-19 05:30:30,427 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
21 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 05:30:30,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4304590.0, ans=0.125 2024-08-19 05:30:42,646 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 05:30:49,825 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-19 05:30:57,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4304790.0, ans=0.2 2024-08-19 05:31:03,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4304790.0, ans=0.0 2024-08-19 05:31:04,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4304790.0, ans=0.125 2024-08-19 05:31:18,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4304990.0, ans=0.125 2024-08-19 05:31:19,716 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 400, loss[loss=0.1154, beats_loss=0.008739, ecapa_loss=0.0001212, whisper_loss=0.1055, over 17381.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01028, ecapa_loss=0.0001402, whisper_loss=0.08905, over 3308001.45 frames. ], batch size: 62, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:31:24,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4304990.0, ans=0.125 2024-08-19 05:31:28,704 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 05:31:36,078 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
29 from LS+wenet, 29 from Vox, 24 fro AS 2024-08-19 05:31:36,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4305090.0, ans=0.125 2024-08-19 05:31:42,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4305090.0, ans=0.2 2024-08-19 05:31:51,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4305190.0, ans=0.0 2024-08-19 05:31:51,890 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.222e+01 2.400e+01 2.669e+01 1.087e+02, threshold=4.801e+01, percent-clipped=1.0 2024-08-19 05:31:57,857 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.98 vs. limit=15.0 2024-08-19 05:31:59,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4305290.0, ans=0.125 2024-08-19 05:32:05,781 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-19 05:32:15,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4305390.0, ans=0.1 2024-08-19 05:32:18,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4305390.0, ans=0.0 2024-08-19 05:32:19,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4305390.0, ans=0.95 2024-08-19 05:32:20,622 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-19 05:32:21,890 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 450, loss[loss=0.0861, beats_loss=0.01024, ecapa_loss=9.633e-05, whisper_loss=0.0749, over 14773.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01028, ecapa_loss=0.0001408, whisper_loss=0.08985, over 3437907.30 frames. ], batch size: 54, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:32:28,697 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.19 vs. limit=22.5 2024-08-19 05:32:32,739 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-19 05:32:48,158 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 05:32:59,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=4305790.0, ans=0.1 2024-08-19 05:33:03,821 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.35 vs. limit=15.0 2024-08-19 05:33:06,943 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-19 05:33:08,166 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-19 05:33:09,284 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 20 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-19 05:33:24,074 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 500, loss[loss=0.08618, beats_loss=0.009619, ecapa_loss=0.0001502, whisper_loss=0.07506, over 14371.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01029, ecapa_loss=0.0001408, whisper_loss=0.08964, over 3518636.48 frames. ], batch size: 58, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:33:31,631 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
17 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-19 05:33:31,997 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.845e+05 2024-08-19 05:33:39,507 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 28 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 05:33:40,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4306090.0, ans=0.125 2024-08-19 05:33:43,019 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-19 05:33:56,742 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.612e+01 2.301e+01 2.420e+01 2.660e+01 4.195e+01, threshold=4.841e+01, percent-clipped=0.0 2024-08-19 05:34:08,046 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 28 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-19 05:34:10,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4306290.0, ans=0.2 2024-08-19 05:34:14,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4306390.0, ans=0.125 2024-08-19 05:34:16,080 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.79 vs. limit=15.0 2024-08-19 05:34:26,667 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 550, loss[loss=0.1314, beats_loss=0.008196, ecapa_loss=0.0001459, whisper_loss=0.1217, over 18472.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01028, ecapa_loss=0.0001411, whisper_loss=0.08963, over 3603474.85 frames. ], batch size: 70, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:34:30,366 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
25 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-19 05:34:30,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4306490.0, ans=0.0 2024-08-19 05:34:35,558 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-19 05:34:46,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4306590.0, ans=0.1 2024-08-19 05:34:58,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4306690.0, ans=0.2 2024-08-19 05:35:04,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4306790.0, ans=0.2 2024-08-19 05:35:12,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4306790.0, ans=0.1 2024-08-19 05:35:19,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4306890.0, ans=0.0 2024-08-19 05:35:28,488 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 600, loss[loss=0.1004, beats_loss=0.01067, ecapa_loss=0.0001566, whisper_loss=0.08818, over 21044.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01031, ecapa_loss=0.0001408, whisper_loss=0.08934, over 3647923.87 frames. 
], batch size: 89, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:35:40,334 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 05:35:42,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4307090.0, ans=0.0 2024-08-19 05:35:47,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4307090.0, ans=0.125 2024-08-19 05:36:01,145 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.277e+01 2.491e+01 2.795e+01 3.103e+02, threshold=4.982e+01, percent-clipped=2.0 2024-08-19 05:36:22,142 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 05:36:28,426 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-19 05:36:30,722 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 650, loss[loss=0.1023, beats_loss=0.007581, ecapa_loss=0.0001613, whisper_loss=0.09313, over 23825.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01027, ecapa_loss=0.0001409, whisper_loss=0.08944, over 3706179.46 frames. ], batch size: 94, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:36:34,602 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-19 05:36:42,968 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 19 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-19 05:36:44,157 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-19 05:36:44,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4307590.0, ans=0.1 2024-08-19 05:37:01,440 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
26 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-19 05:37:05,020 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-19 05:37:12,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4307790.0, ans=0.125 2024-08-19 05:37:17,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4307790.0, ans=0.125 2024-08-19 05:37:32,109 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 700, loss[loss=0.09857, beats_loss=0.01084, ecapa_loss=0.0001202, whisper_loss=0.08653, over 22749.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01036, ecapa_loss=0.0001406, whisper_loss=0.08905, over 3740256.24 frames. ], batch size: 92, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:37:32,703 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.126e+01 2024-08-19 05:37:45,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4308090.0, ans=0.1 2024-08-19 05:38:03,910 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.712e+01 2.327e+01 2.528e+01 2.779e+01 3.860e+01, threshold=5.056e+01, percent-clipped=1.0 2024-08-19 05:38:13,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4308290.0, ans=10.0 2024-08-19 05:38:20,258 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-19 05:38:20,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4308390.0, ans=0.2 2024-08-19 05:38:26,359 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
33 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 05:38:26,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4308390.0, ans=0.1 2024-08-19 05:38:30,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4308390.0, ans=0.125 2024-08-19 05:38:33,924 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 750, loss[loss=0.1385, beats_loss=0.005866, ecapa_loss=0.0001863, whisper_loss=0.1308, over 18226.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01033, ecapa_loss=0.0001404, whisper_loss=0.08882, over 3733467.93 frames. ], batch size: 74, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:38:38,954 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-19 05:38:40,078 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-19 05:38:40,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4308490.0, ans=0.0 2024-08-19 05:38:55,158 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-19 05:38:56,340 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-19 05:38:58,537 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 10 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-19 05:39:01,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4308690.0, ans=0.1 2024-08-19 05:39:03,495 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-19 05:39:09,186 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.35 vs. 
limit=12.0 2024-08-19 05:39:13,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4308790.0, ans=0.125 2024-08-19 05:39:17,091 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 18 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-19 05:39:35,144 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 800, loss[loss=0.1015, beats_loss=0.009492, ecapa_loss=0.0001475, whisper_loss=0.0905, over 22794.00 frames. ], tot_loss[loss=0.0995, beats_loss=0.01042, ecapa_loss=0.0001399, whisper_loss=0.08768, over 3761171.25 frames. ], batch size: 93, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:39:38,903 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 05:39:40,135 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 16 from LS+wenet, 8 from Vox, 29 fro AS 2024-08-19 05:39:40,715 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.33 vs. limit=6.0 2024-08-19 05:39:42,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4308990.0, ans=0.1 2024-08-19 05:40:05,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4309190.0, ans=0.125 2024-08-19 05:40:07,650 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.249e+01 2.414e+01 2.640e+01 3.694e+01, threshold=4.828e+01, percent-clipped=0.0 2024-08-19 05:40:09,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4309190.0, ans=0.2 2024-08-19 05:40:09,580 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.06 vs. 
limit=22.5 2024-08-19 05:40:16,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4309290.0, ans=0.1 2024-08-19 05:40:33,412 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.341e-02 2024-08-19 05:40:37,923 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 850, loss[loss=0.08691, beats_loss=0.01104, ecapa_loss=0.0001582, whisper_loss=0.07429, over 14855.00 frames. ], tot_loss[loss=0.09939, beats_loss=0.01052, ecapa_loss=0.0001394, whisper_loss=0.08748, over 3775993.04 frames. ], batch size: 60, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:40:55,854 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.78 vs. limit=12.0 2024-08-19 05:41:00,484 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-19 05:41:09,094 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-19 05:41:12,994 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-19 05:41:20,189 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.91 vs. limit=15.0 2024-08-19 05:41:24,612 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-19 05:41:27,095 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 05:41:38,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4309890.0, ans=0.125 2024-08-19 05:41:40,993 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 900, loss[loss=0.08128, beats_loss=0.01151, ecapa_loss=0.0001761, whisper_loss=0.06801, over 16416.00 frames. ], tot_loss[loss=0.09966, beats_loss=0.0104, ecapa_loss=0.0001397, whisper_loss=0.08786, over 3801024.34 frames. ], batch size: 70, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:42:00,787 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.76 vs. limit=22.5 2024-08-19 05:42:03,052 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.57 vs. limit=15.0 2024-08-19 05:42:06,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4310190.0, ans=0.1 2024-08-19 05:42:14,286 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.290e+01 2.531e+01 2.891e+01 5.693e+01, threshold=5.062e+01, percent-clipped=1.0 2024-08-19 05:42:30,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4310290.0, ans=0.125 2024-08-19 05:42:45,439 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 950, loss[loss=0.08346, beats_loss=0.01012, ecapa_loss=0.0001319, whisper_loss=0.07202, over 15342.00 frames. ], tot_loss[loss=0.09954, beats_loss=0.0104, ecapa_loss=0.0001393, whisper_loss=0.08775, over 3812947.51 frames. ], batch size: 60, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:42:54,159 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
21 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-19 05:42:54,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=4310490.0, ans=10.0 2024-08-19 05:43:04,724 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-19 05:43:07,337 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-19 05:43:25,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4310790.0, ans=0.0 2024-08-19 05:43:49,827 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 1000, loss[loss=0.1097, beats_loss=0.01228, ecapa_loss=0.0001321, whisper_loss=0.09606, over 21505.00 frames. ], tot_loss[loss=0.09957, beats_loss=0.01043, ecapa_loss=0.0001387, whisper_loss=0.08776, over 3814652.63 frames. ], batch size: 88, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:43:51,272 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-19 05:43:55,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4310990.0, ans=0.0 2024-08-19 05:43:58,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4310990.0, ans=0.5 2024-08-19 05:44:04,840 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=12.0 2024-08-19 05:44:10,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4311090.0, ans=0.1 2024-08-19 05:44:21,556 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
35 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-19 05:44:24,333 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.215e+01 2.410e+01 2.644e+01 3.685e+01, threshold=4.819e+01, percent-clipped=0.0 2024-08-19 05:44:51,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4311390.0, ans=0.0 2024-08-19 05:44:54,475 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.87 vs. limit=10.0 2024-08-19 05:44:56,079 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 1050, loss[loss=0.1026, beats_loss=0.01148, ecapa_loss=0.0001198, whisper_loss=0.08994, over 21479.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01028, ecapa_loss=0.0001383, whisper_loss=0.08874, over 3825226.99 frames. ], batch size: 85, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:45:28,543 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.02 vs. limit=22.5 2024-08-19 05:45:48,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4311890.0, ans=0.125 2024-08-19 05:45:52,722 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-19 05:45:59,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4311890.0, ans=0.1 2024-08-19 05:46:01,932 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 1100, loss[loss=0.1031, beats_loss=0.01101, ecapa_loss=0.0001222, whisper_loss=0.09089, over 22419.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01031, ecapa_loss=0.000139, whisper_loss=0.08903, over 3820546.79 frames. ], batch size: 88, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:46:04,745 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
27 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-19 05:46:12,652 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-19 05:46:12,977 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 05:46:20,513 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 24 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-19 05:46:29,016 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 25 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-19 05:46:29,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4312190.0, ans=0.2 2024-08-19 05:46:29,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4312190.0, ans=0.125 2024-08-19 05:46:36,712 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.284e+01 2.503e+01 2.808e+01 4.234e+01, threshold=5.006e+01, percent-clipped=0.0 2024-08-19 05:46:43,399 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.41 vs. limit=22.5 2024-08-19 05:46:48,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4312290.0, ans=0.0 2024-08-19 05:47:10,040 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 1150, loss[loss=0.09263, beats_loss=0.01293, ecapa_loss=0.0001177, whisper_loss=0.07852, over 22769.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01028, ecapa_loss=0.0001392, whisper_loss=0.08945, over 3789044.87 frames. 
], batch size: 92, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:47:13,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4312490.0, ans=0.0 2024-08-19 05:47:28,520 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-19 05:47:42,779 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-19 05:47:43,116 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=9.400e+01 2024-08-19 05:47:56,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4312790.0, ans=0.0 2024-08-19 05:47:58,762 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-19 05:48:05,505 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 26 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-19 05:48:21,566 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 1200, loss[loss=0.1227, beats_loss=0.01037, ecapa_loss=9.215e-05, whisper_loss=0.1114, over 20062.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01038, ecapa_loss=0.000139, whisper_loss=0.08865, over 3792110.51 frames. ], batch size: 70, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:48:28,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4312990.0, ans=0.125 2024-08-19 05:48:36,541 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
23 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-19 05:48:51,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4313190.0, ans=0.1 2024-08-19 05:48:59,725 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.227e+01 2.498e+01 2.672e+01 3.472e+01, threshold=4.995e+01, percent-clipped=0.0 2024-08-19 05:49:01,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4313190.0, ans=10.0 2024-08-19 05:49:21,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4313390.0, ans=0.1 2024-08-19 05:49:23,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4313390.0, ans=0.1 2024-08-19 05:49:30,755 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.249e-03 2024-08-19 05:49:34,264 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 1250, loss[loss=0.1063, beats_loss=0.01173, ecapa_loss=0.0001169, whisper_loss=0.09337, over 22109.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.0104, ecapa_loss=0.0001388, whisper_loss=0.08886, over 3783118.88 frames. 
], batch size: 87, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:49:42,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4313490.0, ans=0.125 2024-08-19 05:50:03,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4313690.0, ans=0.125 2024-08-19 05:50:08,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4313690.0, ans=0.1 2024-08-19 05:50:09,885 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-19 05:50:24,654 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.53 vs. limit=12.0 2024-08-19 05:50:25,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4313790.0, ans=0.1 2024-08-19 05:50:34,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4313890.0, ans=0.0 2024-08-19 05:50:41,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4313890.0, ans=0.0 2024-08-19 05:50:45,942 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-19 05:50:48,230 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 1300, loss[loss=0.09711, beats_loss=0.01048, ecapa_loss=0.0001165, whisper_loss=0.08547, over 18083.00 frames. ], tot_loss[loss=0.09957, beats_loss=0.01048, ecapa_loss=0.0001403, whisper_loss=0.08768, over 3790223.63 frames. ], batch size: 70, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:50:52,605 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
16 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 05:50:57,589 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 35 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 05:51:02,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4314090.0, ans=0.0 2024-08-19 05:51:05,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4314090.0, ans=0.125 2024-08-19 05:51:12,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=4314090.0, ans=0.025 2024-08-19 05:51:23,584 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 26 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-19 05:51:26,422 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.267e+01 2.463e+01 2.675e+01 4.452e+01, threshold=4.926e+01, percent-clipped=0.0 2024-08-19 05:51:54,457 WARNING [optim.py:496] (2/4) Scaling gradients by 0.009816886857151985, model_norm_threshold=49.264041900634766 2024-08-19 05:51:54,650 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.0.norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.852e+06, grad_sumsq=3.852e+06, orig_rms_sq=1.000e+00 2024-08-19 05:52:02,061 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 1350, loss[loss=0.1205, beats_loss=0.01013, ecapa_loss=0.0001407, whisper_loss=0.1089, over 23307.00 frames. ], tot_loss[loss=0.1, beats_loss=0.0104, ecapa_loss=0.0001397, whisper_loss=0.08824, over 3780568.18 frames. 
], batch size: 92, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:52:07,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4314490.0, ans=0.125 2024-08-19 05:52:12,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4314490.0, ans=0.2 2024-08-19 05:52:21,196 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-19 05:52:36,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4314690.0, ans=0.125 2024-08-19 05:52:45,462 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 30 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-19 05:52:47,256 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 19 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 05:52:50,401 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-19 05:52:53,553 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 13 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-19 05:53:18,921 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 1400, loss[loss=0.09623, beats_loss=0.01038, ecapa_loss=0.0001519, whisper_loss=0.08433, over 15429.00 frames. ], tot_loss[loss=0.09997, beats_loss=0.0105, ecapa_loss=0.0001383, whisper_loss=0.08809, over 3789866.89 frames. ], batch size: 64, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:53:23,501 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-19 05:53:28,076 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
24 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-19 05:53:29,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4314990.0, ans=0.0 2024-08-19 05:53:40,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4315090.0, ans=0.125 2024-08-19 05:53:57,443 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.218e+01 2.484e+01 2.837e+01 5.018e+03, threshold=4.968e+01, percent-clipped=1.0 2024-08-19 05:54:09,036 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-19 05:54:15,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4315290.0, ans=0.07 2024-08-19 05:54:18,012 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 31 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-19 05:54:21,870 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.53 vs. limit=10.0 2024-08-19 05:54:28,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4315390.0, ans=0.05 2024-08-19 05:54:53,139 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 1450, loss[loss=0.1182, beats_loss=0.008962, ecapa_loss=0.0001392, whisper_loss=0.1078, over 18324.00 frames. ], tot_loss[loss=0.09999, beats_loss=0.01046, ecapa_loss=0.0001385, whisper_loss=0.08815, over 3791253.23 frames. 
], batch size: 72, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:55:05,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4315490.0, ans=0.125 2024-08-19 05:55:07,669 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.69 vs. limit=15.0 2024-08-19 05:55:21,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4315690.0, ans=0.1 2024-08-19 05:55:33,517 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-19 05:55:46,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4315790.0, ans=0.2 2024-08-19 05:55:57,393 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-19 05:56:04,229 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 1500, loss[loss=0.08113, beats_loss=0.01083, ecapa_loss=0.0001497, whisper_loss=0.0688, over 16097.00 frames. ], tot_loss[loss=0.09919, beats_loss=0.01056, ecapa_loss=0.0001376, whisper_loss=0.08726, over 3831026.27 frames. ], batch size: 67, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:56:04,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4315990.0, ans=0.1 2024-08-19 05:56:06,170 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 05:56:11,807 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. 
limit=6.0 2024-08-19 05:56:29,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4316090.0, ans=0.2 2024-08-19 05:56:40,284 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.207e+01 2.443e+01 2.756e+01 5.889e+01, threshold=4.886e+01, percent-clipped=1.0 2024-08-19 05:56:50,409 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 16 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 05:56:56,829 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.29 vs. limit=22.5 2024-08-19 05:56:59,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4316390.0, ans=0.125 2024-08-19 05:56:59,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4316390.0, ans=0.1 2024-08-19 05:56:59,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4316390.0, ans=0.125 2024-08-19 05:57:00,284 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-19 05:57:14,337 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 1550, loss[loss=0.09404, beats_loss=0.01209, ecapa_loss=0.0001117, whisper_loss=0.08084, over 21339.00 frames. ], tot_loss[loss=0.09912, beats_loss=0.01053, ecapa_loss=0.0001373, whisper_loss=0.08721, over 3817944.97 frames. ], batch size: 84, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:57:17,786 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.97 vs. 
limit=22.5 2024-08-19 05:57:37,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4316590.0, ans=0.2 2024-08-19 05:58:08,749 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.69 vs. limit=22.5 2024-08-19 05:58:09,479 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 05:58:22,832 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.45 vs. limit=15.0 2024-08-19 05:58:24,367 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 1600, loss[loss=0.0687, beats_loss=0.01325, ecapa_loss=0.0001415, whisper_loss=0.05403, over 18095.00 frames. ], tot_loss[loss=0.09963, beats_loss=0.01051, ecapa_loss=0.0001374, whisper_loss=0.08775, over 3830061.33 frames. ], batch size: 77, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:58:41,552 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-19 05:58:42,820 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-19 05:58:52,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4317190.0, ans=0.125 2024-08-19 05:58:59,943 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.330e+01 2.560e+01 2.867e+01 4.282e+01, threshold=5.120e+01, percent-clipped=0.0 2024-08-19 05:59:03,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4317190.0, ans=0.5 2024-08-19 05:59:05,671 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 05:59:12,156 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
23 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-19 05:59:27,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4317390.0, ans=0.125 2024-08-19 05:59:31,767 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 1650, loss[loss=0.1045, beats_loss=0.01103, ecapa_loss=0.0001236, whisper_loss=0.09224, over 16543.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01039, ecapa_loss=0.0001377, whisper_loss=0.08897, over 3808362.72 frames. ], batch size: 63, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:59:37,124 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-19 05:59:38,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4317490.0, ans=0.0 2024-08-19 06:00:16,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=4317790.0, ans=10.0 2024-08-19 06:00:22,042 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-19 06:00:23,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4317890.0, ans=0.05 2024-08-19 06:00:27,050 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-19 06:00:31,957 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 15 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-19 06:00:32,853 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.00 vs. limit=15.0 2024-08-19 06:00:37,906 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 1700, loss[loss=0.1083, beats_loss=0.009177, ecapa_loss=0.0001179, whisper_loss=0.09793, over 20343.00 frames. 
], tot_loss[loss=0.1004, beats_loss=0.01036, ecapa_loss=0.0001378, whisper_loss=0.08862, over 3799898.42 frames. ], batch size: 75, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:00:51,176 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 25 from LS+wenet, 8 from Vox, 37 fro AS 2024-08-19 06:01:04,876 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-19 06:01:11,376 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.261e+01 2.522e+01 2.818e+01 7.809e+01, threshold=5.044e+01, percent-clipped=1.0 2024-08-19 06:01:16,882 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 30 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-19 06:01:28,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4318290.0, ans=0.125 2024-08-19 06:01:41,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4318390.0, ans=0.0 2024-08-19 06:01:45,061 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 1750, loss[loss=0.07483, beats_loss=0.01201, ecapa_loss=0.000137, whisper_loss=0.06145, over 17256.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01027, ecapa_loss=0.0001381, whisper_loss=0.08927, over 3818524.24 frames. 
], batch size: 71, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:01:52,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4318490.0, ans=0.2 2024-08-19 06:01:57,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4318490.0, ans=0.125 2024-08-19 06:02:10,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4318590.0, ans=0.0 2024-08-19 06:02:12,729 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-19 06:02:42,231 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 06:02:57,867 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 17 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-19 06:03:03,155 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 1800, loss[loss=0.08564, beats_loss=0.01136, ecapa_loss=0.0001252, whisper_loss=0.07303, over 16170.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01026, ecapa_loss=0.0001388, whisper_loss=0.08937, over 3789472.40 frames. 
], batch size: 64, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:03:12,016 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.737e-01 2024-08-19 06:03:13,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4318990.0, ans=0.015 2024-08-19 06:03:13,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4318990.0, ans=0.125 2024-08-19 06:03:13,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4318990.0, ans=0.0 2024-08-19 06:03:17,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4318990.0, ans=0.1 2024-08-19 06:03:22,261 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 16 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-19 06:03:26,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4319090.0, ans=0.1 2024-08-19 06:03:32,813 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-19 06:03:49,009 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.637e+01 2.229e+01 2.414e+01 2.728e+01 1.792e+02, threshold=4.829e+01, percent-clipped=1.0 2024-08-19 06:04:10,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4319290.0, ans=0.0 2024-08-19 06:04:10,398 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.55 vs. 
limit=10.0 2024-08-19 06:04:13,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4319290.0, ans=0.1 2024-08-19 06:04:23,253 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.70 vs. limit=10.0 2024-08-19 06:04:26,606 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 15 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-19 06:04:35,210 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 1850, loss[loss=0.09404, beats_loss=0.01339, ecapa_loss=0.0001044, whisper_loss=0.07961, over 22717.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01022, ecapa_loss=0.0001395, whisper_loss=0.08944, over 3798665.01 frames. ], batch size: 90, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:04:44,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4319490.0, ans=0.125 2024-08-19 06:05:02,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4319590.0, ans=0.2 2024-08-19 06:05:05,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4319590.0, ans=0.1 2024-08-19 06:05:15,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4319690.0, ans=0.1 2024-08-19 06:05:21,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4319690.0, ans=0.125 2024-08-19 06:06:14,321 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 1900, loss[loss=0.1211, beats_loss=0.008797, ecapa_loss=0.0001506, whisper_loss=0.1108, over 22250.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01012, ecapa_loss=0.0001397, whisper_loss=0.09011, over 3798839.50 frames. 
], batch size: 88, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:06:25,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4319990.0, ans=0.0 2024-08-19 06:06:29,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4319990.0, ans=0.125 2024-08-19 06:06:44,069 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-19 06:07:13,365 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.248e+01 2.499e+01 2.694e+01 3.637e+01, threshold=4.999e+01, percent-clipped=0.0 2024-08-19 06:07:15,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4320190.0, ans=0.125 2024-08-19 06:07:20,914 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 06:07:46,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4320390.0, ans=0.125 2024-08-19 06:08:09,160 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 1950, loss[loss=0.08491, beats_loss=0.01257, ecapa_loss=0.0001323, whisper_loss=0.07102, over 19334.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01013, ecapa_loss=0.0001399, whisper_loss=0.08937, over 3795976.97 frames. ], batch size: 79, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:08:13,170 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-19 06:08:15,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4320490.0, ans=0.0 2024-08-19 06:08:44,911 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2024-08-19 06:08:52,068 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-19 06:09:11,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4320690.0, ans=0.07 2024-08-19 06:09:35,660 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 13 from Vox, 48 fro AS 2024-08-19 06:10:07,826 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 2000, loss[loss=0.1142, beats_loss=0.01049, ecapa_loss=0.0001326, whisper_loss=0.1024, over 21584.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01016, ecapa_loss=0.000139, whisper_loss=0.08975, over 3805691.75 frames. ], batch size: 80, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:10:36,790 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-19 06:10:49,200 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 33 from Vox, 33 fro AS 2024-08-19 06:10:55,258 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-19 06:11:04,805 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.279e+01 2.537e+01 2.810e+01 5.508e+01, threshold=5.074e+01, percent-clipped=1.0 2024-08-19 06:11:05,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4321190.0, ans=0.125 2024-08-19 06:11:06,294 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 06:11:17,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4321290.0, ans=0.0 2024-08-19 06:11:21,200 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-19 06:11:32,095 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-19 06:11:39,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4321490.0, ans=0.04949747468305833 2024-08-19 06:11:40,479 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 2050, loss[loss=0.1105, beats_loss=0.008328, ecapa_loss=0.0001552, whisper_loss=0.1006, over 15479.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01021, ecapa_loss=0.000139, whisper_loss=0.08896, over 3805968.39 frames. ], batch size: 61, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:12:04,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4321590.0, ans=0.1 2024-08-19 06:12:10,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4321690.0, ans=0.125 2024-08-19 06:12:12,400 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-19 06:12:15,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4321690.0, ans=0.0 2024-08-19 06:12:17,170 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-19 06:12:27,254 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
18 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-19 06:12:35,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4321790.0, ans=0.125 2024-08-19 06:12:53,930 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 31 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 06:12:57,721 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 2100, loss[loss=0.1182, beats_loss=0.0103, ecapa_loss=0.0001028, whisper_loss=0.1069, over 17560.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01029, ecapa_loss=0.0001378, whisper_loss=0.08899, over 3826402.91 frames. ], batch size: 64, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:13:07,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4321990.0, ans=0.2 2024-08-19 06:13:08,257 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.59 vs. limit=22.5 2024-08-19 06:13:39,222 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.264e+01 2.459e+01 2.750e+01 5.104e+01, threshold=4.918e+01, percent-clipped=1.0 2024-08-19 06:14:07,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4322390.0, ans=0.0 2024-08-19 06:14:17,392 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 2150, loss[loss=0.1221, beats_loss=0.008917, ecapa_loss=9.904e-05, whisper_loss=0.1122, over 15726.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01029, ecapa_loss=0.0001369, whisper_loss=0.08942, over 3833641.11 frames. 
], batch size: 55, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:14:21,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4322490.0, ans=0.1 2024-08-19 06:14:39,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4322590.0, ans=0.1 2024-08-19 06:14:40,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4322590.0, ans=0.125 2024-08-19 06:14:40,560 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.77 vs. limit=10.0 2024-08-19 06:14:40,995 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 29 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-19 06:14:51,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4322690.0, ans=0.125 2024-08-19 06:15:05,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4322790.0, ans=0.125 2024-08-19 06:15:08,360 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 06:15:13,864 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 14 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-19 06:15:20,533 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-19 06:15:32,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4322890.0, ans=0.0 2024-08-19 06:15:38,170 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 2200, loss[loss=0.1101, beats_loss=0.0103, ecapa_loss=0.0001465, whisper_loss=0.09831, over 22420.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01033, ecapa_loss=0.0001364, whisper_loss=0.08979, over 3806316.98 frames. 
], batch size: 92, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:16:10,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4323190.0, ans=0.1 2024-08-19 06:16:17,794 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.245e+01 2.459e+01 2.668e+01 3.758e+01, threshold=4.917e+01, percent-clipped=0.0 2024-08-19 06:16:20,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4323190.0, ans=0.1 2024-08-19 06:16:26,461 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.228e-01 2024-08-19 06:16:30,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4323290.0, ans=0.125 2024-08-19 06:16:36,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4323290.0, ans=0.0 2024-08-19 06:16:42,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4323390.0, ans=0.125 2024-08-19 06:16:49,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4323390.0, ans=0.125 2024-08-19 06:16:52,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4323390.0, ans=0.5 2024-08-19 06:16:55,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4323490.0, ans=0.1 2024-08-19 06:16:56,483 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 2250, loss[loss=0.1014, beats_loss=0.01064, ecapa_loss=0.0001482, whisper_loss=0.08928, over 21840.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.01044, ecapa_loss=0.000137, whisper_loss=0.08959, over 3833374.07 frames. ], batch size: 89, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:17:00,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4323490.0, ans=0.2 2024-08-19 06:17:07,016 WARNING [optim.py:496] (2/4) Scaling gradients by 0.027778564020991325, model_norm_threshold=49.17220687866211 2024-08-19 06:17:07,180 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.0.norm.log_scale with proportion 0.24, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.536e+05, grad_sumsq=7.536e+05, orig_rms_sq=1.000e+00 2024-08-19 06:17:07,741 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 06:17:21,720 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 06:17:22,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4323590.0, ans=0.0 2024-08-19 06:17:28,899 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-19 06:17:59,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4323890.0, ans=0.125 2024-08-19 06:18:14,400 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 2300, loss[loss=0.1281, beats_loss=0.007782, ecapa_loss=0.0001273, whisper_loss=0.119, over 20629.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01053, ecapa_loss=0.0001381, whisper_loss=0.08974, over 3837805.06 frames. ], batch size: 77, lr: 2.05e-03, grad_scale: 1.152921504606847e+18 2024-08-19 06:18:22,614 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. 
limit=6.0 2024-08-19 06:18:27,048 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.85 vs. limit=15.0 2024-08-19 06:18:47,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4324190.0, ans=0.125 2024-08-19 06:18:53,068 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.282e+01 2.493e+01 2.821e+01 1.770e+03, threshold=4.986e+01, percent-clipped=1.0 2024-08-19 06:18:59,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4324290.0, ans=0.125 2024-08-19 06:19:01,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4324290.0, ans=0.2 2024-08-19 06:19:09,233 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-19 06:19:09,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4324290.0, ans=0.125 2024-08-19 06:19:11,164 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.92 vs. limit=15.0 2024-08-19 06:19:12,586 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.500e+05 2024-08-19 06:19:14,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4324390.0, ans=0.125 2024-08-19 06:19:31,978 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 2350, loss[loss=0.1036, beats_loss=0.008996, ecapa_loss=0.0001792, whisper_loss=0.09278, over 13947.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01046, ecapa_loss=0.0001387, whisper_loss=0.08998, over 3823457.92 frames. 
], batch size: 56, lr: 2.05e-03, grad_scale: 1.152921504606847e+18 2024-08-19 06:19:38,772 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.25 vs. limit=15.0 2024-08-19 06:19:41,571 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-19 06:19:56,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4324590.0, ans=0.125 2024-08-19 06:20:15,309 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-19 06:20:18,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4324790.0, ans=0.1 2024-08-19 06:20:19,559 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-19 06:20:22,174 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 17 from Vox, 51 fro AS 2024-08-19 06:20:25,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4324790.0, ans=0.2 2024-08-19 06:20:35,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4324890.0, ans=0.1 2024-08-19 06:20:43,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4324890.0, ans=0.0 2024-08-19 06:20:49,906 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 2400, loss[loss=0.09974, beats_loss=0.01355, ecapa_loss=0.0001493, whisper_loss=0.0847, over 22330.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.0001393, whisper_loss=0.09019, over 3829085.09 frames. 
], batch size: 94, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:20:52,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4324990.0, ans=0.1 2024-08-19 06:21:20,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4325190.0, ans=0.2 2024-08-19 06:21:23,463 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 17 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-19 06:21:30,502 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.266e+01 2.549e+01 2.810e+01 4.372e+01, threshold=5.098e+01, percent-clipped=0.0 2024-08-19 06:21:47,875 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-19 06:21:49,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4325390.0, ans=0.125 2024-08-19 06:21:52,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4325390.0, ans=0.2 2024-08-19 06:21:54,491 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.64 vs. limit=15.0 2024-08-19 06:21:55,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4325390.0, ans=0.2 2024-08-19 06:22:06,950 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 2450, loss[loss=0.08245, beats_loss=0.01368, ecapa_loss=0.0001308, whisper_loss=0.06746, over 21410.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01045, ecapa_loss=0.0001394, whisper_loss=0.09003, over 3846039.17 frames. 
], batch size: 89, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:22:16,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4325490.0, ans=0.125 2024-08-19 06:22:31,113 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-19 06:22:44,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4325690.0, ans=0.125 2024-08-19 06:23:01,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4325790.0, ans=0.1 2024-08-19 06:23:08,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4325890.0, ans=0.125 2024-08-19 06:23:18,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=4325890.0, ans=0.5 2024-08-19 06:23:24,943 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 2500, loss[loss=0.1056, beats_loss=0.01043, ecapa_loss=0.0001511, whisper_loss=0.09369, over 19697.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01049, ecapa_loss=0.0001386, whisper_loss=0.08952, over 3844834.42 frames. ], batch size: 81, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:23:29,319 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
19 from LS+wenet, 21 from Vox, 27 from AS 2024-08-19 06:23:36,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4325990.0, ans=0.125 2024-08-19 06:23:40,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4326090.0, ans=0.125 2024-08-19 06:23:42,447 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2024-08-19 06:24:06,659 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.293e+01 2.574e+01 2.771e+01 4.931e+01, threshold=5.149e+01, percent-clipped=0.0 2024-08-19 06:24:21,681 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.91 vs. limit=15.0 2024-08-19 06:24:45,746 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 2550, loss[loss=0.09769, beats_loss=0.00889, ecapa_loss=0.0001766, whisper_loss=0.08704, over 17589.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01043, ecapa_loss=0.0001391, whisper_loss=0.09031, over 3865478.74 frames. ], batch size: 71, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:24:48,150 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.08 vs. limit=15.0 2024-08-19 06:25:03,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4326590.0, ans=0.125 2024-08-19 06:25:09,243 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 14 from Vox, 27 from AS 2024-08-19 06:25:20,404 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
21 from LS+wenet, 20 from Vox, 30 from AS 2024-08-19 06:25:22,015 WARNING [optim.py:496] (2/4) Scaling gradients by 0.054495006799697876, model_norm_threshold=51.48568344116211 2024-08-19 06:25:22,180 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.141e+04, grad_sumsq=2.777e+04, orig_rms_sq=3.292e+00 2024-08-19 06:25:22,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4326690.0, ans=0.1 2024-08-19 06:25:40,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4326790.0, ans=0.125 2024-08-19 06:25:47,846 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 17 from LS+wenet, 24 from Vox, 46 from AS 2024-08-19 06:26:04,344 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 2600, loss[loss=0.1169, beats_loss=0.008395, ecapa_loss=0.0001261, whisper_loss=0.1073, over 18903.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01049, ecapa_loss=0.000139, whisper_loss=0.09063, over 3842095.96 frames. ], batch size: 72, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:26:06,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4326990.0, ans=0.125 2024-08-19 06:26:12,769 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 24 from Vox, 25 from AS 2024-08-19 06:26:25,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4327090.0, ans=0.1 2024-08-19 06:26:25,313 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.88 vs. 
limit=6.0 2024-08-19 06:26:32,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4327090.0, ans=0.125 2024-08-19 06:26:45,668 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.338e+01 2.555e+01 2.845e+01 9.448e+02, threshold=5.110e+01, percent-clipped=2.0 2024-08-19 06:26:50,743 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 from AS 2024-08-19 06:26:50,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4327290.0, ans=0.0 2024-08-19 06:27:04,034 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0 2024-08-19 06:27:09,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4327390.0, ans=0.0 2024-08-19 06:27:23,436 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 2650, loss[loss=0.08436, beats_loss=0.009562, ecapa_loss=0.0001463, whisper_loss=0.07333, over 16002.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001401, whisper_loss=0.09068, over 3867720.45 frames. ], batch size: 64, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:27:51,374 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 from AS 2024-08-19 06:27:53,180 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.16 vs. 
limit=10.0 2024-08-19 06:28:08,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4327690.0, ans=0.125 2024-08-19 06:28:09,878 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.88 vs. limit=15.0 2024-08-19 06:28:29,352 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 15 from Vox, 34 from AS 2024-08-19 06:28:41,780 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 2700, loss[loss=0.09211, beats_loss=0.01219, ecapa_loss=0.0001315, whisper_loss=0.07861, over 19687.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.0001405, whisper_loss=0.09037, over 3889978.79 frames. ], batch size: 81, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:28:45,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4327990.0, ans=0.0 2024-08-19 06:28:51,246 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 30 from Vox, 29 from AS 2024-08-19 06:29:07,611 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 19 from Vox, 36 from AS 2024-08-19 06:29:18,794 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 12 from LS+wenet, 14 from Vox, 33 from AS 2024-08-19 06:29:19,375 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.66 vs. limit=12.0 2024-08-19 06:29:24,270 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.337e+01 2.538e+01 2.914e+01 3.709e+01, threshold=5.076e+01, percent-clipped=0.0 2024-08-19 06:29:31,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4328290.0, ans=0.0 2024-08-19 06:29:40,748 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
28 from LS+wenet, 17 from Vox, 44 from AS 2024-08-19 06:30:00,770 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 2750, loss[loss=0.1087, beats_loss=0.01064, ecapa_loss=0.000129, whisper_loss=0.09675, over 15949.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01037, ecapa_loss=0.0001399, whisper_loss=0.09059, over 3865394.08 frames. ], batch size: 63, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:30:05,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4328490.0, ans=0.0 2024-08-19 06:30:15,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4328590.0, ans=0.0 2024-08-19 06:30:38,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4328690.0, ans=0.1 2024-08-19 06:31:20,976 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 2800, loss[loss=0.1238, beats_loss=0.00931, ecapa_loss=0.0001541, whisper_loss=0.113, over 20395.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01039, ecapa_loss=0.0001395, whisper_loss=0.09092, over 3878185.68 frames. ], batch size: 80, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:31:34,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4328990.0, ans=0.09899494936611666 2024-08-19 06:31:43,693 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
16 from LS+wenet, 17 from Vox, 23 from AS 2024-08-19 06:31:45,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4329090.0, ans=0.0 2024-08-19 06:31:53,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4329190.0, ans=0.035 2024-08-19 06:32:04,776 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.348e+01 2.575e+01 2.807e+01 4.733e+01, threshold=5.150e+01, percent-clipped=0.0 2024-08-19 06:32:14,621 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 18 from Vox, 23 from AS 2024-08-19 06:32:41,437 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 2850, loss[loss=0.09698, beats_loss=0.01017, ecapa_loss=0.000161, whisper_loss=0.0852, over 17192.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01039, ecapa_loss=0.0001397, whisper_loss=0.09085, over 3864132.74 frames. ], batch size: 71, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:32:46,210 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.47 vs. limit=22.5 2024-08-19 06:32:49,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4329490.0, ans=0.125 2024-08-19 06:32:50,353 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2024-08-19 06:32:56,686 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
30 from LS+wenet, 12 from Vox, 48 from AS 2024-08-19 06:33:41,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4329790.0, ans=0.125 2024-08-19 06:34:01,934 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 2900, loss[loss=0.09284, beats_loss=0.01108, ecapa_loss=0.0001568, whisper_loss=0.08019, over 15580.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001395, whisper_loss=0.09051, over 3873627.99 frames. ], batch size: 64, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:34:21,851 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.39 vs. limit=15.0 2024-08-19 06:34:22,601 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 from AS 2024-08-19 06:34:44,963 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 22 from Vox, 26 from AS 2024-08-19 06:34:46,027 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.702e+01 2.333e+01 2.520e+01 2.841e+01 3.701e+01, threshold=5.040e+01, percent-clipped=0.0 2024-08-19 06:34:52,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4330290.0, ans=0.0 2024-08-19 06:34:58,546 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 from AS 2024-08-19 06:34:59,782 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 18 from Vox, 22 from AS 2024-08-19 06:35:13,122 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
35 from LS+wenet, 24 from Vox, 30 from AS 2024-08-19 06:35:14,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4330390.0, ans=0.95 2024-08-19 06:35:21,641 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 2950, loss[loss=0.0994, beats_loss=0.009596, ecapa_loss=0.0001541, whisper_loss=0.08827, over 19780.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01037, ecapa_loss=0.0001397, whisper_loss=0.0908, over 3872798.66 frames. ], batch size: 80, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:35:25,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4330490.0, ans=0.125 2024-08-19 06:35:29,975 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 20 from Vox, 44 from AS 2024-08-19 06:35:30,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4330490.0, ans=0.125 2024-08-19 06:35:44,928 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 28 from Vox, 25 from AS 2024-08-19 06:35:47,954 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 24 from Vox, 22 from AS 2024-08-19 06:36:02,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4330690.0, ans=0.125 2024-08-19 06:36:06,364 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.75 vs. limit=15.0 2024-08-19 06:36:22,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4330790.0, ans=0.125 2024-08-19 06:36:35,413 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
18 from LS+wenet, 14 from Vox, 31 from AS 2024-08-19 06:36:36,841 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 24 from Vox, 23 from AS 2024-08-19 06:36:44,147 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 3000, loss[loss=0.1057, beats_loss=0.009255, ecapa_loss=0.0001345, whisper_loss=0.09509, over 23490.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01025, ecapa_loss=0.0001405, whisper_loss=0.09143, over 3886930.48 frames. ], batch size: 91, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:36:44,148 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-19 06:37:11,098 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.4598, 1.5644, 3.3481, 2.1788, 2.4010, 2.6158, 3.5605, 3.4457], device='cuda:2') 2024-08-19 06:37:20,869 INFO [train_multi_KD3.py:1149] (2/4) Epoch 30, validation on ASR_libri: loss=0.2537, beats_loss=0, ecapa_loss=0.0005147, whisper_loss=0.2486, over 922467.00 frames. 2024-08-19 06:37:39,404 INFO [train_multi_KD3.py:1149] (2/4) Epoch 30, validation on SV_voxceleb1: loss=0.003985, beats_loss=0, ecapa_loss=0.0003985, whisper_loss=0, over 939242.00 frames. 2024-08-19 06:39:27,107 INFO [train_multi_KD3.py:1149] (2/4) Epoch 30, validation on AT_audioset: loss=0.02305, beats_loss=0.02305, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 06:39:27,110 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-19 06:39:58,790 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.19 vs. limit=22.5 2024-08-19 06:39:59,647 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
34 from LS+wenet, 19 from Vox, 39 from AS 2024-08-19 06:40:05,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4331190.0, ans=0.125 2024-08-19 06:40:11,476 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.401e+01 2.628e+01 2.942e+01 3.563e+02, threshold=5.255e+01, percent-clipped=2.0 2024-08-19 06:40:44,444 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.09 vs. limit=15.0 2024-08-19 06:40:45,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4331390.0, ans=0.0 2024-08-19 06:40:47,479 WARNING [optim.py:496] (2/4) Scaling gradients by 0.05165092274546623, model_norm_threshold=52.55119705200195 2024-08-19 06:40:47,640 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.383e+05, grad_sumsq=1.383e+05, orig_rms_sq=1.000e+00 2024-08-19 06:40:49,671 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 18 from Vox, 42 from AS 2024-08-19 06:40:53,230 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 3050, loss[loss=0.09684, beats_loss=0.01102, ecapa_loss=0.0001764, whisper_loss=0.08405, over 15042.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01042, ecapa_loss=0.0001395, whisper_loss=0.09103, over 3904064.61 frames. 
], batch size: 60, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:41:15,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4331590.0, ans=0.0 2024-08-19 06:41:26,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4331690.0, ans=0.0 2024-08-19 06:41:32,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=4331690.0, ans=0.1 2024-08-19 06:41:34,043 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.20 vs. limit=15.0 2024-08-19 06:41:38,169 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.91 vs. limit=15.0 2024-08-19 06:41:51,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4331790.0, ans=0.125 2024-08-19 06:41:57,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4331790.0, ans=0.95 2024-08-19 06:42:17,899 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 3100, loss[loss=0.1023, beats_loss=0.009119, ecapa_loss=0.000132, whisper_loss=0.09191, over 15911.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01045, ecapa_loss=0.0001408, whisper_loss=0.09099, over 3871090.81 frames. ], batch size: 62, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:42:23,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4331990.0, ans=0.1 2024-08-19 06:42:25,087 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
27 from LS+wenet, 19 from Vox, 48 from AS 2024-08-19 06:42:33,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4332090.0, ans=0.125 2024-08-19 06:42:39,102 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 22 from Vox, 20 from AS 2024-08-19 06:42:53,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4332190.0, ans=0.0 2024-08-19 06:43:01,791 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.302e+01 2.496e+01 2.802e+01 1.017e+03, threshold=4.993e+01, percent-clipped=2.0 2024-08-19 06:43:38,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4332490.0, ans=0.1 2024-08-19 06:43:40,239 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 3150, loss[loss=0.1086, beats_loss=0.009711, ecapa_loss=0.0001677, whisper_loss=0.09726, over 14999.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.0001411, whisper_loss=0.09041, over 3847315.60 frames. ], batch size: 62, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:43:45,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4332490.0, ans=0.2 2024-08-19 06:43:52,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4332490.0, ans=0.1 2024-08-19 06:44:02,572 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0 2024-08-19 06:44:08,781 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
19 from LS+wenet, 23 from Vox, 37 from AS 2024-08-19 06:44:20,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=4332690.0, ans=15.0 2024-08-19 06:44:25,807 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 15 from LS+wenet, 18 from Vox, 26 from AS 2024-08-19 06:44:41,915 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 14 from Vox, 29 from AS 2024-08-19 06:44:51,243 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 15 from Vox, 28 from AS 2024-08-19 06:44:55,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4332890.0, ans=0.0 2024-08-19 06:45:00,966 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 3200, loss[loss=0.1128, beats_loss=0.008264, ecapa_loss=0.0001391, whisper_loss=0.1032, over 15027.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01047, ecapa_loss=0.0001418, whisper_loss=0.09057, over 3797759.20 frames. ], batch size: 54, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:45:02,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4332990.0, ans=0.1 2024-08-19 06:45:07,167 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 13 from Vox, 27 from AS 2024-08-19 06:45:42,803 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.260e+01 2.450e+01 2.731e+01 1.494e+02, threshold=4.900e+01, percent-clipped=1.0 2024-08-19 06:45:57,419 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.55 vs. limit=15.0 2024-08-19 06:46:19,765 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 3250, loss[loss=0.09126, beats_loss=0.01099, ecapa_loss=0.0001367, whisper_loss=0.0789, over 20463.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01049, ecapa_loss=0.0001419, whisper_loss=0.09052, over 3781444.17 frames. ], batch size: 80, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:46:30,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4333490.0, ans=0.0 2024-08-19 06:47:10,683 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 19 from LS+wenet, 25 from Vox, 33 from AS 2024-08-19 06:47:10,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4333790.0, ans=0.1 2024-08-19 06:47:18,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4333790.0, ans=0.125 2024-08-19 06:47:37,857 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 3300, loss[loss=0.07006, beats_loss=0.01405, ecapa_loss=0.0001295, whisper_loss=0.05471, over 19086.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01055, ecapa_loss=0.0001414, whisper_loss=0.08979, over 3793196.40 frames. ], batch size: 79, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:47:39,298 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
26 from LS+wenet, 28 from Vox, 33 from AS 2024-08-19 06:47:47,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4333990.0, ans=0.125 2024-08-19 06:47:52,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4334090.0, ans=0.0 2024-08-19 06:47:56,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4334090.0, ans=0.0 2024-08-19 06:47:57,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4334090.0, ans=0.0 2024-08-19 06:47:59,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4334090.0, ans=0.0 2024-08-19 06:48:03,395 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 18 from Vox, 22 from AS 2024-08-19 06:48:18,347 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.382e+01 2.607e+01 2.876e+01 9.685e+01, threshold=5.214e+01, percent-clipped=1.0 2024-08-19 06:48:27,739 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 15 from Vox, 26 from AS 2024-08-19 06:48:27,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4334290.0, ans=0.125 2024-08-19 06:48:30,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4334290.0, ans=0.0 2024-08-19 06:48:44,335 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
28 from LS+wenet, 20 from Vox, 42 from AS 2024-08-19 06:48:47,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4334390.0, ans=0.125 2024-08-19 06:48:52,873 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 3350, loss[loss=0.08896, beats_loss=0.01188, ecapa_loss=0.0001379, whisper_loss=0.07571, over 21369.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01056, ecapa_loss=0.0001412, whisper_loss=0.08873, over 3785274.18 frames. ], batch size: 90, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:48:53,588 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 15 from LS+wenet, 21 from Vox, 24 from AS 2024-08-19 06:48:55,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4334490.0, ans=0.125 2024-08-19 06:48:59,384 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 from AS 2024-08-19 06:49:15,786 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 18 from Vox, 24 from AS 2024-08-19 06:49:22,560 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.15 vs. limit=15.0 2024-08-19 06:49:27,041 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 from AS 2024-08-19 06:49:57,261 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
38 from LS+wenet, 22 from Vox, 30 from AS 2024-08-19 06:49:58,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4334890.0, ans=0.125 2024-08-19 06:50:00,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4334890.0, ans=0.1 2024-08-19 06:50:05,002 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 3400, loss[loss=0.08269, beats_loss=0.01221, ecapa_loss=0.0001397, whisper_loss=0.06908, over 12988.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01053, ecapa_loss=0.0001418, whisper_loss=0.0892, over 3843167.33 frames. ], batch size: 54, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:50:17,160 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 from AS 2024-08-19 06:50:38,652 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=16.08 vs. limit=15.0 2024-08-19 06:50:39,726 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=15.0 2024-08-19 06:50:43,281 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.221e+01 2.426e+01 2.713e+01 1.019e+02, threshold=4.853e+01, percent-clipped=2.0 2024-08-19 06:50:51,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4335290.0, ans=0.125 2024-08-19 06:50:53,530 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.68 vs. 
limit=22.5 2024-08-19 06:50:54,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4335290.0, ans=0.0 2024-08-19 06:50:54,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=4335290.0, ans=0.1 2024-08-19 06:51:01,008 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 40 from LS+wenet, 20 from Vox, 25 from AS 2024-08-19 06:51:02,339 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 21 from Vox, 22 from AS 2024-08-19 06:51:06,426 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 19 from Vox, 36 from AS 2024-08-19 06:51:15,619 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 3450, loss[loss=0.09251, beats_loss=0.01163, ecapa_loss=0.000119, whisper_loss=0.07969, over 20103.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01046, ecapa_loss=0.0001418, whisper_loss=0.08961, over 3869243.68 frames. ], batch size: 80, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:51:22,156 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 17 from Vox, 45 from AS 2024-08-19 06:51:28,640 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.25 vs. limit=22.5 2024-08-19 06:51:32,161 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 25 from Vox, 33 from AS 2024-08-19 06:51:41,939 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
26 from LS+wenet, 26 from Vox, 29 from AS 2024-08-19 06:51:48,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4335690.0, ans=0.125 2024-08-19 06:51:51,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4335690.0, ans=0.125 2024-08-19 06:51:53,144 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 24 from LS+wenet, 16 from Vox, 25 from AS 2024-08-19 06:52:00,903 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 28 from LS+wenet, 13 from Vox, 30 from AS 2024-08-19 06:52:04,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4335790.0, ans=0.0 2024-08-19 06:52:08,699 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0 2024-08-19 06:52:08,883 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.18 vs. limit=15.0 2024-08-19 06:52:14,794 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 21 from LS+wenet, 27 from Vox, 45 from AS 2024-08-19 06:52:20,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4335890.0, ans=0.0 2024-08-19 06:52:22,932 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 3500, loss[loss=0.08659, beats_loss=0.01146, ecapa_loss=0.0001398, whisper_loss=0.07373, over 15054.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.0001426, whisper_loss=0.08958, over 3905532.09 frames. 
], batch size: 63, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:52:31,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4335990.0, ans=0.125 2024-08-19 06:52:54,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4336190.0, ans=0.125 2024-08-19 06:52:57,036 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.291e+01 2.555e+01 2.847e+01 3.911e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-19 06:52:58,361 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 23 from LS+wenet, 19 from Vox, 13 fro AS 2024-08-19 06:53:07,837 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2024-08-19 06:53:25,306 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 3550, loss[loss=0.08351, beats_loss=0.01255, ecapa_loss=0.0001523, whisper_loss=0.06943, over 21392.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01041, ecapa_loss=0.0001424, whisper_loss=0.08949, over 3877148.86 frames. ], batch size: 93, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:53:28,051 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-19 06:53:31,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4336490.0, ans=0.125 2024-08-19 06:53:31,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4336490.0, ans=0.125 2024-08-19 06:53:36,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4336590.0, ans=0.0 2024-08-19 06:53:37,744 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-19 06:53:41,111 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 21 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-19 06:53:44,164 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.83 vs. limit=10.0 2024-08-19 06:53:52,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4336690.0, ans=0.125 2024-08-19 06:53:58,474 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 24 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-19 06:54:05,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4336790.0, ans=0.0 2024-08-19 06:54:08,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4336790.0, ans=0.125 2024-08-19 06:54:12,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4336790.0, ans=0.0 2024-08-19 06:54:13,180 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 16 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 06:54:19,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4336890.0, ans=0.0 2024-08-19 06:54:27,094 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 3600, loss[loss=0.1184, beats_loss=0.008096, ecapa_loss=0.0001527, whisper_loss=0.1087, over 22427.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0104, ecapa_loss=0.0001421, whisper_loss=0.09026, over 3894893.08 frames. ], batch size: 89, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:54:36,474 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.81 vs. 
limit=15.0 2024-08-19 06:54:37,212 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-19 06:54:42,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4337090.0, ans=0.1 2024-08-19 06:54:47,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4337090.0, ans=0.125 2024-08-19 06:54:47,937 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=22.5 2024-08-19 06:54:52,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4337190.0, ans=0.2 2024-08-19 06:55:00,627 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.265e+01 2.489e+01 2.858e+01 3.762e+01, threshold=4.979e+01, percent-clipped=0.0 2024-08-19 06:55:02,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4337190.0, ans=0.2 2024-08-19 06:55:03,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4337290.0, ans=0.125 2024-08-19 06:55:04,699 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 13 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-19 06:55:23,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4337390.0, ans=0.025 2024-08-19 06:55:29,295 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 3650, loss[loss=0.1149, beats_loss=0.008854, ecapa_loss=0.0001691, whisper_loss=0.1044, over 23121.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01048, ecapa_loss=0.0001416, whisper_loss=0.09026, over 3901094.40 frames. 
], batch size: 94, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:55:33,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4337490.0, ans=0.1 2024-08-19 06:55:40,176 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09651452302932739, model_norm_threshold=49.788700103759766 2024-08-19 06:55:40,338 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.29, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.699e+04, grad_sumsq=7.699e+04, orig_rms_sq=1.000e+00 2024-08-19 06:55:40,461 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-19 06:55:52,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4337690.0, ans=0.0 2024-08-19 06:55:58,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4337690.0, ans=0.125 2024-08-19 06:56:10,406 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.54 vs. limit=22.5 2024-08-19 06:56:21,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4337890.0, ans=0.0 2024-08-19 06:56:32,218 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 3700, loss[loss=0.1007, beats_loss=0.009969, ecapa_loss=0.0001189, whisper_loss=0.08957, over 18610.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001425, whisper_loss=0.09042, over 3893549.55 frames. 
], batch size: 70, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:56:32,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4337990.0, ans=0.1 2024-08-19 06:56:33,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4337990.0, ans=0.125 2024-08-19 06:56:36,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=4337990.0, ans=15.0 2024-08-19 06:56:46,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4338090.0, ans=0.07 2024-08-19 06:56:47,348 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-19 06:56:47,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4338090.0, ans=0.125 2024-08-19 06:56:56,818 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 06:56:57,496 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.44 vs. limit=15.0 2024-08-19 06:57:05,625 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.332e+01 2.586e+01 3.000e+01 5.159e+02, threshold=5.172e+01, percent-clipped=5.0 2024-08-19 06:57:06,972 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-19 06:57:09,562 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
34 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-19 06:57:30,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4338390.0, ans=0.1 2024-08-19 06:57:34,160 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 3750, loss[loss=0.09177, beats_loss=0.01152, ecapa_loss=0.000144, whisper_loss=0.07881, over 17083.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01048, ecapa_loss=0.0001426, whisper_loss=0.09044, over 3883544.38 frames. ], batch size: 69, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:57:50,029 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-19 06:57:50,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4338590.0, ans=0.0 2024-08-19 06:58:06,503 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 18 from LS+wenet, 35 from Vox, 35 fro AS 2024-08-19 06:58:11,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4338790.0, ans=0.2 2024-08-19 06:58:23,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4338890.0, ans=0.2 2024-08-19 06:58:27,315 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 06:58:35,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4338990.0, ans=0.04949747468305833 2024-08-19 06:58:35,918 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 3800, loss[loss=0.1031, beats_loss=0.01161, ecapa_loss=0.0001204, whisper_loss=0.0903, over 23198.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01059, ecapa_loss=0.0001411, whisper_loss=0.08976, over 3896991.49 frames. 
], batch size: 93, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:58:41,053 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 06:58:44,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4338990.0, ans=0.0 2024-08-19 06:58:57,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4339090.0, ans=0.2 2024-08-19 06:59:00,518 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.90 vs. limit=15.0 2024-08-19 06:59:03,319 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-19 06:59:09,297 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.289e+01 2.536e+01 2.898e+01 5.473e+01, threshold=5.073e+01, percent-clipped=1.0 2024-08-19 06:59:16,938 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-19 06:59:20,059 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.48 vs. limit=15.0 2024-08-19 06:59:24,300 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 22 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-19 06:59:34,945 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.94 vs. limit=15.0 2024-08-19 06:59:37,789 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 3850, loss[loss=0.0776, beats_loss=0.01372, ecapa_loss=0.0001263, whisper_loss=0.06262, over 16191.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01057, ecapa_loss=0.0001418, whisper_loss=0.08954, over 3880771.45 frames. 
], batch size: 67, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:59:57,033 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.04 vs. limit=15.0 2024-08-19 07:00:02,525 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-19 07:00:04,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4339690.0, ans=0.0 2024-08-19 07:00:11,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4339690.0, ans=0.125 2024-08-19 07:00:19,557 INFO [train_multi_KD3.py:844] (2/4) A total of 97 cuts. 26 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-19 07:00:21,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4339790.0, ans=0.07 2024-08-19 07:00:32,712 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0 2024-08-19 07:00:34,617 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 26 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-19 07:00:39,316 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 3900, loss[loss=0.1118, beats_loss=0.01029, ecapa_loss=0.0001931, whisper_loss=0.09956, over 16717.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01057, ecapa_loss=0.0001416, whisper_loss=0.08941, over 3896059.03 frames. ], batch size: 71, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:00:45,759 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
18 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-19 07:00:53,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4340090.0, ans=0.0 2024-08-19 07:01:05,538 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 07:01:07,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4340190.0, ans=0.0 2024-08-19 07:01:13,020 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.284e+01 2.480e+01 2.767e+01 3.650e+01, threshold=4.959e+01, percent-clipped=0.0 2024-08-19 07:01:16,035 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.79 vs. limit=15.0 2024-08-19 07:01:18,634 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0 2024-08-19 07:01:20,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4340290.0, ans=0.0 2024-08-19 07:01:28,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4340390.0, ans=0.0 2024-08-19 07:01:37,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4340390.0, ans=0.125 2024-08-19 07:01:41,229 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 3950, loss[loss=0.09446, beats_loss=0.01213, ecapa_loss=0.0001428, whisper_loss=0.0809, over 22532.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01048, ecapa_loss=0.0001425, whisper_loss=0.08999, over 3904792.58 frames. 
], batch size: 90, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:01:43,883 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08397600054740906, model_norm_threshold=49.59146499633789 2024-08-19 07:01:44,044 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.26, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.206e+04, grad_sumsq=8.842e+06, orig_rms_sq=1.041e-02 2024-08-19 07:01:45,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4340490.0, ans=0.1 2024-08-19 07:01:48,091 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-19 07:01:56,601 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 13 from LS+wenet, 12 from Vox, 39 fro AS 2024-08-19 07:01:59,228 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2024-08-19 07:02:03,606 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 23 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-19 07:02:21,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4340790.0, ans=0.125 2024-08-19 07:02:41,506 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-19 07:02:43,738 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 4000, loss[loss=0.08885, beats_loss=0.01022, ecapa_loss=0.0001844, whisper_loss=0.07679, over 14134.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001433, whisper_loss=0.09033, over 3900075.02 frames. 
], batch size: 59, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:03:07,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4341190.0, ans=0.125 2024-08-19 07:03:09,317 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.93 vs. limit=15.0 2024-08-19 07:03:10,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4341190.0, ans=0.125 2024-08-19 07:03:10,501 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.27 vs. limit=15.0 2024-08-19 07:03:12,360 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-19 07:03:17,339 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.399e+01 2.689e+01 3.054e+01 5.905e+02, threshold=5.377e+01, percent-clipped=2.0 2024-08-19 07:03:42,601 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.70 vs. limit=15.0 2024-08-19 07:03:46,332 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 4050, loss[loss=0.09567, beats_loss=0.0107, ecapa_loss=0.0001532, whisper_loss=0.08344, over 18363.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01053, ecapa_loss=0.0001425, whisper_loss=0.09015, over 3898954.85 frames. ], batch size: 78, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:03:47,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4341490.0, ans=0.125 2024-08-19 07:03:50,104 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
19 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-19 07:03:51,287 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 26 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-19 07:04:01,648 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 29 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-19 07:04:12,316 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-19 07:04:13,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4341690.0, ans=0.0 2024-08-19 07:04:27,768 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.329e+01 2024-08-19 07:04:43,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4341890.0, ans=0.2 2024-08-19 07:04:46,943 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2024-08-19 07:04:48,534 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 4100, loss[loss=0.1239, beats_loss=0.009632, ecapa_loss=0.0001141, whisper_loss=0.1132, over 24604.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.0001426, whisper_loss=0.09041, over 3894986.94 frames. ], batch size: 91, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:04:52,435 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-19 07:05:02,480 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
24 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-19 07:05:21,935 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.245e+01 2.633e+01 2.897e+01 5.694e+01, threshold=5.267e+01, percent-clipped=1.0 2024-08-19 07:05:32,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4342290.0, ans=0.05 2024-08-19 07:05:36,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4342290.0, ans=0.2 2024-08-19 07:05:40,795 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.70 vs. limit=22.5 2024-08-19 07:05:43,642 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 24 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-19 07:05:49,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4342390.0, ans=0.0 2024-08-19 07:05:49,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4342390.0, ans=0.125 2024-08-19 07:05:51,374 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 4150, loss[loss=0.1188, beats_loss=0.01032, ecapa_loss=0.0001238, whisper_loss=0.1072, over 23788.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01047, ecapa_loss=0.0001432, whisper_loss=0.09034, over 3882250.75 frames. ], batch size: 93, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:06:09,898 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-08-19 07:06:17,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4342690.0, ans=0.0 2024-08-19 07:06:19,092 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
22 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-19 07:06:30,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4342790.0, ans=0.1 2024-08-19 07:06:34,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4342790.0, ans=0.2 2024-08-19 07:06:54,598 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 4200, loss[loss=0.1137, beats_loss=0.008975, ecapa_loss=0.0001574, whisper_loss=0.1032, over 21976.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01035, ecapa_loss=0.0001441, whisper_loss=0.09114, over 3913497.87 frames. ], batch size: 90, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:06:55,262 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.89 vs. limit=6.0 2024-08-19 07:07:13,019 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.67 vs. limit=22.5 2024-08-19 07:07:23,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4343190.0, ans=0.125 2024-08-19 07:07:28,654 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.261e+01 2.591e+01 2.854e+01 3.492e+01, threshold=5.183e+01, percent-clipped=0.0 2024-08-19 07:07:29,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4343190.0, ans=0.0 2024-08-19 07:07:34,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4343290.0, ans=0.05 2024-08-19 07:07:36,728 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. 
limit=6.0 2024-08-19 07:07:45,686 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.92 vs. limit=15.0 2024-08-19 07:07:53,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=4343390.0, ans=15.0 2024-08-19 07:07:57,617 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 4250, loss[loss=0.1155, beats_loss=0.009612, ecapa_loss=0.0001561, whisper_loss=0.1043, over 21672.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.000143, whisper_loss=0.09035, over 3930098.52 frames. ], batch size: 89, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:07:58,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4343490.0, ans=0.2 2024-08-19 07:07:59,018 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 18 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-19 07:08:02,208 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.49 vs. limit=15.0 2024-08-19 07:08:03,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4343490.0, ans=0.125 2024-08-19 07:08:10,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4343590.0, ans=0.035 2024-08-19 07:08:16,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4343590.0, ans=0.2 2024-08-19 07:08:20,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4343590.0, ans=0.1 2024-08-19 07:08:28,886 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
17 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 07:08:30,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4343690.0, ans=0.2 2024-08-19 07:08:38,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4343790.0, ans=0.125 2024-08-19 07:08:45,429 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.52 vs. limit=10.0 2024-08-19 07:08:48,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4343890.0, ans=0.2 2024-08-19 07:08:54,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4343890.0, ans=0.0 2024-08-19 07:08:59,406 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 4300, loss[loss=0.114, beats_loss=0.007778, ecapa_loss=0.0001464, whisper_loss=0.1048, over 15753.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01047, ecapa_loss=0.0001424, whisper_loss=0.08921, over 3879054.49 frames. ], batch size: 60, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:08:59,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4343990.0, ans=0.125 2024-08-19 07:09:09,943 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
21 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 07:09:33,362 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.222e+01 2.440e+01 2.788e+01 3.909e+01, threshold=4.880e+01, percent-clipped=0.0 2024-08-19 07:09:33,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4344190.0, ans=0.0 2024-08-19 07:09:38,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4344290.0, ans=0.125 2024-08-19 07:09:44,659 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 16 from Vox, 49 fro AS 2024-08-19 07:09:55,127 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.22 vs. limit=15.0 2024-08-19 07:09:59,457 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 07:10:01,977 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 4350, loss[loss=0.1012, beats_loss=0.00927, ecapa_loss=0.0001509, whisper_loss=0.09047, over 21689.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01046, ecapa_loss=0.0001438, whisper_loss=0.08889, over 3883444.54 frames. 
], batch size: 89, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:10:07,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4344490.0, ans=0.125 2024-08-19 07:10:11,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4344490.0, ans=0.2 2024-08-19 07:10:28,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4344690.0, ans=0.1 2024-08-19 07:10:30,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4344690.0, ans=0.0 2024-08-19 07:10:31,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4344690.0, ans=0.0 2024-08-19 07:10:34,436 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-19 07:10:43,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4344790.0, ans=0.125 2024-08-19 07:10:54,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4344890.0, ans=0.125 2024-08-19 07:10:54,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4344890.0, ans=0.2 2024-08-19 07:10:55,253 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.33 vs. limit=15.0 2024-08-19 07:10:55,676 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-19 07:11:05,011 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 4400, loss[loss=0.108, beats_loss=0.01014, ecapa_loss=0.0001422, whisper_loss=0.09645, over 14075.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.01046, ecapa_loss=0.0001433, whisper_loss=0.0893, over 3860497.86 frames. ], batch size: 55, lr: 2.05e-03, grad_scale: 1.152921504606847e+18 2024-08-19 07:11:07,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4344990.0, ans=0.0 2024-08-19 07:11:11,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4344990.0, ans=0.0 2024-08-19 07:11:17,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4345090.0, ans=0.125 2024-08-19 07:11:22,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4345090.0, ans=0.2 2024-08-19 07:11:28,061 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.52 vs. limit=15.0 2024-08-19 07:11:31,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4345190.0, ans=0.125 2024-08-19 07:11:36,302 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.77 vs. 
limit=12.0 2024-08-19 07:11:39,294 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.205e+01 2.466e+01 2.774e+01 4.446e+01, threshold=4.932e+01, percent-clipped=0.0 2024-08-19 07:11:43,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4345290.0, ans=0.125 2024-08-19 07:11:58,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4345390.0, ans=0.0 2024-08-19 07:12:06,638 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 4450, loss[loss=0.1031, beats_loss=0.01156, ecapa_loss=0.0001257, whisper_loss=0.09026, over 19384.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01046, ecapa_loss=0.0001417, whisper_loss=0.08996, over 3889258.50 frames. ], batch size: 78, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:12:14,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=4345490.0, ans=0.02 2024-08-19 07:12:16,381 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.48 vs. limit=15.0 2024-08-19 07:12:21,982 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-19 07:12:26,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4345590.0, ans=0.0 2024-08-19 07:12:33,528 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
22 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-19 07:12:33,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4345690.0, ans=0.125 2024-08-19 07:12:42,797 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0 2024-08-19 07:12:45,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4345790.0, ans=0.2 2024-08-19 07:13:01,133 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-19 07:13:09,844 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 4500, loss[loss=0.07991, beats_loss=0.01178, ecapa_loss=0.0001535, whisper_loss=0.06659, over 22551.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01048, ecapa_loss=0.0001418, whisper_loss=0.08892, over 3871226.88 frames. ], batch size: 96, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:13:09,953 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-19 07:13:30,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4346090.0, ans=0.0 2024-08-19 07:13:31,638 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-19 07:13:32,141 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.23 vs. limit=12.0 2024-08-19 07:13:33,097 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
14 from LS+wenet, 29 from Vox, 25 fro AS 2024-08-19 07:13:33,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4346090.0, ans=0.2 2024-08-19 07:13:34,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4346190.0, ans=0.125 2024-08-19 07:13:43,130 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-19 07:13:45,333 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.773e+01 2.205e+01 2.468e+01 2.775e+01 4.472e+01, threshold=4.935e+01, percent-clipped=0.0 2024-08-19 07:13:59,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4346390.0, ans=0.1 2024-08-19 07:14:06,129 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.28 vs. limit=22.5 2024-08-19 07:14:13,005 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 4550, loss[loss=0.1096, beats_loss=0.01005, ecapa_loss=0.0001596, whisper_loss=0.09797, over 20173.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01047, ecapa_loss=0.0001429, whisper_loss=0.08864, over 3871072.40 frames. ], batch size: 80, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:14:18,279 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-19 07:14:24,083 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-19 07:14:34,394 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
30 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-19 07:14:35,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4346590.0, ans=0.125 2024-08-19 07:14:50,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4346790.0, ans=0.125 2024-08-19 07:15:15,500 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 4600, loss[loss=0.1032, beats_loss=0.01044, ecapa_loss=0.000126, whisper_loss=0.09153, over 18861.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01048, ecapa_loss=0.0001418, whisper_loss=0.08858, over 3884479.99 frames. ], batch size: 75, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:15:18,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4346990.0, ans=0.125 2024-08-19 07:15:23,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4346990.0, ans=0.0 2024-08-19 07:15:24,235 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-19 07:15:28,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4347090.0, ans=0.0 2024-08-19 07:15:31,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4347090.0, ans=0.0 2024-08-19 07:15:33,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4347090.0, ans=0.1 2024-08-19 07:15:44,741 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.41 vs. limit=15.0 2024-08-19 07:15:45,523 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
25 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-19 07:15:48,223 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-19 07:15:50,493 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.323e+01 2.576e+01 2.927e+01 9.021e+01, threshold=5.152e+01, percent-clipped=3.0 2024-08-19 07:15:52,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4347290.0, ans=0.0 2024-08-19 07:16:06,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4347390.0, ans=0.125 2024-08-19 07:16:08,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4347390.0, ans=0.2 2024-08-19 07:16:17,704 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 4650, loss[loss=0.1056, beats_loss=0.00832, ecapa_loss=0.0001783, whisper_loss=0.09547, over 18731.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01045, ecapa_loss=0.0001424, whisper_loss=0.08913, over 3884114.89 frames. ], batch size: 76, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:16:33,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4347590.0, ans=0.125 2024-08-19 07:16:37,748 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 21 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-19 07:16:59,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4347790.0, ans=0.125 2024-08-19 07:16:59,149 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.69 vs. 
limit=15.0 2024-08-19 07:17:01,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4347790.0, ans=0.1 2024-08-19 07:17:10,589 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.21 vs. limit=6.0 2024-08-19 07:17:17,435 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-19 07:17:19,636 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 4700, loss[loss=0.08375, beats_loss=0.01156, ecapa_loss=0.0001646, whisper_loss=0.07055, over 18400.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.0001428, whisper_loss=0.08963, over 3876459.57 frames. ], batch size: 78, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:17:19,800 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-19 07:17:25,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4347990.0, ans=0.125 2024-08-19 07:17:30,011 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-19 07:17:34,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4348090.0, ans=0.2 2024-08-19 07:17:35,238 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.18 vs. limit=22.5 2024-08-19 07:17:37,350 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
22 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-19 07:17:41,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4348090.0, ans=0.125 2024-08-19 07:17:45,149 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.44 vs. limit=15.0 2024-08-19 07:17:50,920 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-19 07:17:52,142 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 07:17:54,430 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.376e+01 2.578e+01 2.929e+01 1.160e+02, threshold=5.156e+01, percent-clipped=1.0 2024-08-19 07:17:54,565 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 28 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-19 07:17:57,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4348290.0, ans=0.125 2024-08-19 07:17:57,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4348290.0, ans=0.0 2024-08-19 07:18:05,807 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-19 07:18:08,619 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
26 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-19 07:18:10,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4348390.0, ans=0.2 2024-08-19 07:18:11,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4348390.0, ans=0.2 2024-08-19 07:18:21,948 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 4750, loss[loss=0.09368, beats_loss=0.01029, ecapa_loss=0.000166, whisper_loss=0.08172, over 19489.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01049, ecapa_loss=0.0001431, whisper_loss=0.08946, over 3896760.47 frames. ], batch size: 83, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:18:30,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4348490.0, ans=0.0 2024-08-19 07:18:31,228 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-19 07:18:33,845 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-19 07:18:35,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4348590.0, ans=0.125 2024-08-19 07:18:36,385 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-19 07:18:57,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4348690.0, ans=0.0 2024-08-19 07:19:03,699 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.30 vs. 
limit=15.0 2024-08-19 07:19:05,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4348790.0, ans=0.95 2024-08-19 07:19:11,670 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-19 07:19:23,670 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 4800, loss[loss=0.09155, beats_loss=0.01269, ecapa_loss=0.0001296, whisper_loss=0.07757, over 21581.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01051, ecapa_loss=0.0001431, whisper_loss=0.0888, over 3880920.96 frames. ], batch size: 87, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:19:32,823 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-19 07:19:47,119 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.15 vs. limit=15.0 2024-08-19 07:19:52,820 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-19 07:19:58,113 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0 2024-08-19 07:19:58,675 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.397e+01 2.595e+01 2.947e+01 3.968e+01, threshold=5.190e+01, percent-clipped=1.0 2024-08-19 07:19:59,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4349190.0, ans=0.2 2024-08-19 07:20:07,607 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-19 07:20:10,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4349290.0, ans=0.015 2024-08-19 07:20:13,079 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
22 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-19 07:20:15,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4349390.0, ans=0.125 2024-08-19 07:20:22,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4349390.0, ans=0.125 2024-08-19 07:20:26,499 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 4850, loss[loss=0.1086, beats_loss=0.01208, ecapa_loss=0.0001731, whisper_loss=0.09483, over 16268.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01051, ecapa_loss=0.0001433, whisper_loss=0.08861, over 3858072.42 frames. ], batch size: 66, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:20:29,189 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-19 07:20:30,438 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 29 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-19 07:20:44,641 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-19 07:21:18,923 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 31 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-19 07:21:30,438 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 4900, loss[loss=0.09845, beats_loss=0.01029, ecapa_loss=0.0001432, whisper_loss=0.08673, over 20471.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01049, ecapa_loss=0.0001438, whisper_loss=0.08891, over 3859763.83 frames. ], batch size: 86, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:21:32,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4349990.0, ans=0.0 2024-08-19 07:21:44,467 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
15 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-19 07:21:48,954 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.63 vs. limit=15.0 2024-08-19 07:22:01,494 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 29 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-19 07:22:06,629 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.310e+01 2.480e+01 2.749e+01 3.874e+01, threshold=4.961e+01, percent-clipped=0.0 2024-08-19 07:22:07,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4350190.0, ans=0.2 2024-08-19 07:22:21,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4350390.0, ans=0.125 2024-08-19 07:22:21,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4350390.0, ans=0.09899494936611666 2024-08-19 07:22:22,542 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-19 07:22:28,181 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.44 vs. limit=22.5 2024-08-19 07:22:30,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4350390.0, ans=0.125 2024-08-19 07:22:35,134 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 4950, loss[loss=0.1194, beats_loss=0.00899, ecapa_loss=0.0001564, whisper_loss=0.1089, over 22049.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01047, ecapa_loss=0.0001429, whisper_loss=0.08848, over 3853258.45 frames. ], batch size: 88, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:22:37,912 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-19 07:22:39,088 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 30 from Vox, 22 fro AS 2024-08-19 07:22:51,444 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.14 vs. limit=15.0 2024-08-19 07:22:53,314 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-19 07:23:00,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4350690.0, ans=0.1 2024-08-19 07:23:05,990 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 07:23:11,611 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2024-08-19 07:23:19,006 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-19 07:23:32,688 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 13 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-19 07:23:41,462 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 5000, loss[loss=0.1013, beats_loss=0.01269, ecapa_loss=0.0001597, whisper_loss=0.08703, over 21041.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01054, ecapa_loss=0.0001423, whisper_loss=0.08834, over 3813891.05 frames. ], batch size: 89, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:23:45,620 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 07:23:47,609 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.11 vs. 
limit=10.0 2024-08-19 07:23:57,433 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.780e+01 2024-08-19 07:24:09,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=4351190.0, ans=22.5 2024-08-19 07:24:18,093 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.255e+01 2.539e+01 2.731e+01 4.622e+01, threshold=5.077e+01, percent-clipped=0.0 2024-08-19 07:24:28,048 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 17 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-19 07:24:48,471 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 5050, loss[loss=0.09739, beats_loss=0.01142, ecapa_loss=0.000109, whisper_loss=0.08487, over 14274.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01057, ecapa_loss=0.0001426, whisper_loss=0.08869, over 3830943.37 frames. ], batch size: 54, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:25:01,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4351590.0, ans=0.1 2024-08-19 07:25:07,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4351590.0, ans=0.1 2024-08-19 07:25:24,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4351690.0, ans=0.125 2024-08-19 07:25:25,390 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 21 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-19 07:25:29,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4351790.0, ans=0.125 2024-08-19 07:25:33,851 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
28 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-19 07:25:43,255 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 31 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-19 07:25:49,268 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.53 vs. limit=22.5 2024-08-19 07:25:51,297 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 13 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-19 07:25:57,939 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 5100, loss[loss=0.1156, beats_loss=0.009438, ecapa_loss=0.0001226, whisper_loss=0.1049, over 16399.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01058, ecapa_loss=0.0001422, whisper_loss=0.08953, over 3872387.60 frames. ], batch size: 60, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:26:22,353 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-19 07:26:23,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4352190.0, ans=0.1 2024-08-19 07:26:30,913 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 18 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-19 07:26:33,202 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.383e+01 2.590e+01 2.899e+01 2.370e+02, threshold=5.180e+01, percent-clipped=1.0 2024-08-19 07:27:01,060 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 5150, loss[loss=0.1068, beats_loss=0.01171, ecapa_loss=0.0001196, whisper_loss=0.09389, over 22530.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01056, ecapa_loss=0.0001417, whisper_loss=0.09005, over 3926301.55 frames. 
], batch size: 86, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:27:18,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4352590.0, ans=0.1 2024-08-19 07:27:18,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4352590.0, ans=0.0 2024-08-19 07:27:42,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4352790.0, ans=0.2 2024-08-19 07:27:44,869 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2024-08-19 07:27:49,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4352890.0, ans=0.025 2024-08-19 07:27:55,831 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-19 07:28:03,162 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 5200, loss[loss=0.1073, beats_loss=0.01135, ecapa_loss=0.0001206, whisper_loss=0.09473, over 20816.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01059, ecapa_loss=0.0001403, whisper_loss=0.08933, over 3909523.54 frames. 
], batch size: 79, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:28:03,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4352990.0, ans=0.1 2024-08-19 07:28:22,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4353090.0, ans=0.0 2024-08-19 07:28:26,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4353090.0, ans=0.125 2024-08-19 07:28:27,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4353190.0, ans=0.125 2024-08-19 07:28:34,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4353190.0, ans=0.125 2024-08-19 07:28:37,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4353190.0, ans=0.125 2024-08-19 07:28:38,980 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.425e+01 2.708e+01 3.005e+01 4.495e+01, threshold=5.416e+01, percent-clipped=0.0 2024-08-19 07:28:44,075 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-19 07:28:45,235 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-19 07:28:49,128 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-19 07:29:04,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=4353390.0, ans=0.05 2024-08-19 07:29:06,545 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 5250, loss[loss=0.1258, beats_loss=0.009862, ecapa_loss=0.000105, whisper_loss=0.1149, over 20457.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.01055, ecapa_loss=0.0001408, whisper_loss=0.08948, over 3885216.32 frames. ], batch size: 76, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:29:13,342 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 27 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-19 07:29:18,391 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-19 07:29:23,768 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.54 vs. limit=15.0 2024-08-19 07:29:28,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4353590.0, ans=0.025 2024-08-19 07:29:33,472 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.36 vs. limit=22.5 2024-08-19 07:29:38,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4353690.0, ans=0.125 2024-08-19 07:29:38,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4353690.0, ans=0.07 2024-08-19 07:29:56,069 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.50 vs. 
limit=15.0 2024-08-19 07:30:05,006 WARNING [optim.py:496] (2/4) Scaling gradients by 0.035648033022880554, model_norm_threshold=54.15937423706055 2024-08-19 07:30:05,170 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.out_combiner.bypass_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.460e+05, grad_sumsq=4.288e+05, orig_rms_sq=5.737e-01 2024-08-19 07:30:06,904 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 07:30:08,796 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 5300, loss[loss=0.1114, beats_loss=0.008547, ecapa_loss=0.0001293, whisper_loss=0.1016, over 22931.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.0001406, whisper_loss=0.09034, over 3863166.29 frames. ], batch size: 87, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:30:11,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4353990.0, ans=0.125 2024-08-19 07:30:13,463 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0 2024-08-19 07:30:18,947 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-19 07:30:21,357 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 19 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-19 07:30:22,527 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-19 07:30:36,454 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.78 vs. 
limit=22.5 2024-08-19 07:30:43,235 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.382e+01 2.697e+01 3.037e+01 1.519e+03, threshold=5.395e+01, percent-clipped=1.0 2024-08-19 07:30:53,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=4354290.0, ans=10.0 2024-08-19 07:31:11,095 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 5350, loss[loss=0.1201, beats_loss=0.008429, ecapa_loss=0.0001394, whisper_loss=0.1103, over 20029.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01036, ecapa_loss=0.0001419, whisper_loss=0.09066, over 3861793.25 frames. ], batch size: 77, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:31:15,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4354490.0, ans=0.05 2024-08-19 07:31:45,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4354690.0, ans=0.125 2024-08-19 07:31:54,676 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.64 vs. limit=12.0 2024-08-19 07:31:55,188 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 13 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-19 07:32:09,281 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-19 07:32:14,143 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 5400, loss[loss=0.09758, beats_loss=0.01278, ecapa_loss=9.394e-05, whisper_loss=0.08386, over 21684.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001414, whisper_loss=0.09044, over 3856404.28 frames. 
], batch size: 83, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:32:27,447 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.64 vs. limit=15.0 2024-08-19 07:32:29,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4355090.0, ans=0.125 2024-08-19 07:32:30,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4355090.0, ans=0.1 2024-08-19 07:32:36,178 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.04 vs. limit=15.0 2024-08-19 07:32:47,474 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.17 vs. limit=15.0 2024-08-19 07:32:49,114 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.281e+01 2.508e+01 2.776e+01 3.569e+02, threshold=5.016e+01, percent-clipped=2.0 2024-08-19 07:32:59,189 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-19 07:33:00,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4355290.0, ans=0.05 2024-08-19 07:33:09,251 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-19 07:33:09,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4355390.0, ans=0.125 2024-08-19 07:33:16,258 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 5450, loss[loss=0.1021, beats_loss=0.01215, ecapa_loss=0.0001033, whisper_loss=0.08889, over 18443.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.0104, ecapa_loss=0.0001414, whisper_loss=0.09083, over 3877166.56 frames. ], batch size: 72, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:33:20,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4355490.0, ans=0.0 2024-08-19 07:33:26,920 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 07:33:34,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4355590.0, ans=0.2 2024-08-19 07:33:36,709 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 26 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-19 07:33:44,023 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.45 vs. limit=10.0 2024-08-19 07:33:54,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4355790.0, ans=0.035 2024-08-19 07:33:56,685 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 07:33:58,010 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-19 07:33:59,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4355790.0, ans=0.125 2024-08-19 07:34:09,166 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-19 07:34:19,219 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 5500, loss[loss=0.1056, beats_loss=0.00834, ecapa_loss=0.0001583, whisper_loss=0.0957, over 17092.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01034, ecapa_loss=0.000142, whisper_loss=0.09089, over 3892847.94 frames. 
], batch size: 68, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:34:35,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4356090.0, ans=0.125 2024-08-19 07:34:53,918 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.332e+01 2.514e+01 2.807e+01 3.996e+01, threshold=5.028e+01, percent-clipped=0.0 2024-08-19 07:34:58,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4356290.0, ans=0.1 2024-08-19 07:34:58,611 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.33 vs. limit=22.5 2024-08-19 07:35:18,084 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-19 07:35:21,822 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 5550, loss[loss=0.09257, beats_loss=0.01242, ecapa_loss=0.0001286, whisper_loss=0.07886, over 22436.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01052, ecapa_loss=0.0001409, whisper_loss=0.09118, over 3920246.74 frames. ], batch size: 89, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:35:36,207 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.03 vs. 
limit=22.5 2024-08-19 07:35:46,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4356690.0, ans=0.125 2024-08-19 07:35:48,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4356690.0, ans=0.0 2024-08-19 07:35:51,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4356690.0, ans=0.125 2024-08-19 07:35:56,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4356690.0, ans=0.125 2024-08-19 07:35:58,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4356790.0, ans=0.2 2024-08-19 07:36:06,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4356790.0, ans=0.04949747468305833 2024-08-19 07:36:13,648 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0 2024-08-19 07:36:14,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4356890.0, ans=0.0 2024-08-19 07:36:18,619 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.42 vs. limit=15.0 2024-08-19 07:36:20,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4356890.0, ans=0.125 2024-08-19 07:36:23,778 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 5600, loss[loss=0.09897, beats_loss=0.009801, ecapa_loss=0.0001735, whisper_loss=0.08743, over 22039.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01051, ecapa_loss=0.0001407, whisper_loss=0.09146, over 3948535.17 frames. ], batch size: 89, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:36:29,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4356990.0, ans=0.0 2024-08-19 07:36:31,338 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 29 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 07:36:32,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4356990.0, ans=0.125 2024-08-19 07:36:39,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4357090.0, ans=0.1 2024-08-19 07:36:53,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4357190.0, ans=0.125 2024-08-19 07:36:58,537 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.345e+01 2.547e+01 2.734e+01 3.198e+02, threshold=5.093e+01, percent-clipped=3.0 2024-08-19 07:37:00,394 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.58 vs. limit=15.0 2024-08-19 07:37:01,169 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 07:37:08,623 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-19 07:37:10,902 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-19 07:37:25,627 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 5650, loss[loss=0.09263, beats_loss=0.01118, ecapa_loss=0.0001237, whisper_loss=0.08022, over 20397.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001405, whisper_loss=0.09013, over 3925772.70 frames. ], batch size: 82, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:37:27,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4357490.0, ans=0.125 2024-08-19 07:37:27,732 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.28 vs. limit=15.0 2024-08-19 07:37:28,278 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 23 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-19 07:37:28,566 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 07:37:29,473 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 29 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-19 07:37:30,647 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 07:37:39,089 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-19 07:37:46,086 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0 2024-08-19 07:37:57,710 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 29 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-19 07:38:00,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4357690.0, ans=0.125 2024-08-19 07:38:08,387 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.18 vs. 
limit=12.0 2024-08-19 07:38:13,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4357890.0, ans=0.1 2024-08-19 07:38:25,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4357890.0, ans=0.125 2024-08-19 07:38:27,339 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 5700, loss[loss=0.1138, beats_loss=0.009841, ecapa_loss=0.0001347, whisper_loss=0.1026, over 19926.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01056, ecapa_loss=0.0001411, whisper_loss=0.09021, over 3968597.07 frames. ], batch size: 78, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:38:34,956 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 22 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-19 07:38:57,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4358190.0, ans=0.2 2024-08-19 07:38:58,486 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-19 07:39:02,166 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.294e+01 2.534e+01 2.806e+01 5.396e+01, threshold=5.067e+01, percent-clipped=1.0 2024-08-19 07:39:13,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4358290.0, ans=0.125 2024-08-19 07:39:16,556 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2024-08-19 07:39:19,977 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-19 07:39:29,531 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 5750, loss[loss=0.1115, beats_loss=0.008658, ecapa_loss=0.0001674, whisper_loss=0.1011, over 18139.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.0105, ecapa_loss=0.0001415, whisper_loss=0.09068, over 3948915.22 frames. ], batch size: 74, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:39:34,821 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 18 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-19 07:39:39,875 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 07:40:00,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4358690.0, ans=0.125 2024-08-19 07:40:12,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4358790.0, ans=0.035 2024-08-19 07:40:15,932 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.58 vs. limit=15.0 2024-08-19 07:40:32,179 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 5800, loss[loss=0.1087, beats_loss=0.008943, ecapa_loss=0.0001321, whisper_loss=0.09844, over 17993.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.0001424, whisper_loss=0.09034, over 3910942.89 frames. ], batch size: 69, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:40:35,998 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 32 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-19 07:40:49,060 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.25 vs. limit=15.0 2024-08-19 07:41:01,179 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
20 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-19 07:41:03,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4359190.0, ans=0.0 2024-08-19 07:41:03,869 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0 2024-08-19 07:41:06,916 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.217e+01 2.506e+01 2.797e+01 5.801e+01, threshold=5.013e+01, percent-clipped=1.0 2024-08-19 07:41:08,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4359290.0, ans=0.1 2024-08-19 07:41:13,227 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-19 07:41:23,573 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-19 07:41:26,033 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-19 07:41:26,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4359390.0, ans=0.0 2024-08-19 07:41:34,307 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 5850, loss[loss=0.1048, beats_loss=0.009195, ecapa_loss=0.0001582, whisper_loss=0.09403, over 21629.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0105, ecapa_loss=0.0001418, whisper_loss=0.09068, over 3924445.28 frames. 
], batch size: 90, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:41:36,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4359490.0, ans=0.1 2024-08-19 07:41:36,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4359490.0, ans=0.5 2024-08-19 07:41:59,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4359690.0, ans=0.05 2024-08-19 07:42:03,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4359690.0, ans=0.125 2024-08-19 07:42:15,305 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 15 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-19 07:42:18,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4359790.0, ans=0.125 2024-08-19 07:42:19,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4359790.0, ans=0.2 2024-08-19 07:42:21,124 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.22 vs. limit=10.0 2024-08-19 07:42:32,998 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 26 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-19 07:42:33,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4359890.0, ans=0.125 2024-08-19 07:42:36,789 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 5900, loss[loss=0.1033, beats_loss=0.01115, ecapa_loss=0.000151, whisper_loss=0.09064, over 17508.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01043, ecapa_loss=0.0001416, whisper_loss=0.09065, over 3897711.43 frames. 
], batch size: 71, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:42:44,264 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-19 07:43:01,497 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 30 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-19 07:43:13,513 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.336e+01 2.599e+01 2.904e+01 5.543e+01, threshold=5.198e+01, percent-clipped=1.0 2024-08-19 07:43:13,641 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-19 07:43:37,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4360390.0, ans=0.125 2024-08-19 07:43:40,784 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 5950, loss[loss=0.09514, beats_loss=0.01154, ecapa_loss=0.0001794, whisper_loss=0.0818, over 19276.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01043, ecapa_loss=0.0001422, whisper_loss=0.09066, over 3939377.15 frames. ], batch size: 81, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:43:56,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4360590.0, ans=0.125 2024-08-19 07:44:15,407 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.72 vs. limit=15.0 2024-08-19 07:44:36,971 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.37 vs. limit=22.5 2024-08-19 07:44:43,770 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 6000, loss[loss=0.1014, beats_loss=0.01109, ecapa_loss=0.0001167, whisper_loss=0.08915, over 18360.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0105, ecapa_loss=0.0001414, whisper_loss=0.09051, over 3921650.39 frames. 
], batch size: 68, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:44:43,770 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-19 07:44:57,484 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.3364, 1.3757, 2.3213, 1.2735, 1.1501, 1.8644, 2.3500, 2.3010], device='cuda:2') 2024-08-19 07:45:18,091 INFO [train_multi_KD3.py:1149] (2/4) Epoch 30, validation on ASR_libri: loss=0.2522, beats_loss=0, ecapa_loss=0.0005153, whisper_loss=0.2471, over 922467.00 frames. 2024-08-19 07:45:36,091 INFO [train_multi_KD3.py:1149] (2/4) Epoch 30, validation on SV_voxceleb1: loss=0.004094, beats_loss=0, ecapa_loss=0.0004094, whisper_loss=0, over 939242.00 frames. 2024-08-19 07:47:12,266 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.4663, 1.8555, 1.6170, 1.2497, 1.5035, 1.4024, 1.6805, 1.5787], device='cuda:2') 2024-08-19 07:47:13,289 INFO [train_multi_KD3.py:1149] (2/4) Epoch 30, validation on AT_audioset: loss=0.02301, beats_loss=0.02301, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 07:47:13,297 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-19 07:47:17,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4360990.0, ans=0.0 2024-08-19 07:47:21,790 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 29 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-19 07:47:28,207 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
20 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-19 07:47:46,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4361190.0, ans=0.2 2024-08-19 07:47:48,006 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.272e+01 2.512e+01 2.834e+01 4.204e+02, threshold=5.024e+01, percent-clipped=2.0 2024-08-19 07:48:07,658 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.09 vs. limit=15.0 2024-08-19 07:48:09,543 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 07:48:12,105 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-19 07:48:15,555 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 6050, loss[loss=0.09458, beats_loss=0.01051, ecapa_loss=0.0001437, whisper_loss=0.08263, over 19696.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01046, ecapa_loss=0.0001412, whisper_loss=0.09112, over 3922427.08 frames. ], batch size: 77, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:48:21,889 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 19 from LS+wenet, 28 from Vox, 45 fro AS 2024-08-19 07:48:29,560 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-19 07:48:35,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4361590.0, ans=0.1 2024-08-19 07:48:39,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4361690.0, ans=0.0 2024-08-19 07:48:44,941 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.55 vs. 
limit=15.0 2024-08-19 07:48:49,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4361690.0, ans=0.125 2024-08-19 07:48:59,897 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-19 07:49:00,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.53 vs. limit=15.0 2024-08-19 07:49:04,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4361890.0, ans=0.125 2024-08-19 07:49:06,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4361890.0, ans=0.2 2024-08-19 07:49:17,004 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 6100, loss[loss=0.07405, beats_loss=0.01249, ecapa_loss=0.0001001, whisper_loss=0.06056, over 15050.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01051, ecapa_loss=0.0001412, whisper_loss=0.09128, over 3927331.24 frames. ], batch size: 60, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:49:17,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4361990.0, ans=0.1 2024-08-19 07:49:51,996 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.346e+01 2.570e+01 2.899e+01 1.665e+02, threshold=5.140e+01, percent-clipped=1.0 2024-08-19 07:50:19,221 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 6150, loss[loss=0.07919, beats_loss=0.008945, ecapa_loss=0.0001112, whisper_loss=0.06913, over 15584.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01051, ecapa_loss=0.0001406, whisper_loss=0.09115, over 3923454.37 frames. ], batch size: 56, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:50:19,403 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 07:50:34,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4362590.0, ans=0.0 2024-08-19 07:50:39,359 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 25 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-19 07:50:52,605 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.79 vs. limit=12.0 2024-08-19 07:51:03,206 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-19 07:51:16,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4362890.0, ans=0.125 2024-08-19 07:51:19,585 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-19 07:51:21,799 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 6200, loss[loss=0.1076, beats_loss=0.01122, ecapa_loss=0.0001222, whisper_loss=0.09511, over 22462.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001405, whisper_loss=0.09067, over 3926099.72 frames. ], batch size: 92, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:51:27,788 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.76 vs. limit=15.0 2024-08-19 07:51:37,231 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 24 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 07:51:57,099 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.318e+01 2.578e+01 2.832e+01 1.795e+02, threshold=5.155e+01, percent-clipped=1.0 2024-08-19 07:52:05,416 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.58 vs. 
limit=15.0 2024-08-19 07:52:12,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4363390.0, ans=0.0 2024-08-19 07:52:13,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4363390.0, ans=0.125 2024-08-19 07:52:24,454 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 6250, loss[loss=0.1141, beats_loss=0.008571, ecapa_loss=0.0001477, whisper_loss=0.104, over 19449.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01049, ecapa_loss=0.0001395, whisper_loss=0.09097, over 3951933.11 frames. ], batch size: 77, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:52:27,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4363490.0, ans=0.0 2024-08-19 07:52:29,386 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-19 07:52:33,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=4363490.0, ans=0.05 2024-08-19 07:52:34,152 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.38 vs. limit=10.0 2024-08-19 07:52:43,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4363590.0, ans=0.125 2024-08-19 07:52:43,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4363590.0, ans=0.1 2024-08-19 07:52:53,646 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
26 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 07:52:53,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4363690.0, ans=0.0 2024-08-19 07:53:01,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4363790.0, ans=0.0 2024-08-19 07:53:09,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4363790.0, ans=0.1 2024-08-19 07:53:19,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4363890.0, ans=0.125 2024-08-19 07:53:24,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4363890.0, ans=0.125 2024-08-19 07:53:26,962 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 6300, loss[loss=0.09929, beats_loss=0.008405, ecapa_loss=0.0001726, whisper_loss=0.08916, over 17196.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01047, ecapa_loss=0.0001401, whisper_loss=0.091, over 3943489.39 frames. ], batch size: 69, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:53:30,372 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=12.0 2024-08-19 07:54:02,110 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.421e+01 2.740e+01 3.001e+01 4.558e+01, threshold=5.480e+01, percent-clipped=0.0 2024-08-19 07:54:13,638 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 26 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-19 07:54:17,924 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.34 vs. limit=15.0 2024-08-19 07:54:19,825 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
16 from LS+wenet, 21 from Vox, 32 from AS 2024-08-19 07:54:23,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4364390.0, ans=0.125 2024-08-19 07:54:29,817 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 6350, loss[loss=0.1061, beats_loss=0.01143, ecapa_loss=0.0001506, whisper_loss=0.09313, over 21332.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01054, ecapa_loss=0.0001405, whisper_loss=0.09028, over 3889662.53 frames. ], batch size: 91, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:54:30,602 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-08-19 07:54:33,832 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 07:54:38,461 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 from AS 2024-08-19 07:54:43,390 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 21 from LS+wenet, 25 from Vox, 47 from AS 2024-08-19 07:54:48,393 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 19 from Vox, 35 from AS 2024-08-19 07:54:56,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4364690.0, ans=0.125 2024-08-19 07:55:02,329 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.16 vs. limit=10.0 2024-08-19 07:55:02,517 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.89 vs. 
limit=15.0 2024-08-19 07:55:10,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4364790.0, ans=0.125 2024-08-19 07:55:12,459 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.63 vs. limit=10.0 2024-08-19 07:55:16,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4364790.0, ans=0.125 2024-08-19 07:55:18,432 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.16 vs. limit=22.5 2024-08-19 07:55:20,968 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2024-08-19 07:55:30,589 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 18 from Vox, 39 from AS 2024-08-19 07:55:32,055 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 from AS 2024-08-19 07:55:33,099 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 6400, loss[loss=0.1016, beats_loss=0.01072, ecapa_loss=0.0001254, whisper_loss=0.08959, over 16956.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01049, ecapa_loss=0.0001407, whisper_loss=0.09058, over 3906897.19 frames. 
], batch size: 66, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:55:46,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4365090.0, ans=0.125 2024-08-19 07:55:53,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4365090.0, ans=0.125 2024-08-19 07:55:56,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4365090.0, ans=0.0 2024-08-19 07:56:16,379 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.014e+01 2.326e+01 2.535e+01 2.915e+01 6.831e+01, threshold=5.071e+01, percent-clipped=1.0 2024-08-19 07:56:47,581 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0 2024-08-19 07:56:55,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4365490.0, ans=0.125 2024-08-19 07:56:56,169 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 6450, loss[loss=0.1162, beats_loss=0.009058, ecapa_loss=0.0001474, whisper_loss=0.1057, over 23408.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001419, whisper_loss=0.09041, over 3931734.27 frames. ], batch size: 91, lr: 2.04e-03, grad_scale: 1.152921504606847e+18 2024-08-19 07:57:01,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4365490.0, ans=0.0 2024-08-19 07:57:05,614 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
24 from LS+wenet, 17 from Vox, 41 from AS 2024-08-19 07:57:15,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4365590.0, ans=0.125 2024-08-19 07:57:17,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4365590.0, ans=0.0 2024-08-19 07:57:17,347 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.57 vs. limit=12.0 2024-08-19 07:57:24,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4365590.0, ans=0.1 2024-08-19 07:57:27,326 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.31 vs. limit=12.0 2024-08-19 07:57:42,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4365690.0, ans=0.2 2024-08-19 07:58:06,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4365890.0, ans=0.0 2024-08-19 07:58:08,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4365890.0, ans=0.125 2024-08-19 07:58:10,240 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 07:58:14,799 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 18 from Vox, 33 from AS 2024-08-19 07:58:26,223 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 6500, loss[loss=0.09547, beats_loss=0.01133, ecapa_loss=0.0001447, whisper_loss=0.0827, over 21942.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01052, ecapa_loss=0.0001415, whisper_loss=0.09067, over 3966307.71 frames. 
], batch size: 92, lr: 2.04e-03, grad_scale: 1.152921504606847e+18 2024-08-19 07:59:14,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4366190.0, ans=0.0 2024-08-19 07:59:20,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4366190.0, ans=0.035 2024-08-19 07:59:20,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4366190.0, ans=0.0 2024-08-19 07:59:29,494 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.467e+01 2.714e+01 3.135e+01 4.455e+01, threshold=5.427e+01, percent-clipped=0.0 2024-08-19 07:59:42,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4366290.0, ans=0.1 2024-08-19 07:59:47,882 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 30 from Vox, 37 from AS 2024-08-19 07:59:59,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4366390.0, ans=0.2 2024-08-19 08:00:06,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4366390.0, ans=0.0 2024-08-19 08:00:15,394 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 6550, loss[loss=0.1049, beats_loss=0.00932, ecapa_loss=0.0001413, whisper_loss=0.09416, over 19006.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0104, ecapa_loss=0.0001422, whisper_loss=0.09126, over 3937621.10 frames. ], batch size: 76, lr: 2.04e-03, grad_scale: 1.152921504606847e+18 2024-08-19 08:00:42,748 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
22 from LS+wenet, 27 from Vox, 41 from AS 2024-08-19 08:00:51,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4366590.0, ans=0.0 2024-08-19 08:00:51,862 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.55 vs. limit=22.5 2024-08-19 08:01:06,337 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 21 from Vox, 16 from AS 2024-08-19 08:01:11,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4366690.0, ans=0.125 2024-08-19 08:01:18,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4366690.0, ans=0.125 2024-08-19 08:01:25,981 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 23 from Vox, 33 from AS 2024-08-19 08:01:36,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4366790.0, ans=0.025 2024-08-19 08:01:50,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4366890.0, ans=0.125 2024-08-19 08:02:05,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4366890.0, ans=0.125 2024-08-19 08:02:07,318 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 6600, loss[loss=0.1205, beats_loss=0.008413, ecapa_loss=0.0001824, whisper_loss=0.1103, over 22854.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01038, ecapa_loss=0.0001427, whisper_loss=0.09165, over 3982644.36 frames. ], batch size: 93, lr: 2.04e-03, grad_scale: 1.152921504606847e+18 2024-08-19 08:02:08,715 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
33 from LS+wenet, 17 from Vox, 42 from AS 2024-08-19 08:02:22,497 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 15 from LS+wenet, 24 from Vox, 26 from AS 2024-08-19 08:02:36,644 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 16 from LS+wenet, 19 from Vox, 25 from AS 2024-08-19 08:02:48,633 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 from AS 2024-08-19 08:02:55,803 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 14 from Vox, 36 from AS 2024-08-19 08:03:13,615 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.306e+01 2.532e+01 2.841e+01 4.066e+01, threshold=5.063e+01, percent-clipped=0.0 2024-08-19 08:03:19,918 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.03 vs. limit=15.0 2024-08-19 08:03:33,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4367390.0, ans=0.0 2024-08-19 08:03:33,775 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2024-08-19 08:03:43,480 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 6650, loss[loss=0.112, beats_loss=0.007373, ecapa_loss=0.0001174, whisper_loss=0.1034, over 15151.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01033, ecapa_loss=0.0001425, whisper_loss=0.0921, over 3981944.10 frames. ], batch size: 54, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:03:43,648 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 18 from Vox, 26 from AS 2024-08-19 08:03:45,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4367490.0, ans=0.1 2024-08-19 08:04:16,648 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
30 from LS+wenet, 16 from Vox, 45 from AS 2024-08-19 08:04:17,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4367690.0, ans=0.125 2024-08-19 08:04:23,824 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 30 from LS+wenet, 15 from Vox, 50 from AS 2024-08-19 08:04:39,853 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 22 from LS+wenet, 15 from Vox, 19 from AS 2024-08-19 08:04:48,759 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 18 from Vox, 37 from AS 2024-08-19 08:04:50,393 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 22 from Vox, 34 from AS 2024-08-19 08:04:50,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4367890.0, ans=0.2 2024-08-19 08:04:54,482 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 27 from Vox, 32 from AS 2024-08-19 08:04:57,274 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 6700, loss[loss=0.1079, beats_loss=0.01139, ecapa_loss=0.0001176, whisper_loss=0.09534, over 22339.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01037, ecapa_loss=0.0001431, whisper_loss=0.09181, over 3971401.44 frames. ], batch size: 89, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:04:58,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4367990.0, ans=0.0 2024-08-19 08:05:11,580 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.12 vs. limit=15.0 2024-08-19 08:05:39,225 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 25 from LS+wenet, 15 from Vox, 27 from AS 2024-08-19 08:05:39,778 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.71 vs. 
limit=12.0 2024-08-19 08:05:41,117 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 13 from Vox, 37 from AS 2024-08-19 08:05:42,030 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.605e+01 2.280e+01 2.491e+01 2.766e+01 3.799e+01, threshold=4.981e+01, percent-clipped=0.0 2024-08-19 08:05:53,770 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.26 vs. limit=10.0 2024-08-19 08:05:58,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4368390.0, ans=0.125 2024-08-19 08:06:01,891 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 22 from LS+wenet, 26 from Vox, 46 from AS 2024-08-19 08:06:02,574 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0 2024-08-19 08:06:03,574 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 15 from Vox, 32 from AS 2024-08-19 08:06:13,701 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 6750, loss[loss=0.1051, beats_loss=0.0115, ecapa_loss=0.0001383, whisper_loss=0.09224, over 18941.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01039, ecapa_loss=0.0001428, whisper_loss=0.09176, over 3936830.84 frames. ], batch size: 75, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:06:30,731 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 19 from Vox, 33 from AS 2024-08-19 08:06:32,093 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 19 from LS+wenet, 27 from Vox, 32 from AS 2024-08-19 08:06:35,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4368590.0, ans=0.1 2024-08-19 08:06:36,792 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
24 from LS+wenet, 28 from Vox, 40 from AS 2024-08-19 08:06:39,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4368590.0, ans=0.2 2024-08-19 08:06:41,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4368590.0, ans=0.0 2024-08-19 08:06:42,465 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 18 from Vox, 37 from AS 2024-08-19 08:06:45,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4368690.0, ans=0.1 2024-08-19 08:06:50,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4368690.0, ans=0.125 2024-08-19 08:06:52,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4368690.0, ans=0.1 2024-08-19 08:06:54,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4368690.0, ans=0.125 2024-08-19 08:07:09,542 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 18 from LS+wenet, 24 from Vox, 29 from AS 2024-08-19 08:07:28,586 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 6800, loss[loss=0.1204, beats_loss=0.007889, ecapa_loss=0.0001799, whisper_loss=0.1107, over 22027.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01045, ecapa_loss=0.0001425, whisper_loss=0.09101, over 3945485.06 frames. ], batch size: 89, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:08:03,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4369190.0, ans=0.125 2024-08-19 08:08:09,053 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
36 from LS+wenet, 23 from Vox, 31 from AS 2024-08-19 08:08:11,803 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.446e+01 2.587e+01 2.881e+01 4.116e+01, threshold=5.174e+01, percent-clipped=0.0 2024-08-19 08:08:12,156 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 16 from Vox, 24 from AS 2024-08-19 08:08:23,497 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 19 from Vox, 32 from AS 2024-08-19 08:08:23,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4369290.0, ans=0.125 2024-08-19 08:08:25,009 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 31 from LS+wenet, 24 from Vox, 31 from AS 2024-08-19 08:08:33,637 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 21 from Vox, 35 from AS 2024-08-19 08:08:35,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4369390.0, ans=0.125 2024-08-19 08:08:40,132 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.04 vs. limit=12.0 2024-08-19 08:08:42,306 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 6850, loss[loss=0.1048, beats_loss=0.009363, ecapa_loss=0.0001892, whisper_loss=0.09352, over 21027.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01048, ecapa_loss=0.0001433, whisper_loss=0.08999, over 3891954.82 frames. ], batch size: 90, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:08:42,974 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0 2024-08-19 08:08:50,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4369490.0, ans=0.125 2024-08-19 08:08:56,437 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
17 from LS+wenet, 26 from Vox, 23 from AS 2024-08-19 08:08:57,807 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 17 from Vox, 37 from AS 2024-08-19 08:08:58,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4369590.0, ans=0.125 2024-08-19 08:09:01,098 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.26 vs. limit=22.5 2024-08-19 08:09:13,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4369690.0, ans=0.2 2024-08-19 08:09:13,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4369690.0, ans=0.0 2024-08-19 08:09:15,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4369690.0, ans=0.125 2024-08-19 08:09:16,312 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.22 vs. limit=22.5 2024-08-19 08:09:20,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4369690.0, ans=0.0 2024-08-19 08:09:25,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4369690.0, ans=0.025 2024-08-19 08:09:25,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4369690.0, ans=0.09899494936611666 2024-08-19 08:09:26,624 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.27 vs. 
limit=15.0 2024-08-19 08:09:32,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4369790.0, ans=0.2 2024-08-19 08:09:36,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4369790.0, ans=0.125 2024-08-19 08:09:53,541 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 22 from Vox, 36 from AS 2024-08-19 08:09:53,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4369890.0, ans=0.2 2024-08-19 08:09:57,524 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 6900, loss[loss=0.1149, beats_loss=0.01275, ecapa_loss=0.0001193, whisper_loss=0.1009, over 22870.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01048, ecapa_loss=0.0001435, whisper_loss=0.09065, over 3874795.61 frames. ], batch size: 88, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:10:10,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4369990.0, ans=0.125 2024-08-19 08:10:22,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4370090.0, ans=0.0 2024-08-19 08:10:38,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4370190.0, ans=0.1 2024-08-19 08:10:40,725 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.283e+01 2.474e+01 2.721e+01 3.268e+01, threshold=4.948e+01, percent-clipped=0.0 2024-08-19 08:10:57,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4370390.0, ans=0.0 2024-08-19 08:10:58,908 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.98 vs. 
limit=15.0 2024-08-19 08:10:59,452 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 19 from LS+wenet, 16 from Vox, 37 from AS 2024-08-19 08:11:01,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4370390.0, ans=0.0 2024-08-19 08:11:12,280 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 6950, loss[loss=0.09697, beats_loss=0.01342, ecapa_loss=0.0001353, whisper_loss=0.08219, over 21697.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.0001424, whisper_loss=0.09027, over 3871500.93 frames. ], batch size: 88, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:11:12,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4370490.0, ans=0.1 2024-08-19 08:11:45,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4370690.0, ans=0.2 2024-08-19 08:12:10,665 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0 2024-08-19 08:12:28,950 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 7000, loss[loss=0.0973, beats_loss=0.01091, ecapa_loss=0.00012, whisper_loss=0.08519, over 19372.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01051, ecapa_loss=0.000143, whisper_loss=0.0904, over 3871355.46 frames. 
], batch size: 76, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:12:30,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4370990.0, ans=0.0 2024-08-19 08:12:31,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=4370990.0, ans=15.0 2024-08-19 08:12:36,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4370990.0, ans=0.125 2024-08-19 08:12:40,463 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 from AS 2024-08-19 08:13:07,597 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.90 vs. limit=15.0 2024-08-19 08:13:11,931 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.284e+01 2.585e+01 2.897e+01 5.224e+01, threshold=5.169e+01, percent-clipped=1.0 2024-08-19 08:13:16,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4371290.0, ans=0.2 2024-08-19 08:13:20,463 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 11 from Vox, 30 from AS 2024-08-19 08:13:25,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4371290.0, ans=0.0 2024-08-19 08:13:28,782 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.28 vs. limit=22.5 2024-08-19 08:13:42,970 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 7050, loss[loss=0.1034, beats_loss=0.009685, ecapa_loss=0.0001291, whisper_loss=0.09239, over 22612.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01049, ecapa_loss=0.0001426, whisper_loss=0.09075, over 3918151.43 frames. 
], batch size: 90, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:13:50,839 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 26 from Vox, 31 from AS 2024-08-19 08:14:03,433 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 18 from Vox, 35 from AS 2024-08-19 08:14:18,167 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 30 from LS+wenet, 17 from Vox, 33 from AS 2024-08-19 08:14:31,009 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 15 from LS+wenet, 18 from Vox, 35 from AS 2024-08-19 08:14:50,942 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 15 from LS+wenet, 30 from Vox, 48 from AS 2024-08-19 08:15:02,236 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 7100, loss[loss=0.1143, beats_loss=0.01069, ecapa_loss=0.0001176, whisper_loss=0.1025, over 17548.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0105, ecapa_loss=0.0001421, whisper_loss=0.09032, over 3906559.79 frames. ], batch size: 68, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:15:12,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4371990.0, ans=0.125 2024-08-19 08:15:13,449 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 19 from LS+wenet, 27 from Vox, 33 from AS 2024-08-19 08:15:48,059 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.69 vs. limit=10.0 2024-08-19 08:15:48,446 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.334e+01 2.574e+01 2.776e+01 4.254e+01, threshold=5.149e+01, percent-clipped=0.0 2024-08-19 08:16:06,112 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
25 from LS+wenet, 24 from Vox, 42 from AS 2024-08-19 08:16:18,490 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 7150, loss[loss=0.09667, beats_loss=0.01034, ecapa_loss=0.000135, whisper_loss=0.08498, over 23506.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.0001412, whisper_loss=0.09048, over 3929381.19 frames. ], batch size: 93, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:16:38,730 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.78 vs. limit=10.0 2024-08-19 08:16:51,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4372690.0, ans=0.125 2024-08-19 08:17:33,759 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 from AS 2024-08-19 08:17:37,069 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 7200, loss[loss=0.08609, beats_loss=0.009498, ecapa_loss=0.0001555, whisper_loss=0.07504, over 14265.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01048, ecapa_loss=0.0001402, whisper_loss=0.0904, over 3909193.16 frames. ], batch size: 58, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:17:59,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4373090.0, ans=0.1 2024-08-19 08:18:01,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4373090.0, ans=0.0 2024-08-19 08:18:02,891 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
21 from LS+wenet, 22 from Vox, 34 from AS 2024-08-19 08:18:04,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4373090.0, ans=0.05 2024-08-19 08:18:05,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4373090.0, ans=0.2 2024-08-19 08:18:11,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4373190.0, ans=0.125 2024-08-19 08:18:20,302 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 12 from LS+wenet, 16 from Vox, 30 from AS 2024-08-19 08:18:23,255 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.339e+01 2.591e+01 2.933e+01 7.006e+01, threshold=5.182e+01, percent-clipped=1.0 2024-08-19 08:18:25,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4373290.0, ans=0.0 2024-08-19 08:18:31,394 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 18 from Vox, 31 from AS 2024-08-19 08:18:36,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4373290.0, ans=0.2 2024-08-19 08:18:42,408 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.26 vs. limit=15.0 2024-08-19 08:18:56,172 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 7250, loss[loss=0.1168, beats_loss=0.01075, ecapa_loss=0.0001295, whisper_loss=0.1047, over 21620.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01049, ecapa_loss=0.0001399, whisper_loss=0.09049, over 3921790.54 frames. 
], batch size: 84, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:18:57,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4373490.0, ans=0.025 2024-08-19 08:19:00,336 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-19 08:19:02,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4373490.0, ans=0.125 2024-08-19 08:19:19,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4373590.0, ans=0.0 2024-08-19 08:19:33,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4373690.0, ans=0.1 2024-08-19 08:19:47,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4373790.0, ans=0.1 2024-08-19 08:19:47,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4373790.0, ans=0.0 2024-08-19 08:19:47,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4373790.0, ans=0.1 2024-08-19 08:20:00,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4373890.0, ans=0.09899494936611666 2024-08-19 08:20:15,930 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 7300, loss[loss=0.09771, beats_loss=0.009542, ecapa_loss=0.0001572, whisper_loss=0.08659, over 14580.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01048, ecapa_loss=0.0001408, whisper_loss=0.09079, over 3911682.66 frames. ], batch size: 59, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:21:01,067 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
24 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-19 08:21:04,712 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.340e+01 2.529e+01 2.737e+01 3.250e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-19 08:21:07,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4374290.0, ans=0.125 2024-08-19 08:21:14,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4374290.0, ans=0.125 2024-08-19 08:21:29,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4374390.0, ans=0.05 2024-08-19 08:21:33,781 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-19 08:21:36,415 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-19 08:21:38,091 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 7350, loss[loss=0.1106, beats_loss=0.01235, ecapa_loss=0.000124, whisper_loss=0.09705, over 22981.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01048, ecapa_loss=0.0001408, whisper_loss=0.09061, over 3924130.21 frames. ], batch size: 90, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:21:40,493 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.39 vs. limit=15.0 2024-08-19 08:21:53,670 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 38 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-19 08:21:54,757 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-19 08:22:02,340 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 27 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-19 08:22:34,624 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
24 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 08:22:46,083 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-19 08:22:46,971 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.25 vs. limit=15.0 2024-08-19 08:22:48,315 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.67 vs. limit=15.0 2024-08-19 08:22:54,676 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 7400, loss[loss=0.09405, beats_loss=0.01212, ecapa_loss=0.000145, whisper_loss=0.08048, over 22939.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01048, ecapa_loss=0.0001414, whisper_loss=0.09095, over 3909102.30 frames. ], batch size: 95, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:22:55,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4374990.0, ans=0.125 2024-08-19 08:22:57,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4374990.0, ans=0.0 2024-08-19 08:23:03,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4374990.0, ans=0.0 2024-08-19 08:23:11,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4375090.0, ans=0.0 2024-08-19 08:23:17,114 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.75 vs. 
limit=15.0 2024-08-19 08:23:43,115 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.350e+01 2.542e+01 2.861e+01 4.984e+02, threshold=5.085e+01, percent-clipped=1.0 2024-08-19 08:24:15,395 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 7450, loss[loss=0.09145, beats_loss=0.01181, ecapa_loss=0.0001314, whisper_loss=0.07832, over 21698.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01041, ecapa_loss=0.000141, whisper_loss=0.09111, over 3898501.52 frames. ], batch size: 90, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:24:20,255 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 16 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-19 08:24:20,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4375490.0, ans=0.125 2024-08-19 08:24:26,216 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-19 08:24:30,251 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.20 vs. limit=10.0 2024-08-19 08:24:49,230 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 08:24:53,611 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-19 08:24:54,089 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=15.0 2024-08-19 08:25:07,047 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 20 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-19 08:25:18,430 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
21 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-19 08:25:30,002 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 7500, loss[loss=0.1111, beats_loss=0.009167, ecapa_loss=0.0001311, whisper_loss=0.1006, over 19549.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01043, ecapa_loss=0.0001407, whisper_loss=0.09052, over 3890597.44 frames. ], batch size: 76, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:25:36,997 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2024-08-19 08:25:39,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4375990.0, ans=0.125 2024-08-19 08:25:41,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4375990.0, ans=0.125 2024-08-19 08:25:53,603 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-19 08:26:04,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4376190.0, ans=0.09899494936611666 2024-08-19 08:26:05,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4376190.0, ans=0.125 2024-08-19 08:26:12,077 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.516e+01 2.304e+01 2.520e+01 2.744e+01 4.658e+01, threshold=5.040e+01, percent-clipped=0.0 2024-08-19 08:26:16,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4376290.0, ans=0.2 2024-08-19 08:26:28,065 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.82 vs. 
limit=6.0 2024-08-19 08:26:31,733 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-19 08:26:37,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4376390.0, ans=0.125 2024-08-19 08:26:41,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4376390.0, ans=0.2 2024-08-19 08:26:44,127 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 7550, loss[loss=0.1117, beats_loss=0.008792, ecapa_loss=0.0001345, whisper_loss=0.1016, over 18119.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001421, whisper_loss=0.09015, over 3878780.31 frames. ], batch size: 69, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:26:52,968 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-19 08:26:56,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4376490.0, ans=0.125 2024-08-19 08:26:58,863 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.11 vs. limit=22.5 2024-08-19 08:27:06,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4376590.0, ans=0.0 2024-08-19 08:27:13,108 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 08:27:17,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4376690.0, ans=0.125 2024-08-19 08:27:23,499 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
23 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-19 08:27:36,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4376790.0, ans=0.125 2024-08-19 08:27:41,730 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 14 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-19 08:27:57,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4376890.0, ans=0.125 2024-08-19 08:27:58,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4376890.0, ans=0.0 2024-08-19 08:28:01,321 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 7600, loss[loss=0.09164, beats_loss=0.01036, ecapa_loss=0.0001625, whisper_loss=0.07966, over 18292.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01045, ecapa_loss=0.0001415, whisper_loss=0.09068, over 3869314.22 frames. ], batch size: 78, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:28:08,865 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 13 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-19 08:28:09,437 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.63 vs. limit=15.0 2024-08-19 08:28:12,510 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.96 vs. 
limit=15.0 2024-08-19 08:28:38,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4377190.0, ans=0.125 2024-08-19 08:28:41,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4377190.0, ans=0.0 2024-08-19 08:28:45,026 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.278e+01 2.433e+01 2.694e+01 5.084e+01, threshold=4.867e+01, percent-clipped=1.0 2024-08-19 08:28:46,499 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-19 08:28:51,340 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-08-19 08:29:13,586 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-19 08:29:15,038 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 7650, loss[loss=0.0936, beats_loss=0.01168, ecapa_loss=0.0001428, whisper_loss=0.08049, over 22962.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.0001421, whisper_loss=0.09035, over 3856907.39 frames. ], batch size: 93, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:29:18,783 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2024-08-19 08:29:38,310 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-19 08:29:55,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4377790.0, ans=0.0 2024-08-19 08:30:10,111 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-19 08:30:11,515 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-19 08:30:22,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4377890.0, ans=0.125 2024-08-19 08:30:24,334 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 7700, loss[loss=0.1201, beats_loss=0.007144, ecapa_loss=0.0001789, whisper_loss=0.1112, over 20674.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0104, ecapa_loss=0.0001425, whisper_loss=0.09053, over 3868107.85 frames. ], batch size: 85, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:30:37,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4377990.0, ans=0.1 2024-08-19 08:30:39,323 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 08:30:41,348 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.74 vs. limit=15.0 2024-08-19 08:30:43,758 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 24 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-19 08:31:04,906 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 11 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-19 08:31:05,963 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.389e+01 2.553e+01 2.809e+01 4.632e+01, threshold=5.107e+01, percent-clipped=0.0 2024-08-19 08:31:25,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4378390.0, ans=0.1 2024-08-19 08:31:34,546 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 7750, loss[loss=0.1095, beats_loss=0.01018, ecapa_loss=0.0001573, whisper_loss=0.09771, over 22276.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001411, whisper_loss=0.09039, over 3904062.35 frames. 
], batch size: 91, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:31:51,564 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.96 vs. limit=15.0 2024-08-19 08:31:54,061 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 08:31:56,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4378590.0, ans=0.0 2024-08-19 08:32:00,346 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 16 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-19 08:32:07,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4378690.0, ans=0.07 2024-08-19 08:32:07,852 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=15.0 2024-08-19 08:32:16,951 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 26 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-19 08:32:32,061 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 13 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 08:32:40,724 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-19 08:32:41,815 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 7800, loss[loss=0.09756, beats_loss=0.0123, ecapa_loss=0.0001261, whisper_loss=0.084, over 22445.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0104, ecapa_loss=0.0001405, whisper_loss=0.09022, over 3918305.70 frames. 
], batch size: 89, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:32:43,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4378990.0, ans=0.125 2024-08-19 08:32:54,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4379090.0, ans=0.125 2024-08-19 08:32:56,107 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.59 vs. limit=10.0 2024-08-19 08:32:57,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4379090.0, ans=0.125 2024-08-19 08:33:17,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4379190.0, ans=0.125 2024-08-19 08:33:20,978 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.315e+01 2.563e+01 2.893e+01 6.411e+01, threshold=5.126e+01, percent-clipped=2.0 2024-08-19 08:33:29,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4379290.0, ans=0.0 2024-08-19 08:33:44,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4379390.0, ans=0.0 2024-08-19 08:33:48,981 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 7850, loss[loss=0.1074, beats_loss=0.01015, ecapa_loss=0.0001383, whisper_loss=0.09588, over 17707.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01043, ecapa_loss=0.0001417, whisper_loss=0.09037, over 3902577.14 frames. 
], batch size: 66, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:33:49,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4379490.0, ans=0.125 2024-08-19 08:33:58,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4379490.0, ans=0.2 2024-08-19 08:34:10,595 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.80 vs. limit=22.5 2024-08-19 08:34:15,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4379690.0, ans=0.125 2024-08-19 08:34:32,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4379790.0, ans=0.125 2024-08-19 08:34:41,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4379890.0, ans=0.0 2024-08-19 08:34:54,673 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 7900, loss[loss=0.09053, beats_loss=0.01103, ecapa_loss=0.0001687, whisper_loss=0.07781, over 18837.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01054, ecapa_loss=0.0001419, whisper_loss=0.08938, over 3881261.03 frames. ], batch size: 79, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:34:56,554 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
25 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-19 08:34:59,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=4379990.0, ans=0.05 2024-08-19 08:34:59,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4379990.0, ans=0.125 2024-08-19 08:35:00,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4379990.0, ans=0.125 2024-08-19 08:35:12,015 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-19 08:35:14,702 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 29 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-19 08:35:19,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4380190.0, ans=0.0 2024-08-19 08:35:32,857 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 21 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-19 08:35:34,025 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.323e+01 2.601e+01 2.970e+01 4.832e+01, threshold=5.202e+01, percent-clipped=0.0 2024-08-19 08:35:34,888 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.65 vs. limit=15.0 2024-08-19 08:35:42,633 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
15 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-19 08:35:56,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4380390.0, ans=0.025 2024-08-19 08:35:57,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4380390.0, ans=0.0 2024-08-19 08:36:01,048 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 7950, loss[loss=0.1099, beats_loss=0.009445, ecapa_loss=0.0001619, whisper_loss=0.09882, over 23361.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01063, ecapa_loss=0.0001415, whisper_loss=0.08897, over 3901933.68 frames. ], batch size: 95, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:36:05,141 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 15 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-19 08:36:20,631 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 08:36:23,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4380590.0, ans=0.125 2024-08-19 08:36:23,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4380590.0, ans=0.125 2024-08-19 08:36:32,618 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 20 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 08:36:39,217 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-19 08:36:55,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4380890.0, ans=0.125 2024-08-19 08:37:08,549 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 8000, loss[loss=0.09815, beats_loss=0.01103, ecapa_loss=0.000147, whisper_loss=0.08565, over 20008.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.0106, ecapa_loss=0.0001409, whisper_loss=0.08939, over 3902930.28 frames. ], batch size: 82, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:37:09,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4380990.0, ans=0.0 2024-08-19 08:37:10,121 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 26 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-19 08:37:11,720 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 29 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-19 08:37:18,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4380990.0, ans=0.0 2024-08-19 08:37:22,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4381090.0, ans=0.1 2024-08-19 08:37:30,256 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-19 08:37:32,980 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-19 08:37:39,704 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 16 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-19 08:37:39,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4381190.0, ans=0.125 2024-08-19 08:37:48,864 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.335e+01 2.541e+01 2.792e+01 1.974e+02, threshold=5.082e+01, percent-clipped=1.0 2024-08-19 08:37:49,334 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-19 08:38:16,350 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 8050, loss[loss=0.1035, beats_loss=0.01062, ecapa_loss=0.0001301, whisper_loss=0.09159, over 18532.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01054, ecapa_loss=0.0001414, whisper_loss=0.09008, over 3886924.77 frames. ], batch size: 74, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:38:28,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4381490.0, ans=0.2 2024-08-19 08:38:32,239 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-19 08:39:08,888 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-19 08:39:09,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4381790.0, ans=0.0 2024-08-19 08:39:09,289 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-19 08:39:13,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4381890.0, ans=0.2 2024-08-19 08:39:27,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4381990.0, ans=0.125 2024-08-19 08:39:29,633 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 8100, loss[loss=0.1051, beats_loss=0.01125, ecapa_loss=0.0001421, whisper_loss=0.09246, over 23212.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001418, whisper_loss=0.09043, over 3904929.31 frames. ], batch size: 92, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:39:36,796 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. 
limit=6.0
2024-08-19 08:39:40,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4381990.0, ans=0.2
2024-08-19 08:39:43,938 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.35 vs. limit=22.5
2024-08-19 08:39:47,354 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 29 from Vox, 29 from AS
2024-08-19 08:40:12,651 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.219e+01 2.443e+01 2.808e+01 4.973e+01, threshold=4.885e+01, percent-clipped=0.0
2024-08-19 08:40:29,580 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 22 from LS+wenet, 20 from Vox, 44 from AS
2024-08-19 08:40:32,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4382390.0, ans=0.125
2024-08-19 08:40:40,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4382490.0, ans=0.125
2024-08-19 08:40:41,174 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 8150, loss[loss=0.08867, beats_loss=0.01082, ecapa_loss=0.0001661, whisper_loss=0.07619, over 19756.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.0001414, whisper_loss=0.09009, over 3885856.95 frames. ], batch size: 84, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 08:40:44,839 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.54 vs. limit=12.0
2024-08-19 08:40:45,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4382490.0, ans=0.0
2024-08-19 08:40:48,389 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 13 from Vox, 31 from AS
2024-08-19 08:40:58,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4382590.0, ans=0.125
2024-08-19 08:41:00,942 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 16 from Vox, 25 from AS
2024-08-19 08:41:04,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4382590.0, ans=0.04949747468305833
2024-08-19 08:41:04,687 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.55 vs. limit=22.5
2024-08-19 08:41:11,446 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 27 from Vox, 38 from AS
2024-08-19 08:41:12,837 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 26 from Vox, 42 from AS
2024-08-19 08:41:28,109 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 22 from Vox, 30 from AS
2024-08-19 08:41:38,897 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 21 from Vox, 27 from AS
2024-08-19 08:41:40,879 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.74 vs. limit=10.0
2024-08-19 08:41:44,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4382890.0, ans=0.125
2024-08-19 08:41:50,099 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 25 from LS+wenet, 11 from Vox, 32 from AS
2024-08-19 08:41:52,721 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 8200, loss[loss=0.093, beats_loss=0.00943, ecapa_loss=0.000183, whisper_loss=0.08174, over 14293.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01047, ecapa_loss=0.000142, whisper_loss=0.08945, over 3879302.52 frames. ], batch size: 59, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 08:41:58,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4382990.0, ans=0.05
2024-08-19 08:42:01,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4382990.0, ans=0.125
2024-08-19 08:42:32,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4383190.0, ans=0.2
2024-08-19 08:42:35,677 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.301e+01 2.611e+01 2.872e+01 3.807e+01, threshold=5.223e+01, percent-clipped=0.0
2024-08-19 08:42:43,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4383290.0, ans=0.1
2024-08-19 08:42:52,490 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.87 vs. limit=22.5
2024-08-19 08:42:56,799 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0
2024-08-19 08:43:00,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4383390.0, ans=0.125
2024-08-19 08:43:03,981 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 8250, loss[loss=0.1145, beats_loss=0.01065, ecapa_loss=0.0001482, whisper_loss=0.1024, over 22403.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01054, ecapa_loss=0.0001413, whisper_loss=0.0897, over 3874076.11 frames. ], batch size: 90, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 08:43:33,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4383690.0, ans=0.0
2024-08-19 08:43:49,817 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0
2024-08-19 08:44:01,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4383790.0, ans=0.1
2024-08-19 08:44:09,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4383890.0, ans=0.1
2024-08-19 08:44:13,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4383890.0, ans=0.0
2024-08-19 08:44:22,521 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 8300, loss[loss=0.06279, beats_loss=0.01199, ecapa_loss=0.000167, whisper_loss=0.04914, over 15491.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01047, ecapa_loss=0.0001422, whisper_loss=0.08969, over 3869979.96 frames. ], batch size: 67, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 08:44:49,425 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.43 vs. limit=15.0
2024-08-19 08:44:52,541 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 19 from Vox, 45 from AS
2024-08-19 08:45:07,634 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.704e+01 2.452e+01 2.719e+01 3.109e+01 1.763e+02, threshold=5.438e+01, percent-clipped=2.0
2024-08-19 08:45:30,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4384390.0, ans=0.125
2024-08-19 08:45:36,742 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 8350, loss[loss=0.1152, beats_loss=0.008573, ecapa_loss=0.0001367, whisper_loss=0.1052, over 16916.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01052, ecapa_loss=0.0001415, whisper_loss=0.08996, over 3906546.93 frames. ], batch size: 64, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 08:45:48,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4384490.0, ans=0.125
2024-08-19 08:45:51,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4384590.0, ans=0.1
2024-08-19 08:46:00,612 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.79 vs. limit=22.5
2024-08-19 08:46:03,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4384590.0, ans=0.0
2024-08-19 08:46:09,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4384690.0, ans=0.125
2024-08-19 08:46:36,781 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 15 from Vox, 38 from AS
2024-08-19 08:46:39,807 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 23 from Vox, 37 from AS
2024-08-19 08:46:50,717 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 8400, loss[loss=0.1271, beats_loss=0.008215, ecapa_loss=0.000132, whisper_loss=0.1175, over 23726.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.0001423, whisper_loss=0.09031, over 3896979.82 frames. ], batch size: 90, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 08:47:25,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4385190.0, ans=0.125
2024-08-19 08:47:28,918 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 29 from LS+wenet, 16 from Vox, 34 from AS
2024-08-19 08:47:39,326 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.283e+01 2.484e+01 2.806e+01 4.179e+01, threshold=4.968e+01, percent-clipped=0.0
2024-08-19 08:47:46,633 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 12 from Vox, 28 from AS
2024-08-19 08:47:48,414 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.832e+00
2024-08-19 08:47:49,821 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 23 from Vox, 36 from AS
2024-08-19 08:47:50,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=4385290.0, ans=0.5
2024-08-19 08:47:51,029 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 15 from Vox, 25 from AS
2024-08-19 08:48:01,298 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 20 from LS+wenet, 22 from Vox, 39 from AS
2024-08-19 08:48:10,980 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 8450, loss[loss=0.09226, beats_loss=0.01103, ecapa_loss=0.0001307, whisper_loss=0.07992, over 21759.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001426, whisper_loss=0.0903, over 3895890.59 frames. ], batch size: 90, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 08:48:24,725 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 32 from LS+wenet, 23 from Vox, 31 from AS
2024-08-19 08:48:36,653 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 29 from Vox, 38 from AS
2024-08-19 08:48:37,980 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 from AS
2024-08-19 08:48:54,690 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 20 from Vox, 28 from AS
2024-08-19 08:48:59,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4385790.0, ans=0.125
2024-08-19 08:49:01,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4385790.0, ans=0.0
2024-08-19 08:49:09,487 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 from AS
2024-08-19 08:49:11,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4385890.0, ans=0.125
2024-08-19 08:49:17,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4385890.0, ans=0.0
2024-08-19 08:49:22,653 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 8500, loss[loss=0.1054, beats_loss=0.01024, ecapa_loss=0.0001666, whisper_loss=0.09349, over 22519.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01043, ecapa_loss=0.0001423, whisper_loss=0.09021, over 3924092.12 frames. ], batch size: 90, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 08:49:35,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4385990.0, ans=0.0
2024-08-19 08:49:46,408 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-08-19 08:49:55,610 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 from AS
2024-08-19 08:50:01,644 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 from AS
2024-08-19 08:50:04,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=4386290.0, ans=0.1
2024-08-19 08:50:06,339 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.300e+01 2.586e+01 2.886e+01 4.322e+01, threshold=5.173e+01, percent-clipped=0.0
2024-08-19 08:50:17,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4386290.0, ans=0.125
2024-08-19 08:50:27,292 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 20 from Vox, 31 from AS
2024-08-19 08:50:30,422 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 16 from Vox, 32 from AS
2024-08-19 08:50:36,433 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 8550, loss[loss=0.1175, beats_loss=0.01083, ecapa_loss=0.0001701, whisper_loss=0.105, over 21808.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001411, whisper_loss=0.09042, over 3904068.80 frames. ], batch size: 89, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 08:50:40,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4386490.0, ans=0.2
2024-08-19 08:50:59,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4386590.0, ans=0.125
2024-08-19 08:51:07,132 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 from AS
2024-08-19 08:51:19,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4386690.0, ans=0.125
2024-08-19 08:51:36,294 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 17 from LS+wenet, 28 from Vox, 32 from AS
2024-08-19 08:51:40,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4386890.0, ans=0.125
2024-08-19 08:51:41,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4386890.0, ans=0.125
2024-08-19 08:51:53,651 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 8600, loss[loss=0.1011, beats_loss=0.01039, ecapa_loss=0.0001077, whisper_loss=0.08964, over 19938.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01041, ecapa_loss=0.0001416, whisper_loss=0.09064, over 3887705.09 frames. ], batch size: 73, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 08:51:54,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4386990.0, ans=0.0
2024-08-19 08:52:08,156 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.17 vs. limit=15.0
2024-08-19 08:52:14,307 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 18 from Vox, 30 from AS
2024-08-19 08:52:14,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4387090.0, ans=0.125
2024-08-19 08:52:21,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4387090.0, ans=0.125
2024-08-19 08:52:37,658 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 27 from Vox, 40 from AS
2024-08-19 08:52:38,999 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.282e+01 2.546e+01 2.881e+01 4.529e+01, threshold=5.091e+01, percent-clipped=0.0
2024-08-19 08:52:50,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4387290.0, ans=0.0
2024-08-19 08:52:54,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4387390.0, ans=0.2
2024-08-19 08:53:00,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4387390.0, ans=0.125
2024-08-19 08:53:05,309 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 27 from Vox, 31 from AS
2024-08-19 08:53:06,413 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 8650, loss[loss=0.1173, beats_loss=0.009057, ecapa_loss=0.0001792, whisper_loss=0.1065, over 21856.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01041, ecapa_loss=0.0001409, whisper_loss=0.09077, over 3878556.62 frames. ], batch size: 91, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 08:53:15,551 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 from AS
2024-08-19 08:53:15,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4387490.0, ans=0.125
2024-08-19 08:53:16,166 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.44 vs. limit=15.0
2024-08-19 08:53:34,212 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 20 from LS+wenet, 21 from Vox, 40 from AS
2024-08-19 08:53:40,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4387690.0, ans=0.1
2024-08-19 08:53:40,638 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=15.0
2024-08-19 08:53:45,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4387690.0, ans=0.1
2024-08-19 08:53:50,633 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 from AS
2024-08-19 08:53:56,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4387790.0, ans=0.025
2024-08-19 08:53:57,787 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 from AS
2024-08-19 08:54:03,838 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.66 vs. limit=12.0
2024-08-19 08:54:05,191 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.39 vs. limit=12.0
2024-08-19 08:54:14,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=4387890.0, ans=12.0
2024-08-19 08:54:17,931 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 8700, loss[loss=0.0737, beats_loss=0.01285, ecapa_loss=0.0001019, whisper_loss=0.05983, over 20669.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.0001404, whisper_loss=0.09013, over 3879154.61 frames. ], batch size: 81, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 08:54:41,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4388090.0, ans=0.125
2024-08-19 08:54:42,578 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.39 vs. limit=6.0
2024-08-19 08:54:44,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4388190.0, ans=0.2
2024-08-19 08:54:45,060 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0
2024-08-19 08:54:48,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4388190.0, ans=0.0
2024-08-19 08:54:49,895 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 from AS
2024-08-19 08:54:56,814 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 from AS
2024-08-19 08:54:57,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4388190.0, ans=0.07
2024-08-19 08:54:59,156 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.796e+01 2.272e+01 2.455e+01 2.713e+01 3.409e+01, threshold=4.910e+01, percent-clipped=0.0
2024-08-19 08:55:29,304 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 8750, loss[loss=0.08907, beats_loss=0.01057, ecapa_loss=0.0001419, whisper_loss=0.07708, over 18440.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01044, ecapa_loss=0.0001414, whisper_loss=0.08994, over 3856634.90 frames. ], batch size: 72, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 08:55:33,290 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.70 vs. limit=15.0
2024-08-19 08:55:36,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4388490.0, ans=0.125
2024-08-19 08:55:40,352 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 25 from Vox, 21 from AS
2024-08-19 08:55:43,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4388590.0, ans=0.125
2024-08-19 08:55:47,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4388590.0, ans=0.125
2024-08-19 08:56:03,425 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 from AS
2024-08-19 08:56:20,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4388790.0, ans=0.2
2024-08-19 08:56:31,269 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 24 from Vox, 35 from AS
2024-08-19 08:56:33,707 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.73 vs. limit=15.0
2024-08-19 08:56:41,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4388890.0, ans=0.125
2024-08-19 08:56:44,308 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 8800, loss[loss=0.08597, beats_loss=0.01233, ecapa_loss=0.0001468, whisper_loss=0.07217, over 21229.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01044, ecapa_loss=0.0001411, whisper_loss=0.09005, over 3858217.97 frames. ], batch size: 88, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 08:56:50,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4388990.0, ans=0.125
2024-08-19 08:56:58,692 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 18 from Vox, 36 from AS
2024-08-19 08:57:00,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4389090.0, ans=0.125
2024-08-19 08:57:02,971 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 24 from Vox, 45 from AS
2024-08-19 08:57:06,981 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.46 vs. limit=15.0
2024-08-19 08:57:07,392 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08632224053144455, model_norm_threshold=49.099090576171875
2024-08-19 08:57:07,555 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.037e+04, grad_sumsq=6.733e+06, orig_rms_sq=1.045e-02
2024-08-19 08:57:09,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4389090.0, ans=0.2
2024-08-19 08:57:23,911 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 29 from LS+wenet, 23 from Vox, 24 from AS
2024-08-19 08:57:27,447 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.642e+01 2.295e+01 2.616e+01 2.854e+01 5.688e+02, threshold=5.231e+01, percent-clipped=2.0
2024-08-19 08:57:38,255 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 from AS
2024-08-19 08:57:41,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4389390.0, ans=0.125
2024-08-19 08:57:53,212 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 26 from Vox, 25 from AS
2024-08-19 08:57:56,614 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.16 vs. limit=12.0
2024-08-19 08:57:57,253 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 8850, loss[loss=0.0736, beats_loss=0.01442, ecapa_loss=0.0001363, whisper_loss=0.05782, over 17444.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.0001409, whisper_loss=0.0896, over 3849252.54 frames. ], batch size: 75, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 08:58:12,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4389590.0, ans=0.125
2024-08-19 08:58:32,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4389690.0, ans=0.1
2024-08-19 08:58:33,683 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0
2024-08-19 08:58:34,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4389690.0, ans=0.125
2024-08-19 08:58:35,100 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.90 vs. limit=22.5
2024-08-19 08:59:05,198 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 19 from Vox, 22 from AS
2024-08-19 08:59:06,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4389890.0, ans=0.125
2024-08-19 08:59:12,686 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 8900, loss[loss=0.1176, beats_loss=0.01013, ecapa_loss=0.0001577, whisper_loss=0.1059, over 21940.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01045, ecapa_loss=0.00014, whisper_loss=0.09093, over 3847966.17 frames. ], batch size: 90, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 08:59:19,807 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 18 from LS+wenet, 20 from Vox, 35 from AS
2024-08-19 08:59:24,788 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 22 from Vox, 25 from AS
2024-08-19 08:59:35,078 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 32 from LS+wenet, 15 from Vox, 35 from AS
2024-08-19 08:59:56,107 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.234e+01 2.571e+01 2.897e+01 3.544e+02, threshold=5.141e+01, percent-clipped=1.0
2024-08-19 09:00:00,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4390290.0, ans=0.2
2024-08-19 09:00:00,511 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.29 vs. limit=22.5
2024-08-19 09:00:01,438 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 29 from Vox, 37 from AS
2024-08-19 09:00:07,093 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 18 from Vox, 35 from AS
2024-08-19 09:00:10,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4390390.0, ans=0.0
2024-08-19 09:00:24,879 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 27 from LS+wenet, 11 from Vox, 30 from AS
2024-08-19 09:00:25,794 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 8950, loss[loss=0.1216, beats_loss=0.009822, ecapa_loss=0.0001065, whisper_loss=0.1107, over 18216.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01051, ecapa_loss=0.0001397, whisper_loss=0.09066, over 3886391.50 frames. ], batch size: 68, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 09:00:30,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4390490.0, ans=0.2
2024-08-19 09:00:50,757 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 28 from LS+wenet, 16 from Vox, 32 from AS
2024-08-19 09:01:02,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4390690.0, ans=0.125
2024-08-19 09:01:08,795 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 from AS
2024-08-19 09:01:36,626 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 9000, loss[loss=0.1174, beats_loss=0.007625, ecapa_loss=0.0001459, whisper_loss=0.1083, over 23361.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01044, ecapa_loss=0.0001402, whisper_loss=0.09161, over 3887528.71 frames. ], batch size: 89, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 09:01:36,626 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss
2024-08-19 09:02:14,338 INFO [train_multi_KD3.py:1149] (2/4) Epoch 30, validation on ASR_libri: loss=0.2532, beats_loss=0, ecapa_loss=0.0005125, whisper_loss=0.2481, over 922467.00 frames.
2024-08-19 09:02:33,148 INFO [train_multi_KD3.py:1149] (2/4) Epoch 30, validation on SV_voxceleb1: loss=0.003997, beats_loss=0, ecapa_loss=0.0003997, whisper_loss=0, over 939242.00 frames.
2024-08-19 09:04:17,200 INFO [train_multi_KD3.py:1149] (2/4) Epoch 30, validation on AT_audioset: loss=0.02307, beats_loss=0.02307, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-19 09:04:17,204 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB
2024-08-19 09:04:27,238 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 19 from LS+wenet, 30 from Vox, 28 from AS
2024-08-19 09:04:34,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4391090.0, ans=0.09899494936611666
2024-08-19 09:04:39,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4391090.0, ans=0.0
2024-08-19 09:04:41,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4391090.0, ans=0.1
2024-08-19 09:05:01,276 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.350e+01 2.627e+01 2.965e+01 6.113e+01, threshold=5.254e+01, percent-clipped=2.0
2024-08-19 09:05:11,503 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 from AS
2024-08-19 09:05:33,570 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 9050, loss[loss=0.1309, beats_loss=0.007759, ecapa_loss=0.0001475, whisper_loss=0.1216, over 19870.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01048, ecapa_loss=0.00014, whisper_loss=0.09088, over 3889017.85 frames. ], batch size: 78, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 09:05:45,055 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.27 vs. limit=12.0
2024-08-19 09:05:56,385 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.96 vs. limit=15.0
2024-08-19 09:05:58,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4391590.0, ans=0.125
2024-08-19 09:06:03,742 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.69 vs. limit=22.5
2024-08-19 09:06:06,755 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 20 from Vox, 47 from AS
2024-08-19 09:06:22,429 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 17 from Vox, 34 from AS
2024-08-19 09:06:53,984 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 9100, loss[loss=0.08899, beats_loss=0.01014, ecapa_loss=0.0001553, whisper_loss=0.07729, over 21074.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01046, ecapa_loss=0.0001399, whisper_loss=0.09059, over 3875171.41 frames. ], batch size: 88, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 09:06:56,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4391990.0, ans=0.07
2024-08-19 09:06:57,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4391990.0, ans=0.125
2024-08-19 09:07:34,342 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 from AS
2024-08-19 09:07:34,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4392190.0, ans=0.2
2024-08-19 09:07:42,479 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.265e+01 2.525e+01 2.711e+01 1.200e+02, threshold=5.050e+01, percent-clipped=1.0
2024-08-19 09:07:43,431 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 35 from LS+wenet, 22 from Vox, 23 from AS
2024-08-19 09:08:04,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4392390.0, ans=0.125
2024-08-19 09:08:13,349 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 9150, loss[loss=0.1079, beats_loss=0.01101, ecapa_loss=0.0001349, whisper_loss=0.09557, over 22090.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01048, ecapa_loss=0.0001413, whisper_loss=0.09041, over 3892627.24 frames. ], batch size: 90, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 09:08:17,309 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4392490.0, ans=0.125
2024-08-19 09:08:24,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4392490.0, ans=0.125
2024-08-19 09:08:27,094 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.72 vs. limit=15.0
2024-08-19 09:08:42,845 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 16 from LS+wenet, 26 from Vox, 31 from AS
2024-08-19 09:09:28,368 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 9200, loss[loss=0.1031, beats_loss=0.0114, ecapa_loss=0.0001604, whisper_loss=0.09014, over 21555.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01046, ecapa_loss=0.000142, whisper_loss=0.09081, over 3900236.77 frames. ], batch size: 88, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 09:09:38,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4392990.0, ans=0.125
2024-08-19 09:09:48,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4393090.0, ans=0.0
2024-08-19 09:09:51,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4393090.0, ans=0.125
2024-08-19 09:10:03,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4393190.0, ans=0.125
2024-08-19 09:10:03,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4393190.0, ans=0.125
2024-08-19 09:10:05,593 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 29 from Vox, 35 from AS
2024-08-19 09:10:11,935 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.633e+01 2.355e+01 2.575e+01 2.865e+01 1.533e+02, threshold=5.149e+01, percent-clipped=1.0
2024-08-19 09:10:18,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4393290.0, ans=0.1
2024-08-19 09:10:24,037 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 18 from Vox, 44 from AS
2024-08-19 09:10:29,848 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 from AS
2024-08-19 09:10:36,737 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0
2024-08-19 09:10:41,567 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 9250, loss[loss=0.1101, beats_loss=0.01077, ecapa_loss=0.0001568, whisper_loss=0.09772, over 22101.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01041, ecapa_loss=0.000142, whisper_loss=0.09099, over 3919112.03 frames. ], batch size: 89, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 09:10:45,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4393490.0, ans=0.125
2024-08-19 09:11:07,858 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 21 from LS+wenet, 26 from Vox, 45 from AS
2024-08-19 09:11:12,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4393690.0, ans=0.125
2024-08-19 09:11:16,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4393690.0, ans=0.035
2024-08-19 09:11:45,794 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 18 from LS+wenet, 16 from Vox, 37 from AS
2024-08-19 09:11:53,494 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts.
30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 09:11:54,362 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.18 vs. limit=10.0 2024-08-19 09:11:55,208 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 9300, loss[loss=0.1098, beats_loss=0.009659, ecapa_loss=0.0001505, whisper_loss=0.09861, over 22820.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01045, ecapa_loss=0.000142, whisper_loss=0.09039, over 3925868.57 frames. ], batch size: 91, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:12:02,596 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-19 09:12:08,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4394090.0, ans=0.0 2024-08-19 09:12:11,834 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 09:12:14,242 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 09:12:34,587 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=20.91 vs. 
limit=22.5 2024-08-19 09:12:37,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4394290.0, ans=0.125 2024-08-19 09:12:37,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4394290.0, ans=0.0 2024-08-19 09:12:38,104 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.432e+01 2.581e+01 2.836e+01 1.723e+02, threshold=5.163e+01, percent-clipped=2.0 2024-08-19 09:12:46,468 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2024-08-19 09:12:50,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4394290.0, ans=0.2 2024-08-19 09:12:50,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4394290.0, ans=0.0 2024-08-19 09:13:02,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4394390.0, ans=0.125 2024-08-19 09:13:06,078 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 9350, loss[loss=0.1148, beats_loss=0.01034, ecapa_loss=0.0001215, whisper_loss=0.1032, over 22683.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0104, ecapa_loss=0.0001424, whisper_loss=0.09057, over 3907520.34 frames. 
], batch size: 87, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:13:21,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4394590.0, ans=0.125 2024-08-19 09:13:35,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4394690.0, ans=0.2 2024-08-19 09:13:35,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4394690.0, ans=0.0 2024-08-19 09:13:36,575 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 29 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-19 09:13:39,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4394690.0, ans=0.0 2024-08-19 09:13:56,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4394790.0, ans=0.1 2024-08-19 09:14:06,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4394890.0, ans=0.125 2024-08-19 09:14:13,914 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 9400, loss[loss=0.08322, beats_loss=0.01276, ecapa_loss=0.0001213, whisper_loss=0.06925, over 15431.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.000142, whisper_loss=0.09037, over 3904138.65 frames. ], batch size: 60, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:14:18,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4394990.0, ans=0.125 2024-08-19 09:14:22,425 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
19 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-19 09:14:22,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4394990.0, ans=0.125 2024-08-19 09:14:31,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4395090.0, ans=0.125 2024-08-19 09:14:32,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4395090.0, ans=0.0 2024-08-19 09:14:34,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4395090.0, ans=0.125 2024-08-19 09:14:48,164 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-19 09:14:54,365 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.333e+01 2.565e+01 2.846e+01 4.265e+02, threshold=5.130e+01, percent-clipped=2.0 2024-08-19 09:14:55,157 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.62 vs. limit=22.5 2024-08-19 09:15:12,501 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 26 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-19 09:15:19,871 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 9450, loss[loss=0.1096, beats_loss=0.009189, ecapa_loss=0.00014, whisper_loss=0.09903, over 22410.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01041, ecapa_loss=0.0001416, whisper_loss=0.09047, over 3876185.22 frames. 
], batch size: 86, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:15:30,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4395490.0, ans=0.125 2024-08-19 09:15:42,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4395590.0, ans=0.125 2024-08-19 09:15:43,020 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=22.5 2024-08-19 09:15:46,326 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-19 09:16:01,644 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.10 vs. limit=22.5 2024-08-19 09:16:06,704 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.47 vs. limit=15.0 2024-08-19 09:16:08,469 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-19 09:16:19,241 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 12 from Vox, 48 fro AS 2024-08-19 09:16:26,127 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 9500, loss[loss=0.07033, beats_loss=0.01224, ecapa_loss=0.0001229, whisper_loss=0.05687, over 17715.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01048, ecapa_loss=0.0001419, whisper_loss=0.08941, over 3883476.05 frames. ], batch size: 73, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:16:28,563 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.58 vs. limit=15.0 2024-08-19 09:16:45,060 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
17 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-19 09:16:46,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4396090.0, ans=0.125 2024-08-19 09:17:06,172 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.270e+01 2.577e+01 2.905e+01 4.057e+01, threshold=5.153e+01, percent-clipped=0.0 2024-08-19 09:17:08,336 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.47 vs. limit=15.0 2024-08-19 09:17:09,043 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 26 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-19 09:17:15,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4396290.0, ans=0.0 2024-08-19 09:17:17,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4396290.0, ans=0.1 2024-08-19 09:17:32,744 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 9550, loss[loss=0.09236, beats_loss=0.01103, ecapa_loss=0.000133, whisper_loss=0.08, over 21904.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01046, ecapa_loss=0.0001426, whisper_loss=0.08925, over 3880896.85 frames. ], batch size: 90, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:17:37,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4396490.0, ans=0.025 2024-08-19 09:17:50,650 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
22 from LS+wenet, 17 from Vox, 54 fro AS 2024-08-19 09:18:02,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4396690.0, ans=0.05 2024-08-19 09:18:09,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4396690.0, ans=0.125 2024-08-19 09:18:15,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4396790.0, ans=0.1 2024-08-19 09:18:15,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4396790.0, ans=0.125 2024-08-19 09:18:30,887 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.78 vs. limit=22.5 2024-08-19 09:18:36,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4396990.0, ans=0.1 2024-08-19 09:18:37,654 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 9600, loss[loss=0.08929, beats_loss=0.01177, ecapa_loss=0.0001808, whisper_loss=0.07571, over 21558.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01051, ecapa_loss=0.0001437, whisper_loss=0.08892, over 3871652.73 frames. ], batch size: 93, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:18:50,282 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.69 vs. limit=15.0 2024-08-19 09:18:54,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4397090.0, ans=0.125 2024-08-19 09:19:05,531 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.64 vs. 
limit=22.5 2024-08-19 09:19:17,720 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.332e+01 2.537e+01 2.796e+01 5.515e+01, threshold=5.073e+01, percent-clipped=1.0 2024-08-19 09:19:17,830 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-19 09:19:24,520 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-19 09:19:30,384 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-19 09:19:47,039 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 9650, loss[loss=0.09656, beats_loss=0.01075, ecapa_loss=0.0001275, whisper_loss=0.08453, over 22940.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01046, ecapa_loss=0.0001442, whisper_loss=0.08851, over 3826788.69 frames. ], batch size: 90, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:19:50,321 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0 2024-08-19 09:20:07,072 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 09:20:09,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4397590.0, ans=0.125 2024-08-19 09:20:10,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4397590.0, ans=0.1 2024-08-19 09:20:17,166 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-19 09:20:50,694 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.19 vs. 
limit=15.0 2024-08-19 09:20:57,063 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 9700, loss[loss=0.1206, beats_loss=0.008221, ecapa_loss=0.0001484, whisper_loss=0.1109, over 21246.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01032, ecapa_loss=0.0001441, whisper_loss=0.08993, over 3864625.39 frames. ], batch size: 81, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:21:16,164 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=12.0 2024-08-19 09:21:22,531 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.87 vs. limit=15.0 2024-08-19 09:21:30,331 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 16 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 09:21:37,818 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.414e+01 2.657e+01 3.096e+01 1.946e+02, threshold=5.314e+01, percent-clipped=1.0 2024-08-19 09:22:02,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4398390.0, ans=0.0 2024-08-19 09:22:04,340 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 9750, loss[loss=0.1086, beats_loss=0.01011, ecapa_loss=0.0001273, whisper_loss=0.09721, over 14771.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01035, ecapa_loss=0.0001431, whisper_loss=0.08972, over 3843990.11 frames. ], batch size: 56, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:22:15,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4398490.0, ans=0.125 2024-08-19 09:22:21,630 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 09:22:45,717 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
19 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-19 09:22:53,317 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-19 09:23:07,552 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 26 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 09:23:08,525 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 9800, loss[loss=0.1193, beats_loss=0.00938, ecapa_loss=0.0001311, whisper_loss=0.1086, over 18388.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01053, ecapa_loss=0.0001415, whisper_loss=0.08877, over 3847661.22 frames. ], batch size: 69, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:23:14,638 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.53 vs. limit=15.0 2024-08-19 09:23:21,141 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.33 vs. limit=22.5 2024-08-19 09:23:23,304 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-19 09:23:26,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4399090.0, ans=0.0 2024-08-19 09:23:44,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4399190.0, ans=0.1 2024-08-19 09:23:47,997 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.301e+01 2.526e+01 2.757e+01 3.952e+01, threshold=5.052e+01, percent-clipped=0.0 2024-08-19 09:23:48,691 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.58 vs. 
limit=22.5 2024-08-19 09:24:02,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4399390.0, ans=0.0 2024-08-19 09:24:11,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4399390.0, ans=0.1 2024-08-19 09:24:13,319 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 9850, loss[loss=0.1039, beats_loss=0.009526, ecapa_loss=0.0001789, whisper_loss=0.09257, over 15229.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01052, ecapa_loss=0.0001418, whisper_loss=0.0888, over 3829618.97 frames. ], batch size: 64, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:24:22,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4399490.0, ans=0.0 2024-08-19 09:24:27,383 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-19 09:24:56,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4399790.0, ans=0.125 2024-08-19 09:25:02,458 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-19 09:25:04,165 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.78 vs. limit=12.0 2024-08-19 09:25:13,203 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.03 vs. limit=15.0 2024-08-19 09:25:17,625 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 9900, loss[loss=0.0933, beats_loss=0.01372, ecapa_loss=0.0001133, whisper_loss=0.07845, over 15902.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01056, ecapa_loss=0.0001408, whisper_loss=0.08878, over 3817138.04 frames. 
], batch size: 63, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:25:22,817 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-19 09:25:26,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4399990.0, ans=0.1 2024-08-19 09:25:29,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4399990.0, ans=0.1 2024-08-19 09:25:35,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4400090.0, ans=0.125 2024-08-19 09:25:36,997 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.90 vs. limit=10.0 2024-08-19 09:25:53,402 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 09:25:59,912 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.262e+01 2.560e+01 2.824e+01 4.177e+01, threshold=5.120e+01, percent-clipped=0.0 2024-08-19 09:26:15,988 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 09:26:15,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4400390.0, ans=0.125 2024-08-19 09:26:25,646 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 9950, loss[loss=0.1143, beats_loss=0.009479, ecapa_loss=0.0001428, whisper_loss=0.1034, over 20350.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01047, ecapa_loss=0.0001421, whisper_loss=0.08931, over 3844875.93 frames. 
], batch size: 83, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:26:34,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4400490.0, ans=0.0 2024-08-19 09:26:37,626 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0 2024-08-19 09:26:54,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4400690.0, ans=0.125 2024-08-19 09:27:07,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4400790.0, ans=0.0 2024-08-19 09:27:14,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4400790.0, ans=0.04949747468305833 2024-08-19 09:27:25,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4400890.0, ans=0.1 2024-08-19 09:27:32,798 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 10000, loss[loss=0.1039, beats_loss=0.01131, ecapa_loss=0.000142, whisper_loss=0.09115, over 21870.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01043, ecapa_loss=0.0001426, whisper_loss=0.09018, over 3853540.84 frames. ], batch size: 93, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:27:40,377 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2024-08-19 09:27:54,553 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-19 09:27:55,942 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
25 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-19 09:27:56,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4401090.0, ans=0.125 2024-08-19 09:28:00,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=4401190.0, ans=10.0 2024-08-19 09:28:04,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=4401190.0, ans=0.5 2024-08-19 09:28:13,420 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.203e+01 2.418e+01 2.701e+01 3.828e+01, threshold=4.836e+01, percent-clipped=0.0 2024-08-19 09:28:19,433 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 23 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-19 09:28:28,033 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.07 vs. limit=15.0 2024-08-19 09:28:36,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4401390.0, ans=0.125 2024-08-19 09:28:40,076 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 10050, loss[loss=0.05709, beats_loss=0.01352, ecapa_loss=0.0001605, whisper_loss=0.04197, over 12917.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0104, ecapa_loss=0.0001424, whisper_loss=0.09068, over 3873704.66 frames. ], batch size: 57, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:28:44,909 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.79 vs. limit=22.5 2024-08-19 09:29:10,952 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-19 09:29:35,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4401890.0, ans=0.1 2024-08-19 09:29:39,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4401890.0, ans=0.0 2024-08-19 09:29:45,534 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 10100, loss[loss=0.09156, beats_loss=0.01166, ecapa_loss=0.0001416, whisper_loss=0.07849, over 17849.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.0001417, whisper_loss=0.09013, over 3895051.57 frames. ], batch size: 72, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:30:00,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4402090.0, ans=0.125 2024-08-19 09:30:08,777 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2024-08-19 09:30:13,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4402190.0, ans=0.125 2024-08-19 09:30:27,050 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.294e+01 2.554e+01 2.790e+01 3.607e+01, threshold=5.108e+01, percent-clipped=0.0 2024-08-19 09:30:35,223 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-19 09:30:58,394 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 10150, loss[loss=0.08115, beats_loss=0.01009, ecapa_loss=0.0001688, whisper_loss=0.06938, over 13977.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01049, ecapa_loss=0.0001419, whisper_loss=0.08985, over 3909173.31 frames. 
], batch size: 58, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:31:00,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4402490.0, ans=0.125 2024-08-19 09:31:03,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4402490.0, ans=0.2 2024-08-19 09:31:07,833 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-19 09:31:08,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4402490.0, ans=0.125 2024-08-19 09:31:08,323 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.88 vs. limit=22.5 2024-08-19 09:31:22,919 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-19 09:31:26,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4402690.0, ans=0.1 2024-08-19 09:31:27,274 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 09:31:37,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4402690.0, ans=0.04949747468305833 2024-08-19 09:32:02,535 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.61 vs. limit=10.0 2024-08-19 09:32:05,473 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
30 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-19 09:32:05,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4402890.0, ans=0.035 2024-08-19 09:32:11,432 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-19 09:32:12,448 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 10200, loss[loss=0.1112, beats_loss=0.008781, ecapa_loss=0.000129, whisper_loss=0.1011, over 21951.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01043, ecapa_loss=0.0001425, whisper_loss=0.09034, over 3916688.58 frames. ], batch size: 87, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:32:13,204 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.494e+01 2024-08-19 09:32:22,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4402990.0, ans=0.125 2024-08-19 09:32:48,158 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 14 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-19 09:32:54,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4403190.0, ans=0.125 2024-08-19 09:32:56,923 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.315e+01 2.558e+01 2.832e+01 4.132e+01, threshold=5.117e+01, percent-clipped=0.0 2024-08-19 09:32:57,041 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-19 09:33:00,760 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.89 vs. 
limit=15.0 2024-08-19 09:33:08,613 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 09:33:12,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4403390.0, ans=0.2 2024-08-19 09:33:18,241 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.33 vs. limit=10.0 2024-08-19 09:33:19,997 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.66 vs. limit=22.5 2024-08-19 09:33:25,363 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 10250, loss[loss=0.09539, beats_loss=0.01135, ecapa_loss=0.0001344, whisper_loss=0.08269, over 15776.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01041, ecapa_loss=0.000143, whisper_loss=0.09102, over 3907457.94 frames. ], batch size: 62, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:33:32,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4403490.0, ans=0.07 2024-08-19 09:34:02,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4403690.0, ans=0.0 2024-08-19 09:34:06,162 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0 2024-08-19 09:34:08,890 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
23 from LS+wenet, 26 from Vox, 35 from AS 2024-08-19 09:34:45,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4403890.0, ans=0.09899494936611666 2024-08-19 09:34:45,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4403890.0, ans=0.0 2024-08-19 09:34:50,233 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 10300, loss[loss=0.09503, beats_loss=0.01005, ecapa_loss=0.000133, whisper_loss=0.08364, over 22086.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001421, whisper_loss=0.09034, over 3895932.28 frames. ], batch size: 88, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:34:57,240 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 14 from LS+wenet, 20 from Vox, 34 from AS 2024-08-19 09:35:29,677 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 09:35:30,561 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 24 from Vox, 29 from AS 2024-08-19 09:35:41,381 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 31 from LS+wenet, 23 from Vox, 30 from AS 2024-08-19 09:35:42,351 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.424e+01 2.703e+01 3.011e+01 5.965e+01, threshold=5.405e+01, percent-clipped=1.0 2024-08-19 09:35:48,140 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 25 from LS+wenet, 12 from Vox, 28 from AS 2024-08-19 09:36:08,103 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 20 from Vox, 30 from AS 2024-08-19 09:36:20,088 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 10350, loss[loss=0.1101, beats_loss=0.009507, ecapa_loss=0.0001136, whisper_loss=0.09947, over 15170.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001421, whisper_loss=0.09033, over 3904292.97 frames.
], batch size: 55, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:36:34,513 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 18 from LS+wenet, 20 from Vox, 45 from AS 2024-08-19 09:36:36,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4404590.0, ans=0.0 2024-08-19 09:36:47,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4404590.0, ans=0.125 2024-08-19 09:36:54,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4404690.0, ans=0.125 2024-08-19 09:37:52,730 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 10400, loss[loss=0.1057, beats_loss=0.01086, ecapa_loss=0.0001203, whisper_loss=0.09368, over 23701.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01054, ecapa_loss=0.000141, whisper_loss=0.08924, over 3888858.38 frames. ], batch size: 91, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:38:02,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4404990.0, ans=0.1 2024-08-19 09:38:19,959 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 26 from Vox, 29 from AS 2024-08-19 09:38:20,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4405090.0, ans=0.125 2024-08-19 09:38:27,286 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts.
10 from LS+wenet, 18 from Vox, 34 from AS 2024-08-19 09:38:27,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4405090.0, ans=0.125 2024-08-19 09:38:38,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4405190.0, ans=0.125 2024-08-19 09:38:41,530 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 14 from Vox, 29 from AS 2024-08-19 09:38:41,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4405190.0, ans=0.125 2024-08-19 09:38:44,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4405190.0, ans=0.09899494936611666 2024-08-19 09:38:46,998 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.316e+01 2.550e+01 2.840e+01 5.090e+01, threshold=5.101e+01, percent-clipped=0.0 2024-08-19 09:39:03,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4405390.0, ans=0.125 2024-08-19 09:39:13,766 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 10450, loss[loss=0.0785, beats_loss=0.01148, ecapa_loss=0.0001543, whisper_loss=0.06548, over 19209.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01058, ecapa_loss=0.000141, whisper_loss=0.08834, over 3841562.98 frames. ], batch size: 82, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:39:13,913 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts.
28 from LS+wenet, 18 from Vox, 39 from AS 2024-08-19 09:39:21,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4405490.0, ans=0.125 2024-08-19 09:39:23,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4405490.0, ans=0.1 2024-08-19 09:39:25,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4405490.0, ans=0.05 2024-08-19 09:39:27,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4405590.0, ans=0.125 2024-08-19 09:39:40,744 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 from AS 2024-08-19 09:39:56,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4405790.0, ans=0.125 2024-08-19 09:40:22,966 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=22.5 2024-08-19 09:40:23,383 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 10500, loss[loss=0.1259, beats_loss=0.009988, ecapa_loss=0.0001197, whisper_loss=0.1147, over 24140.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01058, ecapa_loss=0.0001409, whisper_loss=0.08861, over 3821571.59 frames. ], batch size: 92, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:40:25,316 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 from AS 2024-08-19 09:40:32,966 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 from AS 2024-08-19 09:40:41,448 INFO [train_multi_KD3.py:844] (2/4) A total of 98 cuts.
27 from LS+wenet, 28 from Vox, 43 from AS 2024-08-19 09:40:41,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4406090.0, ans=0.125 2024-08-19 09:40:47,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4406090.0, ans=0.1 2024-08-19 09:40:49,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4406190.0, ans=0.125 2024-08-19 09:40:56,874 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.96 vs. limit=15.0 2024-08-19 09:41:00,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4406190.0, ans=0.2 2024-08-19 09:41:02,927 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.686e+01 2.206e+01 2.360e+01 2.626e+01 1.632e+02, threshold=4.720e+01, percent-clipped=1.0 2024-08-19 09:41:22,898 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 16 from LS+wenet, 22 from Vox, 30 from AS 2024-08-19 09:41:29,349 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 10550, loss[loss=0.09776, beats_loss=0.01149, ecapa_loss=0.0001591, whisper_loss=0.08468, over 22276.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01061, ecapa_loss=0.0001406, whisper_loss=0.08818, over 3833947.23 frames. ], batch size: 93, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:41:34,387 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.16 vs. limit=10.0 2024-08-19 09:41:53,072 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 16 from LS+wenet, 13 from Vox, 24 from AS 2024-08-19 09:42:11,580 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts.
28 from LS+wenet, 25 from Vox, 34 from AS 2024-08-19 09:42:36,126 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 18 from Vox, 44 from AS 2024-08-19 09:42:40,598 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 10600, loss[loss=0.1121, beats_loss=0.008611, ecapa_loss=0.0002004, whisper_loss=0.1015, over 19841.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01061, ecapa_loss=0.0001407, whisper_loss=0.08858, over 3895215.34 frames. ], batch size: 84, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:42:53,547 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 25 from Vox, 34 from AS 2024-08-19 09:43:03,039 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.45 vs. limit=15.0 2024-08-19 09:43:09,368 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 16 from Vox, 26 from AS 2024-08-19 09:43:16,706 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 09:43:22,463 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.96 vs. limit=15.0 2024-08-19 09:43:22,874 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.325e+01 2.532e+01 2.911e+01 7.295e+01, threshold=5.064e+01, percent-clipped=1.0 2024-08-19 09:43:25,215 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-08-19 09:43:50,720 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 10650, loss[loss=0.1139, beats_loss=0.01137, ecapa_loss=0.0001307, whisper_loss=0.1012, over 23350.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01055, ecapa_loss=0.0001393, whisper_loss=0.08906, over 3877282.22 frames.
], batch size: 94, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:43:50,830 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 19 from Vox, 44 from AS 2024-08-19 09:43:58,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4407490.0, ans=0.0 2024-08-19 09:44:19,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4407690.0, ans=0.1 2024-08-19 09:44:24,724 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 15 from Vox, 35 from AS 2024-08-19 09:44:30,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4407690.0, ans=0.125 2024-08-19 09:44:50,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4407890.0, ans=0.1 2024-08-19 09:45:02,609 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 10700, loss[loss=0.1017, beats_loss=0.01069, ecapa_loss=0.0001487, whisper_loss=0.08953, over 15222.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01059, ecapa_loss=0.000138, whisper_loss=0.0896, over 3868967.05 frames.
], batch size: 61, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:45:10,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4407990.0, ans=0.0 2024-08-19 09:45:19,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4408090.0, ans=0.2 2024-08-19 09:45:23,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4408090.0, ans=0.1 2024-08-19 09:45:25,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4408090.0, ans=0.1 2024-08-19 09:45:25,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4408090.0, ans=0.125 2024-08-19 09:45:26,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4408090.0, ans=0.09899494936611666 2024-08-19 09:45:31,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4408190.0, ans=0.2 2024-08-19 09:45:43,426 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.323e+01 2.559e+01 2.783e+01 4.084e+01, threshold=5.118e+01, percent-clipped=0.0 2024-08-19 09:46:09,981 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 10750, loss[loss=0.09884, beats_loss=0.0112, ecapa_loss=0.0001381, whisper_loss=0.08626, over 18090.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01062, ecapa_loss=0.0001383, whisper_loss=0.08973, over 3877826.96 frames. 
], batch size: 72, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:46:16,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4408490.0, ans=0.125 2024-08-19 09:46:19,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4408490.0, ans=0.95 2024-08-19 09:46:30,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=4408590.0, ans=0.2 2024-08-19 09:46:33,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4408590.0, ans=0.125 2024-08-19 09:46:42,778 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 from AS 2024-08-19 09:47:00,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4408890.0, ans=0.125 2024-08-19 09:47:04,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4408890.0, ans=0.1 2024-08-19 09:47:14,121 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 10800, loss[loss=0.1234, beats_loss=0.01047, ecapa_loss=0.0001314, whisper_loss=0.1116, over 20969.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01057, ecapa_loss=0.0001395, whisper_loss=0.0907, over 3901766.54 frames. ], batch size: 84, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:47:19,389 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts.
32 from LS+wenet, 20 from Vox, 31 from AS 2024-08-19 09:47:28,590 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4409090.0, ans=0.125 2024-08-19 09:47:33,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4409090.0, ans=0.125 2024-08-19 09:47:33,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4409090.0, ans=0.0 2024-08-19 09:47:33,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4409090.0, ans=0.0 2024-08-19 09:47:37,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4409090.0, ans=0.035 2024-08-19 09:47:40,344 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 11 from LS+wenet, 16 from Vox, 34 from AS 2024-08-19 09:47:51,572 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.386e+01 2.632e+01 3.001e+01 4.725e+01, threshold=5.265e+01, percent-clipped=0.0 2024-08-19 09:48:05,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4409390.0, ans=0.0 2024-08-19 09:48:11,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4409390.0, ans=0.125 2024-08-19 09:48:17,657 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 10850, loss[loss=0.07883, beats_loss=0.01196, ecapa_loss=0.0001642, whisper_loss=0.06523, over 22414.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01051, ecapa_loss=0.0001396, whisper_loss=0.09196, over 3927299.66 frames.
], batch size: 95, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:48:19,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4409490.0, ans=0.125 2024-08-19 09:48:37,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4409590.0, ans=0.0 2024-08-19 09:48:45,776 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.73 vs. limit=22.5 2024-08-19 09:48:46,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4409690.0, ans=0.0 2024-08-19 09:48:51,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4409690.0, ans=0.0 2024-08-19 09:49:00,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4409790.0, ans=0.0 2024-08-19 09:49:02,584 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 18 from Vox, 39 from AS 2024-08-19 09:49:04,963 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 17 from Vox, 37 from AS 2024-08-19 09:49:10,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4409890.0, ans=0.125 2024-08-19 09:49:14,622 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.18 vs. limit=15.0 2024-08-19 09:49:21,260 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 10900, loss[loss=0.07619, beats_loss=0.009366, ecapa_loss=0.0001897, whisper_loss=0.06493, over 12441.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01051, ecapa_loss=0.0001393, whisper_loss=0.0918, over 3966543.28 frames.
], batch size: 55, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:49:24,900 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.91 vs. limit=10.0 2024-08-19 09:49:25,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4409990.0, ans=0.1 2024-08-19 09:49:43,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4410090.0, ans=0.125 2024-08-19 09:49:48,143 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 from AS 2024-08-19 09:49:59,521 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.400e+01 2.615e+01 2.977e+01 1.064e+02, threshold=5.230e+01, percent-clipped=2.0 2024-08-19 09:50:09,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4410290.0, ans=0.07 2024-08-19 09:50:13,696 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 from AS 2024-08-19 09:50:14,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4410390.0, ans=0.0 2024-08-19 09:50:25,135 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 10950, loss[loss=0.1009, beats_loss=0.00774, ecapa_loss=0.0001306, whisper_loss=0.0919, over 15980.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01036, ecapa_loss=0.0001405, whisper_loss=0.09237, over 3952743.39 frames. ], batch size: 59, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:50:35,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4410490.0, ans=0.125 2024-08-19 09:50:41,298 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts.
30 from LS+wenet, 20 from Vox, 44 from AS 2024-08-19 09:50:51,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4410690.0, ans=0.125 2024-08-19 09:50:59,535 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 21 from Vox, 43 from AS 2024-08-19 09:51:07,919 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.17 vs. limit=15.0 2024-08-19 09:51:13,599 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 from AS 2024-08-19 09:51:15,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4410890.0, ans=0.125 2024-08-19 09:51:19,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4410890.0, ans=0.125 2024-08-19 09:51:25,646 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 from AS 2024-08-19 09:51:30,076 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 11000, loss[loss=0.09699, beats_loss=0.0107, ecapa_loss=0.0001592, whisper_loss=0.0847, over 17758.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01037, ecapa_loss=0.000142, whisper_loss=0.09201, over 3970212.43 frames. ], batch size: 76, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:51:35,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.90 vs.
limit=15.0 2024-08-19 09:51:43,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4410990.0, ans=0.125 2024-08-19 09:52:12,344 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.368e+01 2.497e+01 2.768e+01 3.279e+02, threshold=4.993e+01, percent-clipped=1.0 2024-08-19 09:52:26,329 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 18 from Vox, 25 from AS 2024-08-19 09:52:27,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4411390.0, ans=0.0 2024-08-19 09:52:32,297 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 from AS 2024-08-19 09:52:38,971 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 11050, loss[loss=0.0987, beats_loss=0.01061, ecapa_loss=0.0001341, whisper_loss=0.08676, over 16484.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01043, ecapa_loss=0.0001417, whisper_loss=0.09123, over 3950933.62 frames. ], batch size: 65, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:52:40,244 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 25 from LS+wenet, 13 from Vox, 32 from AS 2024-08-19 09:52:52,198 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts.
31 from LS+wenet, 17 from Vox, 44 from AS 2024-08-19 09:53:14,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4411690.0, ans=0.0 2024-08-19 09:53:20,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4411790.0, ans=0.2 2024-08-19 09:53:38,229 WARNING [optim.py:496] (2/4) Scaling gradients by 0.06586580723524094, model_norm_threshold=49.93263626098633 2024-08-19 09:53:38,407 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.337e+05, grad_sumsq=1.337e+05, orig_rms_sq=1.000e+00 2024-08-19 09:53:45,249 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 11100, loss[loss=0.1046, beats_loss=0.009495, ecapa_loss=0.0001499, whisper_loss=0.09362, over 14722.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001421, whisper_loss=0.09065, over 3921832.76 frames. ], batch size: 54, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:54:20,673 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.99 vs.
limit=22.5 2024-08-19 09:54:26,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4412190.0, ans=0.125 2024-08-19 09:54:28,470 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.398e+01 2.719e+01 3.101e+01 7.581e+02, threshold=5.438e+01, percent-clipped=4.0 2024-08-19 09:54:34,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=4412290.0, ans=22.5 2024-08-19 09:54:46,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4412390.0, ans=0.1 2024-08-19 09:54:52,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4412390.0, ans=0.0 2024-08-19 09:54:58,141 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 11150, loss[loss=0.09295, beats_loss=0.009912, ecapa_loss=0.0001428, whisper_loss=0.08161, over 21507.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0104, ecapa_loss=0.0001421, whisper_loss=0.09053, over 3901884.31 frames. ], batch size: 88, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:55:23,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4412590.0, ans=0.125 2024-08-19 09:56:03,379 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 20 from Vox, 28 from AS 2024-08-19 09:56:08,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4412990.0, ans=0.125 2024-08-19 09:56:09,786 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 11200, loss[loss=0.1129, beats_loss=0.009644, ecapa_loss=0.0001694, whisper_loss=0.1016, over 21593.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001419, whisper_loss=0.09038, over 3873814.41 frames.
], batch size: 90, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:56:13,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4412990.0, ans=0.125 2024-08-19 09:56:38,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4413190.0, ans=0.125 2024-08-19 09:56:38,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4413190.0, ans=0.2 2024-08-19 09:56:39,817 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 20 from Vox, 30 from AS 2024-08-19 09:56:49,693 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 23 from Vox, 27 from AS 2024-08-19 09:56:50,878 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.289e+01 2.521e+01 2.778e+01 3.744e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-19 09:56:54,052 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 12 from Vox, 26 from AS 2024-08-19 09:57:01,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4413290.0, ans=0.0 2024-08-19 09:57:05,794 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 16 from Vox, 45 from AS 2024-08-19 09:57:11,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4413390.0, ans=0.2 2024-08-19 09:57:19,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4413490.0, ans=0.1 2024-08-19 09:57:20,484 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 11250, loss[loss=0.1116, beats_loss=0.008577, ecapa_loss=0.0001486, whisper_loss=0.1015, over 14271.00 frames.
], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.0001418, whisper_loss=0.09041, over 3851084.56 frames. ], batch size: 56, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:57:23,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4413490.0, ans=0.0 2024-08-19 09:57:29,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4413490.0, ans=0.2 2024-08-19 09:57:51,074 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.27 vs. limit=15.0 2024-08-19 09:57:51,720 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 14 from Vox, 31 from AS 2024-08-19 09:58:07,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4413790.0, ans=0.0 2024-08-19 09:58:28,595 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 11300, loss[loss=0.084, beats_loss=0.01068, ecapa_loss=0.0001743, whisper_loss=0.07158, over 19773.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.0001416, whisper_loss=0.09009, over 3855431.05 frames.
], batch size: 87, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:58:33,129 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 09:58:51,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4414090.0, ans=0.0 2024-08-19 09:59:07,856 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 09:59:08,502 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.325e+01 2.531e+01 2.748e+01 4.040e+01, threshold=5.062e+01, percent-clipped=0.0 2024-08-19 09:59:11,789 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.15 vs. limit=22.5 2024-08-19 09:59:15,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4414290.0, ans=0.125 2024-08-19 09:59:16,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4414290.0, ans=0.035 2024-08-19 09:59:23,343 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-19 09:59:27,712 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 30 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 09:59:30,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4414390.0, ans=0.125 2024-08-19 09:59:34,833 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 11350, loss[loss=0.1141, beats_loss=0.008117, ecapa_loss=0.0001527, whisper_loss=0.1045, over 20324.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01036, ecapa_loss=0.0001416, whisper_loss=0.09043, over 3877319.76 frames. 
], batch size: 81, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:59:37,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4414490.0, ans=0.2 2024-08-19 09:59:52,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4414590.0, ans=0.125 2024-08-19 10:00:01,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4414690.0, ans=0.125 2024-08-19 10:00:05,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4414690.0, ans=0.1 2024-08-19 10:00:37,604 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 11400, loss[loss=0.0891, beats_loss=0.01341, ecapa_loss=0.000128, whisper_loss=0.07441, over 22435.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01038, ecapa_loss=0.0001416, whisper_loss=0.08997, over 3852231.47 frames. ], batch size: 92, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 10:00:40,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4414990.0, ans=0.125 2024-08-19 10:00:43,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4414990.0, ans=0.0 2024-08-19 10:00:55,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4415090.0, ans=0.1 2024-08-19 10:00:56,536 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-19 10:01:15,083 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.359e+01 2.595e+01 2.973e+01 3.861e+01, threshold=5.190e+01, percent-clipped=0.0 2024-08-19 10:01:26,293 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
33 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-19 10:01:31,150 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-19 10:01:34,808 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 24 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-19 10:01:36,309 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4415390.0, ans=0.125 2024-08-19 10:01:39,602 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 11450, loss[loss=0.09512, beats_loss=0.01193, ecapa_loss=0.000125, whisper_loss=0.08195, over 18296.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01036, ecapa_loss=0.0001412, whisper_loss=0.09035, over 3857341.78 frames. ], batch size: 73, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 10:01:43,551 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-19 10:01:49,317 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.69 vs. limit=22.5 2024-08-19 10:02:06,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4415690.0, ans=0.125 2024-08-19 10:02:08,503 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-19 10:02:25,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4415790.0, ans=0.125 2024-08-19 10:02:35,828 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-19 10:02:38,982 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.02 vs. 
limit=15.0 2024-08-19 10:02:41,948 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 11500, loss[loss=0.1167, beats_loss=0.008014, ecapa_loss=0.0001527, whisper_loss=0.1072, over 23347.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01037, ecapa_loss=0.0001403, whisper_loss=0.09059, over 3896594.65 frames. ], batch size: 92, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 10:02:52,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4415990.0, ans=0.125 2024-08-19 10:02:52,587 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.39 vs. limit=15.0 2024-08-19 10:02:55,232 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.23 vs. limit=15.0 2024-08-19 10:03:01,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4416090.0, ans=0.125 2024-08-19 10:03:03,071 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-19 10:03:15,375 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-19 10:03:18,811 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.451e+01 2.614e+01 2.882e+01 3.813e+01, threshold=5.229e+01, percent-clipped=0.0 2024-08-19 10:03:18,998 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
18 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-19 10:03:27,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4416290.0, ans=0.1 2024-08-19 10:03:29,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4416290.0, ans=0.1 2024-08-19 10:03:31,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4416390.0, ans=0.2 2024-08-19 10:03:32,250 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.28 vs. limit=22.5 2024-08-19 10:03:43,769 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 11550, loss[loss=0.1156, beats_loss=0.008022, ecapa_loss=0.0001664, whisper_loss=0.1059, over 21410.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01034, ecapa_loss=0.0001409, whisper_loss=0.0912, over 3879046.35 frames. ], batch size: 88, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 10:03:44,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4416490.0, ans=0.0 2024-08-19 10:03:51,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4416490.0, ans=0.2 2024-08-19 10:04:11,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4416690.0, ans=0.1 2024-08-19 10:04:25,982 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
22 from LS+wenet, 26 from Vox, 20 fro AS 2024-08-19 10:04:32,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4416890.0, ans=0.0 2024-08-19 10:04:45,841 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 11600, loss[loss=0.1195, beats_loss=0.00937, ecapa_loss=0.000133, whisper_loss=0.1088, over 23665.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0103, ecapa_loss=0.000141, whisper_loss=0.09175, over 3885798.23 frames. ], batch size: 91, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 10:04:55,671 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-19 10:05:11,251 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-19 10:05:17,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4417190.0, ans=0.2 2024-08-19 10:05:21,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4417290.0, ans=0.1 2024-08-19 10:05:22,511 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.308e+01 2.624e+01 2.875e+01 7.013e+01, threshold=5.249e+01, percent-clipped=1.0 2024-08-19 10:05:46,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4417390.0, ans=0.125 2024-08-19 10:05:48,082 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 11650, loss[loss=0.08264, beats_loss=0.01142, ecapa_loss=0.0001449, whisper_loss=0.06978, over 14249.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01035, ecapa_loss=0.0001415, whisper_loss=0.09143, over 3898705.22 frames. 
], batch size: 57, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:05:58,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4417490.0, ans=0.125 2024-08-19 10:06:09,551 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=15.0 2024-08-19 10:06:40,753 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-19 10:06:41,972 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-19 10:06:44,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=4417890.0, ans=22.5 2024-08-19 10:06:48,101 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-19 10:06:50,599 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 11700, loss[loss=0.09784, beats_loss=0.009858, ecapa_loss=0.0001728, whisper_loss=0.08626, over 19708.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0105, ecapa_loss=0.0001415, whisper_loss=0.09074, over 3907138.26 frames. ], batch size: 79, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:06:54,332 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
30 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-19 10:07:03,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4418090.0, ans=0.0 2024-08-19 10:07:29,312 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.361e+01 2.643e+01 2.921e+01 4.842e+01, threshold=5.287e+01, percent-clipped=0.0 2024-08-19 10:07:32,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4418290.0, ans=0.0 2024-08-19 10:07:52,541 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 11750, loss[loss=0.09588, beats_loss=0.01125, ecapa_loss=0.000156, whisper_loss=0.08307, over 21260.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.0001417, whisper_loss=0.09052, over 3911415.66 frames. ], batch size: 89, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:07:58,977 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 36 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-19 10:08:12,605 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 10:08:20,441 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.44 vs. limit=15.0 2024-08-19 10:08:31,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4418790.0, ans=0.0 2024-08-19 10:08:31,768 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.25 vs. limit=10.0 2024-08-19 10:08:35,853 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
26 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-19 10:08:36,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4418790.0, ans=0.0 2024-08-19 10:08:44,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4418890.0, ans=0.125 2024-08-19 10:08:53,911 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 11800, loss[loss=0.1044, beats_loss=0.006741, ecapa_loss=0.0001562, whisper_loss=0.0961, over 14161.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01048, ecapa_loss=0.0001417, whisper_loss=0.09118, over 3917858.09 frames. ], batch size: 56, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:09:00,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4418990.0, ans=0.125 2024-08-19 10:09:16,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4419090.0, ans=0.0 2024-08-19 10:09:19,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4419190.0, ans=0.1 2024-08-19 10:09:30,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4419290.0, ans=0.1 2024-08-19 10:09:32,069 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.601e+01 2.354e+01 2.552e+01 2.696e+01 6.400e+01, threshold=5.104e+01, percent-clipped=1.0 2024-08-19 10:09:42,855 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.07 vs. limit=15.0 2024-08-19 10:09:45,810 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
27 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 10:09:55,726 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 11850, loss[loss=0.08591, beats_loss=0.01241, ecapa_loss=0.000116, whisper_loss=0.07234, over 16215.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01053, ecapa_loss=0.0001413, whisper_loss=0.09079, over 3916233.52 frames. ], batch size: 62, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:09:56,593 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.28 vs. limit=15.0 2024-08-19 10:10:09,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4419590.0, ans=0.0 2024-08-19 10:10:39,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4419790.0, ans=0.0 2024-08-19 10:10:44,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4419890.0, ans=0.125 2024-08-19 10:10:58,361 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 11900, loss[loss=0.1013, beats_loss=0.01062, ecapa_loss=0.0001436, whisper_loss=0.08922, over 16764.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01049, ecapa_loss=0.0001422, whisper_loss=0.09115, over 3931585.67 frames. ], batch size: 66, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:11:05,996 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-19 10:11:07,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4419990.0, ans=0.125 2024-08-19 10:11:22,135 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-19 10:11:30,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4420190.0, ans=0.1 2024-08-19 10:11:33,074 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 21 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-19 10:11:36,736 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.653e+01 2.301e+01 2.576e+01 2.913e+01 4.968e+01, threshold=5.153e+01, percent-clipped=0.0 2024-08-19 10:11:41,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4420290.0, ans=0.0 2024-08-19 10:11:46,911 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-19 10:11:47,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4420390.0, ans=0.2 2024-08-19 10:11:58,028 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.21 vs. limit=5.0 2024-08-19 10:12:00,697 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 11950, loss[loss=0.1183, beats_loss=0.01165, ecapa_loss=0.0001075, whisper_loss=0.1055, over 14513.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001418, whisper_loss=0.09064, over 3888839.90 frames. ], batch size: 54, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:12:02,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=4420490.0, ans=10.0 2024-08-19 10:12:05,754 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
14 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-19 10:12:07,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4420490.0, ans=0.125 2024-08-19 10:12:09,035 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09744685143232346, model_norm_threshold=51.527191162109375 2024-08-19 10:12:09,198 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.481e+04, grad_sumsq=5.241e+06, orig_rms_sq=1.046e-02 2024-08-19 10:12:09,317 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-19 10:12:11,626 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.63 vs. limit=22.5 2024-08-19 10:12:31,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4420690.0, ans=0.0 2024-08-19 10:12:33,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4420690.0, ans=0.1 2024-08-19 10:12:37,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4420790.0, ans=0.07 2024-08-19 10:12:45,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4420790.0, ans=0.125 2024-08-19 10:12:48,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4420790.0, ans=0.1 2024-08-19 10:12:54,143 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08515588939189911, model_norm_threshold=51.527191162109375 2024-08-19 10:12:54,308 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 
0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.362e+04, grad_sumsq=4.362e+04, orig_rms_sq=1.000e+00 2024-08-19 10:12:56,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4420890.0, ans=0.0 2024-08-19 10:12:58,042 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.06 vs. limit=15.0 2024-08-19 10:12:58,374 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 27 from Vox, 19 fro AS 2024-08-19 10:12:59,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4420890.0, ans=0.05 2024-08-19 10:13:02,908 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 12000, loss[loss=0.1162, beats_loss=0.007565, ecapa_loss=0.0001881, whisper_loss=0.1067, over 21572.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01033, ecapa_loss=0.0001431, whisper_loss=0.09092, over 3877640.43 frames. ], batch size: 89, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:13:02,908 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-19 10:13:40,088 INFO [train_multi_KD3.py:1149] (2/4) Epoch 30, validation on ASR_libri: loss=0.2551, beats_loss=0, ecapa_loss=0.0005098, whisper_loss=0.25, over 922467.00 frames. 2024-08-19 10:13:57,403 INFO [train_multi_KD3.py:1149] (2/4) Epoch 30, validation on SV_voxceleb1: loss=0.003986, beats_loss=0, ecapa_loss=0.0003986, whisper_loss=0, over 939242.00 frames. 2024-08-19 10:15:43,703 INFO [train_multi_KD3.py:1149] (2/4) Epoch 30, validation on AT_audioset: loss=0.02302, beats_loss=0.02302, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 10:15:43,706 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-19 10:15:46,188 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
23 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-19 10:15:50,538 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.80 vs. limit=10.0 2024-08-19 10:16:22,828 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.555e+01 2.285e+01 2.523e+01 2.857e+01 6.051e+02, threshold=5.046e+01, percent-clipped=3.0 2024-08-19 10:16:31,098 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.88 vs. limit=22.5 2024-08-19 10:16:34,132 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 37 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-19 10:16:44,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4421390.0, ans=0.125 2024-08-19 10:16:46,476 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 12050, loss[loss=0.09934, beats_loss=0.009732, ecapa_loss=0.0001462, whisper_loss=0.08815, over 18889.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01038, ecapa_loss=0.0001424, whisper_loss=0.09041, over 3863179.43 frames. ], batch size: 76, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:16:47,898 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 37 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-19 10:16:51,042 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.85 vs. 
limit=15.0 2024-08-19 10:16:53,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4421490.0, ans=0.1 2024-08-19 10:16:55,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4421490.0, ans=0.1 2024-08-19 10:16:57,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4421490.0, ans=0.2 2024-08-19 10:17:20,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4421690.0, ans=0.0 2024-08-19 10:17:21,313 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 12 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-19 10:17:49,686 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 12100, loss[loss=0.1346, beats_loss=0.005748, ecapa_loss=0.0001432, whisper_loss=0.1274, over 16734.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01037, ecapa_loss=0.0001433, whisper_loss=0.09072, over 3883971.69 frames. ], batch size: 62, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:18:06,033 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-19 10:18:08,431 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 21 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-19 10:18:18,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4422190.0, ans=0.1 2024-08-19 10:18:28,090 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.327e+01 2.569e+01 2.945e+01 4.765e+01, threshold=5.137e+01, percent-clipped=0.0 2024-08-19 10:18:31,830 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
24 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-19 10:18:51,763 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 12150, loss[loss=0.1062, beats_loss=0.008284, ecapa_loss=0.000152, whisper_loss=0.09639, over 18083.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01037, ecapa_loss=0.0001427, whisper_loss=0.09073, over 3890575.45 frames. ], batch size: 71, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:18:55,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4422490.0, ans=0.5 2024-08-19 10:18:57,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4422490.0, ans=0.07 2024-08-19 10:18:59,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4422490.0, ans=0.125 2024-08-19 10:19:05,055 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.17 vs. limit=15.0 2024-08-19 10:19:06,338 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=22.5 2024-08-19 10:19:08,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4422590.0, ans=0.125 2024-08-19 10:19:09,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4422590.0, ans=0.125 2024-08-19 10:19:13,303 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 37 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-19 10:19:17,070 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
31 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-19 10:19:38,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4422790.0, ans=0.1 2024-08-19 10:19:50,145 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.04 vs. limit=15.0 2024-08-19 10:19:54,292 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 12200, loss[loss=0.112, beats_loss=0.01047, ecapa_loss=0.0001356, whisper_loss=0.1002, over 22799.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01038, ecapa_loss=0.0001424, whisper_loss=0.09086, over 3897651.38 frames. ], batch size: 90, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:19:58,483 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 10:20:00,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4422990.0, ans=0.1 2024-08-19 10:20:01,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4422990.0, ans=0.125 2024-08-19 10:20:10,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4423090.0, ans=0.1 2024-08-19 10:20:18,582 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs. 
limit=15.0 2024-08-19 10:20:24,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4423190.0, ans=0.1 2024-08-19 10:20:25,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4423190.0, ans=0.0 2024-08-19 10:20:26,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4423190.0, ans=0.125 2024-08-19 10:20:28,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4423190.0, ans=0.125 2024-08-19 10:20:30,395 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 33 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-19 10:20:32,463 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.316e+01 2.610e+01 2.989e+01 7.361e+01, threshold=5.220e+01, percent-clipped=1.0 2024-08-19 10:20:35,076 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-19 10:20:36,472 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 10:20:46,131 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 32 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-19 10:20:49,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4423390.0, ans=0.0 2024-08-19 10:20:52,371 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-19 10:20:55,996 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 12250, loss[loss=0.08073, beats_loss=0.0122, ecapa_loss=0.000162, whisper_loss=0.06691, over 21107.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001415, whisper_loss=0.09064, over 3879059.94 frames. 
], batch size: 91, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:21:01,032 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 20 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-19 10:21:04,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4423490.0, ans=0.125 2024-08-19 10:21:24,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4423690.0, ans=0.125 2024-08-19 10:21:25,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4423690.0, ans=0.125 2024-08-19 10:21:27,052 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-19 10:21:30,777 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-19 10:21:38,294 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 21 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-19 10:21:41,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4423790.0, ans=0.125 2024-08-19 10:21:47,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4423890.0, ans=0.125 2024-08-19 10:21:52,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4423890.0, ans=0.0 2024-08-19 10:21:54,816 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 26 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-19 10:21:58,062 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.98 vs. 
limit=15.0 2024-08-19 10:21:58,501 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 12300, loss[loss=0.09238, beats_loss=0.0113, ecapa_loss=0.0001316, whisper_loss=0.07976, over 18849.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01045, ecapa_loss=0.0001412, whisper_loss=0.09053, over 3899249.96 frames. ], batch size: 76, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:22:02,440 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 21 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-19 10:22:12,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4424090.0, ans=0.0 2024-08-19 10:22:16,217 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-19 10:22:20,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4424090.0, ans=0.0 2024-08-19 10:22:25,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4424190.0, ans=0.0 2024-08-19 10:22:34,576 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.72 vs. limit=22.5 2024-08-19 10:22:38,475 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.350e+01 2.584e+01 3.104e+01 8.227e+01, threshold=5.169e+01, percent-clipped=2.0 2024-08-19 10:22:51,203 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.16 vs. 
limit=12.0 2024-08-19 10:22:59,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4424390.0, ans=0.125 2024-08-19 10:23:00,510 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.20 vs. limit=15.0 2024-08-19 10:23:03,567 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 12350, loss[loss=0.09301, beats_loss=0.01093, ecapa_loss=0.0001323, whisper_loss=0.08075, over 17206.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01041, ecapa_loss=0.0001423, whisper_loss=0.09076, over 3873604.94 frames. ], batch size: 69, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:23:04,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4424490.0, ans=0.2 2024-08-19 10:23:07,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4424490.0, ans=0.125 2024-08-19 10:23:12,476 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.387e+00 2024-08-19 10:23:16,326 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 10:23:18,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4424590.0, ans=0.1 2024-08-19 10:23:48,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4424790.0, ans=0.125 2024-08-19 10:23:58,985 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 10:24:03,570 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
20 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-19 10:24:12,975 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 12400, loss[loss=0.1025, beats_loss=0.01033, ecapa_loss=0.0001383, whisper_loss=0.0908, over 16564.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01041, ecapa_loss=0.0001404, whisper_loss=0.09103, over 3909543.19 frames. ], batch size: 68, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:24:21,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4424990.0, ans=0.0 2024-08-19 10:24:29,951 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.87 vs. limit=15.0 2024-08-19 10:24:33,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4425090.0, ans=0.0 2024-08-19 10:24:48,887 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-19 10:24:52,892 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 25 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-19 10:24:56,424 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.365e+01 2.557e+01 2.842e+01 4.073e+01, threshold=5.113e+01, percent-clipped=0.0 2024-08-19 10:25:06,528 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-19 10:25:06,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4425290.0, ans=0.1 2024-08-19 10:25:15,344 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-19 10:25:24,593 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 12450, loss[loss=0.1078, beats_loss=0.009954, ecapa_loss=0.0001536, whisper_loss=0.09632, over 19340.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01042, ecapa_loss=0.0001413, whisper_loss=0.09032, over 3933616.79 frames. ], batch size: 81, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:25:40,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4425590.0, ans=0.1 2024-08-19 10:25:44,950 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 10:25:50,067 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-19 10:25:52,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4425690.0, ans=0.1 2024-08-19 10:26:00,503 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-19 10:26:17,834 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.62 vs. limit=15.0 2024-08-19 10:26:26,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4425890.0, ans=0.0 2024-08-19 10:26:34,553 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 12500, loss[loss=0.08396, beats_loss=0.009433, ecapa_loss=0.000153, whisper_loss=0.073, over 15518.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01045, ecapa_loss=0.0001406, whisper_loss=0.09062, over 3937887.85 frames. 
], batch size: 61, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:26:36,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4425990.0, ans=0.125 2024-08-19 10:26:53,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4426090.0, ans=0.1 2024-08-19 10:26:55,448 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-19 10:26:56,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4426090.0, ans=0.2 2024-08-19 10:26:59,289 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 20 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 10:27:00,751 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-19 10:27:08,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4426190.0, ans=0.1 2024-08-19 10:27:10,774 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-19 10:27:17,644 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.209e+01 2.418e+01 2.656e+01 4.014e+01, threshold=4.837e+01, percent-clipped=0.0 2024-08-19 10:27:21,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4426290.0, ans=0.125 2024-08-19 10:27:38,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4426390.0, ans=0.125 2024-08-19 10:27:41,367 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.55 vs. 
limit=22.5 2024-08-19 10:27:43,645 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 12550, loss[loss=0.103, beats_loss=0.01042, ecapa_loss=0.0001593, whisper_loss=0.091, over 19992.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01043, ecapa_loss=0.0001407, whisper_loss=0.0906, over 3935388.92 frames. ], batch size: 85, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:27:48,813 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.29 vs. limit=15.0 2024-08-19 10:27:49,552 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 25 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-19 10:27:51,755 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=12.0 2024-08-19 10:27:58,631 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-19 10:28:04,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4426590.0, ans=0.09899494936611666 2024-08-19 10:28:18,190 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 18 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 10:28:19,717 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-19 10:28:20,738 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-19 10:28:38,807 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-19 10:28:40,310 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
11 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-19 10:28:43,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4426890.0, ans=0.0 2024-08-19 10:28:52,396 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 12600, loss[loss=0.1015, beats_loss=0.008951, ecapa_loss=0.0001142, whisper_loss=0.09141, over 15651.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01047, ecapa_loss=0.0001411, whisper_loss=0.09031, over 3928224.74 frames. ], batch size: 58, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:29:12,848 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-19 10:29:14,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4427090.0, ans=0.1 2024-08-19 10:29:27,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4427190.0, ans=0.0 2024-08-19 10:29:28,599 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-19 10:29:30,541 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.65 vs. limit=10.0 2024-08-19 10:29:33,321 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.274e+01 2.527e+01 2.762e+01 5.696e+01, threshold=5.054e+01, percent-clipped=1.0 2024-08-19 10:29:40,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4427290.0, ans=0.0 2024-08-19 10:29:50,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4427390.0, ans=0.125 2024-08-19 10:29:55,656 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 10:29:58,668 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 12650, loss[loss=0.1126, beats_loss=0.01219, ecapa_loss=0.0001551, whisper_loss=0.09884, over 20639.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01055, ecapa_loss=0.0001411, whisper_loss=0.09035, over 3943060.94 frames. ], batch size: 89, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:30:00,058 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-19 10:30:10,615 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 29 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-19 10:30:29,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4427690.0, ans=0.1 2024-08-19 10:30:36,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4427790.0, ans=0.09899494936611666 2024-08-19 10:30:46,145 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-19 10:30:47,361 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-19 10:30:52,389 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-19 10:31:01,105 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-19 10:31:03,557 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 12700, loss[loss=0.0901, beats_loss=0.009113, ecapa_loss=0.0001983, whisper_loss=0.079, over 13769.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01055, ecapa_loss=0.0001408, whisper_loss=0.0903, over 3923571.39 frames. ], batch size: 58, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:31:06,788 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
33 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-19 10:31:07,998 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 30 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 10:31:11,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4427990.0, ans=0.125 2024-08-19 10:31:45,158 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.368e+01 2.557e+01 2.871e+01 4.652e+01, threshold=5.113e+01, percent-clipped=0.0 2024-08-19 10:31:47,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4428290.0, ans=0.125 2024-08-19 10:31:48,528 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.73 vs. limit=22.5 2024-08-19 10:31:55,202 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.50 vs. limit=15.0 2024-08-19 10:32:05,189 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.25 vs. limit=15.0 2024-08-19 10:32:10,632 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 12750, loss[loss=0.1091, beats_loss=0.01042, ecapa_loss=0.0001285, whisper_loss=0.09737, over 20137.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01055, ecapa_loss=0.0001417, whisper_loss=0.09043, over 3889753.68 frames. 
], batch size: 79, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:32:21,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4428490.0, ans=0.125 2024-08-19 10:32:24,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4428590.0, ans=0.125 2024-08-19 10:32:31,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4428590.0, ans=0.125 2024-08-19 10:32:35,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4428690.0, ans=0.125 2024-08-19 10:32:41,570 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 10:32:46,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4428690.0, ans=0.0 2024-08-19 10:32:48,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4428790.0, ans=0.2 2024-08-19 10:33:05,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4428890.0, ans=0.2 2024-08-19 10:33:15,407 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 12800, loss[loss=0.09809, beats_loss=0.01007, ecapa_loss=0.0001376, whisper_loss=0.08665, over 19570.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01061, ecapa_loss=0.000142, whisper_loss=0.09047, over 3902665.15 frames. ], batch size: 78, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:33:22,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4428990.0, ans=0.04949747468305833 2024-08-19 10:33:25,213 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
24 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-19 10:33:27,021 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.32 vs. limit=15.0 2024-08-19 10:33:28,575 WARNING [optim.py:496] (2/4) Scaling gradients by 0.025292346253991127, model_norm_threshold=51.13230514526367 2024-08-19 10:33:28,741 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.119e+05, grad_sumsq=7.119e+05, orig_rms_sq=1.000e+00 2024-08-19 10:33:35,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4429090.0, ans=0.0 2024-08-19 10:33:57,296 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.329e+01 2.594e+01 2.877e+01 2.022e+03, threshold=5.187e+01, percent-clipped=2.0 2024-08-19 10:33:57,469 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-19 10:34:02,785 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-19 10:34:11,096 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 34 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-19 10:34:17,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4429390.0, ans=0.1 2024-08-19 10:34:21,567 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 12850, loss[loss=0.0772, beats_loss=0.01093, ecapa_loss=0.0001569, whisper_loss=0.0647, over 14338.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01059, ecapa_loss=0.0001425, whisper_loss=0.08926, over 3845361.44 frames. ], batch size: 59, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:34:23,128 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-19 10:34:31,007 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-08-19 10:34:43,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4429590.0, ans=0.125 2024-08-19 10:34:45,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4429590.0, ans=0.05 2024-08-19 10:34:47,820 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-19 10:34:49,598 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-19 10:35:01,124 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-08-19 10:35:08,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4429790.0, ans=0.2 2024-08-19 10:35:09,704 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-19 10:35:29,004 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 12900, loss[loss=0.08929, beats_loss=0.01047, ecapa_loss=0.0001425, whisper_loss=0.07739, over 15647.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01054, ecapa_loss=0.000143, whisper_loss=0.08867, over 3805437.74 frames. ], batch size: 61, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:35:30,926 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.81 vs. 
limit=10.0 2024-08-19 10:35:34,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4429990.0, ans=0.125 2024-08-19 10:35:43,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4430090.0, ans=0.125 2024-08-19 10:35:43,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4430090.0, ans=0.0 2024-08-19 10:35:43,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4430090.0, ans=0.2 2024-08-19 10:35:59,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4430190.0, ans=0.125 2024-08-19 10:36:02,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4430190.0, ans=0.0 2024-08-19 10:36:11,887 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.655e+01 2.251e+01 2.498e+01 2.810e+01 4.118e+01, threshold=4.997e+01, percent-clipped=0.0 2024-08-19 10:36:29,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4430390.0, ans=0.125 2024-08-19 10:36:32,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=4430390.0, ans=0.02 2024-08-19 10:36:37,359 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 12950, loss[loss=0.1306, beats_loss=0.006117, ecapa_loss=0.0001633, whisper_loss=0.1229, over 16882.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01041, ecapa_loss=0.0001429, whisper_loss=0.0896, over 3830410.10 frames. 
], batch size: 65, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:36:51,542 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.41 vs. limit=15.0 2024-08-19 10:37:15,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4430690.0, ans=0.2 2024-08-19 10:37:18,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4430790.0, ans=0.0 2024-08-19 10:37:38,105 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-19 10:37:45,773 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 13000, loss[loss=0.1024, beats_loss=0.008628, ecapa_loss=0.000174, whisper_loss=0.09206, over 19719.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01036, ecapa_loss=0.0001425, whisper_loss=0.09045, over 3844119.08 frames. ], batch size: 80, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:38:07,491 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 10:38:22,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4431190.0, ans=0.125 2024-08-19 10:38:29,946 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.291e+01 2.501e+01 2.762e+01 5.240e+01, threshold=5.001e+01, percent-clipped=1.0 2024-08-19 10:38:54,509 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 13050, loss[loss=0.08139, beats_loss=0.008923, ecapa_loss=0.0001362, whisper_loss=0.0711, over 13887.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01036, ecapa_loss=0.0001419, whisper_loss=0.09077, over 3831170.05 frames. 
], batch size: 54, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:39:07,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4431490.0, ans=0.07 2024-08-19 10:39:09,767 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-19 10:39:30,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4431690.0, ans=0.0 2024-08-19 10:39:42,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4431790.0, ans=0.035 2024-08-19 10:39:46,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4431790.0, ans=0.0 2024-08-19 10:39:49,996 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 21 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-19 10:39:50,614 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0 2024-08-19 10:39:54,014 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-19 10:40:00,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4431890.0, ans=0.125 2024-08-19 10:40:05,010 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 18 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-19 10:40:06,800 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 13100, loss[loss=0.07444, beats_loss=0.01103, ecapa_loss=0.0001283, whisper_loss=0.06212, over 22191.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01039, ecapa_loss=0.000142, whisper_loss=0.09054, over 3828540.14 frames. 
], batch size: 88, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:40:08,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4431990.0, ans=0.0 2024-08-19 10:40:20,709 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-19 10:40:23,965 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.38 vs. limit=22.5 2024-08-19 10:40:25,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4432090.0, ans=0.0 2024-08-19 10:40:26,305 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-19 10:40:28,044 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 24 from LS+wenet, 16 from Vox, 16 fro AS 2024-08-19 10:40:32,807 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.73 vs. limit=15.0 2024-08-19 10:40:51,946 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.322e+01 2.530e+01 2.794e+01 4.175e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-19 10:40:57,441 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 18 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-19 10:41:15,884 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.03 vs. limit=15.0 2024-08-19 10:41:16,659 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 22 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-19 10:41:17,792 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 13150, loss[loss=0.1203, beats_loss=0.008826, ecapa_loss=0.0001469, whisper_loss=0.11, over 15491.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01042, ecapa_loss=0.0001414, whisper_loss=0.09094, over 3833761.51 frames. ], batch size: 58, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:41:22,036 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 10:41:28,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4432490.0, ans=0.1 2024-08-19 10:41:32,941 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.38 vs. limit=15.0 2024-08-19 10:41:53,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4432690.0, ans=0.125 2024-08-19 10:42:08,620 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.32 vs. limit=15.0 2024-08-19 10:42:28,869 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 13200, loss[loss=0.1208, beats_loss=0.008885, ecapa_loss=0.0001582, whisper_loss=0.1104, over 22379.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01045, ecapa_loss=0.0001419, whisper_loss=0.09052, over 3826543.48 frames. ], batch size: 92, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:42:41,391 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-19 10:42:48,674 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
18 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 10:42:48,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4433090.0, ans=0.05 2024-08-19 10:42:51,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4433090.0, ans=0.125 2024-08-19 10:42:55,202 WARNING [optim.py:496] (2/4) Scaling gradients by 0.07873938977718353, model_norm_threshold=50.591163635253906 2024-08-19 10:42:55,367 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.905e+04, grad_sumsq=5.905e+04, orig_rms_sq=1.000e+00 2024-08-19 10:43:02,559 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 10:43:13,599 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.266e+01 2.510e+01 2.814e+01 6.425e+02, threshold=5.020e+01, percent-clipped=2.0 2024-08-19 10:43:16,613 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-19 10:43:26,133 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-19 10:43:39,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4433390.0, ans=0.2 2024-08-19 10:43:41,522 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 13250, loss[loss=0.1059, beats_loss=0.008571, ecapa_loss=0.0001674, whisper_loss=0.09567, over 16493.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01044, ecapa_loss=0.0001419, whisper_loss=0.09101, over 3815083.97 frames. 
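The `Clipping_scale=2.0, grad-norm quartiles ... threshold=...` records fit a simple rule: the threshold is `clipping_scale` times the median of recent gradient norms, and when a batch's norm exceeds it, gradients are rescaled by `threshold / norm` (the `Scaling gradients by 0.0787...` warning above matches `50.591 / 642.5`). A sketch of that rule, with the caveat that the real `optim.py` tracks a history of norms; here the logged quartiles stand in for it:

```python
import statistics

# Sketch of median-based gradient clipping, inferred from the
# "Clipping_scale=2.0, grad-norm quartiles ... threshold=..." records.
# The actual icefall optim.py keeps a rolling history of gradient norms;
# the logged quartiles are used as a stand-in here.
def clipping_threshold(recent_norms, clipping_scale=2.0):
    return clipping_scale * statistics.median(recent_norms)

def clip_scale(grad_norm, threshold):
    """Factor applied to gradients: 1.0 below threshold, else threshold/norm."""
    return min(1.0, threshold / grad_norm)

# Quartiles logged at 10:40:51 (median 2.530e+01) give threshold ~5.059e+01:
assert abs(clipping_threshold([18.10, 23.22, 25.30, 27.94, 41.75]) - 50.59) < 0.1

# The 10:42:55 warning rescales by 0.0787... for a norm of roughly
# 6.425e+02 against model_norm_threshold=50.591:
assert abs(clip_scale(642.5, 50.591) - 0.07874) < 1e-4
```

Under this reading, `percent-clipped` is the fraction of recent batches whose norm exceeded the threshold.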
], batch size: 65, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:43:43,668 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 10:43:46,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4433490.0, ans=0.125 2024-08-19 10:43:47,352 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-19 10:44:00,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4433590.0, ans=0.125 2024-08-19 10:44:53,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4433990.0, ans=0.1 2024-08-19 10:44:54,444 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 13300, loss[loss=0.08673, beats_loss=0.009597, ecapa_loss=0.000116, whisper_loss=0.07598, over 15933.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01046, ecapa_loss=0.000141, whisper_loss=0.0905, over 3836332.22 frames. ], batch size: 60, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:45:00,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4433990.0, ans=0.05 2024-08-19 10:45:01,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4433990.0, ans=0.125 2024-08-19 10:45:02,005 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.918e+01 2024-08-19 10:45:06,625 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-19 10:45:08,068 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
18 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 10:45:28,732 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 29 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-19 10:45:31,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4434190.0, ans=0.125 2024-08-19 10:45:33,060 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-19 10:45:42,573 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.304e+01 2.512e+01 2.762e+01 6.116e+01, threshold=5.024e+01, percent-clipped=2.0 2024-08-19 10:45:55,015 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.60 vs. limit=15.0 2024-08-19 10:45:56,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4434390.0, ans=0.04949747468305833 2024-08-19 10:45:59,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4434390.0, ans=0.0 2024-08-19 10:46:00,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4434390.0, ans=0.125 2024-08-19 10:46:04,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4434390.0, ans=0.0 2024-08-19 10:46:04,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4434390.0, ans=0.2 2024-08-19 10:46:09,131 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 13350, loss[loss=0.1253, beats_loss=0.009147, ecapa_loss=0.000151, whisper_loss=0.1147, over 23827.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0105, ecapa_loss=0.0001411, whisper_loss=0.08981, over 3857970.45 frames. 
], batch size: 93, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:46:09,902 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.85 vs. limit=22.5 2024-08-19 10:46:36,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4434590.0, ans=0.1 2024-08-19 10:46:54,671 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 10:47:00,503 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 37 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 10:47:16,707 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.93 vs. limit=22.5 2024-08-19 10:47:21,302 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 13400, loss[loss=0.08727, beats_loss=0.01158, ecapa_loss=0.0001161, whisper_loss=0.07453, over 14312.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01048, ecapa_loss=0.000141, whisper_loss=0.08992, over 3858864.66 frames. ], batch size: 56, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:47:22,042 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.76 vs. limit=15.0 2024-08-19 10:47:25,319 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 20 from LS+wenet, 18 from Vox, 55 fro AS 2024-08-19 10:47:29,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4434990.0, ans=0.125 2024-08-19 10:47:52,864 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 19 from LS+wenet, 29 from Vox, 44 fro AS 2024-08-19 10:47:56,503 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.11 vs. 
limit=22.5 2024-08-19 10:47:57,195 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 10:47:58,447 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 10:47:58,706 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 10:48:01,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4435290.0, ans=0.125 2024-08-19 10:48:05,887 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.287e+01 2.514e+01 2.765e+01 5.474e+01, threshold=5.028e+01, percent-clipped=1.0 2024-08-19 10:48:31,444 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 13450, loss[loss=0.1023, beats_loss=0.009346, ecapa_loss=0.0001518, whisper_loss=0.09141, over 17443.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0105, ecapa_loss=0.0001412, whisper_loss=0.08966, over 3856489.10 frames. ], batch size: 69, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:49:06,388 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.288e+01 2024-08-19 10:49:13,189 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-19 10:49:14,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4435790.0, ans=0.125 2024-08-19 10:49:19,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4435790.0, ans=0.125 2024-08-19 10:49:40,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4435990.0, ans=0.0 2024-08-19 10:49:41,220 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 13500, loss[loss=0.1072, beats_loss=0.008242, ecapa_loss=0.0001618, whisper_loss=0.09738, over 21722.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01054, ecapa_loss=0.0001417, whisper_loss=0.08957, over 3868947.39 frames. ], batch size: 88, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:50:04,637 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-19 10:50:05,824 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 37 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-19 10:50:07,748 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0 2024-08-19 10:50:24,354 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.393e+01 2.618e+01 2.905e+01 4.660e+01, threshold=5.237e+01, percent-clipped=0.0 2024-08-19 10:50:30,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4436290.0, ans=0.125 2024-08-19 10:50:32,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4436290.0, ans=0.125 2024-08-19 10:50:37,750 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
15 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-19 10:50:47,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4436490.0, ans=0.0 2024-08-19 10:50:47,736 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 13550, loss[loss=0.1018, beats_loss=0.01026, ecapa_loss=0.0001361, whisper_loss=0.09019, over 20747.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01051, ecapa_loss=0.0001414, whisper_loss=0.08943, over 3843168.56 frames. ], batch size: 83, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:50:50,416 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0421869195997715, model_norm_threshold=52.36887741088867 2024-08-19 10:50:50,581 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.302e+05, grad_sumsq=2.302e+05, orig_rms_sq=1.000e+00 2024-08-19 10:50:55,148 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 36 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-19 10:50:56,353 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 21 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-19 10:50:58,258 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.99 vs. 
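The `ScheduledFloat` records show hyperparameters (skip rates, balancer probabilities, dropout) that are functions of `batch_count`. A minimal sketch of such a schedule; the breakpoints below are illustrative only, not the actual schedules from `scaling.py`:

```python
# Sketch of a batch-count-driven float schedule like the ScheduledFloat
# values logged above (e.g. attention_skip_rate reported as 0.0 this late
# in training). The (batch, value) breakpoints are illustrative.
def scheduled_float(batch_count, points):
    """Piecewise-linear interpolation over sorted (batch, value) points."""
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (b0, v0), (b1, v1) in zip(points, points[1:]):
        if b0 <= batch_count <= b1:
            t = (batch_count - b0) / (b1 - b0)
            return v0 + t * (v1 - v0)

# Example: a skip rate that decays from 0.2 to 0.0 over the first
# 4000 batches, then stays at 0.0 (as seen at batch_count=4431990.0).
schedule = [(0, 0.2), (4000, 0.0)]
assert scheduled_float(4431990.0, schedule) == 0.0
```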
limit=15.0 2024-08-19 10:51:02,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4436590.0, ans=0.0 2024-08-19 10:51:11,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4436590.0, ans=0.125 2024-08-19 10:51:11,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4436590.0, ans=0.0 2024-08-19 10:51:15,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4436690.0, ans=0.125 2024-08-19 10:51:22,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4436690.0, ans=0.125 2024-08-19 10:51:36,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4436790.0, ans=0.1 2024-08-19 10:51:45,526 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=31.72 vs. limit=22.5 2024-08-19 10:51:55,148 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 13600, loss[loss=0.1106, beats_loss=0.01019, ecapa_loss=0.0001163, whisper_loss=0.09929, over 21645.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01051, ecapa_loss=0.0001414, whisper_loss=0.08953, over 3852572.75 frames. ], batch size: 81, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:52:00,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4436990.0, ans=0.1 2024-08-19 10:52:09,608 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.20 vs. 
limit=22.5 2024-08-19 10:52:26,478 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 31 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-19 10:52:29,391 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 10:52:29,946 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.78 vs. limit=22.5 2024-08-19 10:52:35,397 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-19 10:52:39,017 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.324e+01 2.607e+01 2.904e+01 1.241e+03, threshold=5.213e+01, percent-clipped=1.0 2024-08-19 10:52:49,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4437390.0, ans=0.125 2024-08-19 10:53:02,340 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-19 10:53:03,390 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 13650, loss[loss=0.1059, beats_loss=0.01205, ecapa_loss=0.0001405, whisper_loss=0.0924, over 22612.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01053, ecapa_loss=0.0001412, whisper_loss=0.08996, over 3868721.63 frames. ], batch size: 92, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:53:21,710 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-19 10:53:30,706 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
26 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-19 10:53:30,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4437690.0, ans=0.125 2024-08-19 10:53:36,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4437690.0, ans=0.1 2024-08-19 10:53:40,515 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-19 10:53:55,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4437790.0, ans=0.125 2024-08-19 10:53:59,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4437890.0, ans=0.125 2024-08-19 10:53:59,965 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.705e+01 2024-08-19 10:54:08,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4437890.0, ans=0.0 2024-08-19 10:54:11,780 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-19 10:54:14,113 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 13700, loss[loss=0.1089, beats_loss=0.008663, ecapa_loss=0.0001258, whisper_loss=0.09903, over 17265.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01046, ecapa_loss=0.0001409, whisper_loss=0.09081, over 3901969.24 frames. ], batch size: 63, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:54:14,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=4437990.0, ans=0.02 2024-08-19 10:54:21,346 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-19 10:54:23,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4437990.0, ans=0.125 2024-08-19 10:54:40,802 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.381e+01 2024-08-19 10:54:51,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4438190.0, ans=0.1 2024-08-19 10:54:58,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4438190.0, ans=0.125 2024-08-19 10:54:59,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4438290.0, ans=0.125 2024-08-19 10:55:01,833 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=12.0 2024-08-19 10:55:04,002 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.299e+01 2.529e+01 2.834e+01 4.971e+01, threshold=5.058e+01, percent-clipped=0.0 2024-08-19 10:55:15,688 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 10:55:21,276 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-19 10:55:30,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4438390.0, ans=0.2 2024-08-19 10:55:36,421 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 13750, loss[loss=0.08509, beats_loss=0.009991, ecapa_loss=0.0001532, whisper_loss=0.07357, over 17994.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001417, whisper_loss=0.09037, over 3901773.72 frames. 
], batch size: 70, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:55:38,403 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 33 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-19 10:55:57,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4438590.0, ans=0.0 2024-08-19 10:56:05,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4438590.0, ans=0.0 2024-08-19 10:56:38,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4438790.0, ans=0.0 2024-08-19 10:57:13,437 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 13800, loss[loss=0.1057, beats_loss=0.008653, ecapa_loss=0.0001261, whisper_loss=0.09577, over 19792.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001418, whisper_loss=0.08996, over 3894495.80 frames. ], batch size: 75, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:57:17,584 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-19 10:57:39,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=4439090.0, ans=15.0 2024-08-19 10:57:59,308 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-19 10:58:10,789 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.280e+01 2.516e+01 2.824e+01 6.653e+01, threshold=5.033e+01, percent-clipped=2.0 2024-08-19 10:58:18,905 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. 
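For post-hoc analysis (e.g. plotting the loss curve), the running total loss can be recovered from the per-batch `Epoch ..., batch ...` records. A minimal parser, assuming each record has been rejoined onto a single line:

```python
import re

# Extract (epoch, batch, tot_loss) from a per-batch training record such as
# "Epoch 30, batch 13800, loss[...], tot_loss[loss=0.1018, ...]".
TOT_RE = re.compile(r"Epoch (\d+), batch (\d+), .*tot_loss\[loss=([0-9.]+)")

def parse_tot_loss(line):
    m = TOT_RE.search(line)
    if m is None:
        return None
    epoch, batch, loss = m.groups()
    return int(epoch), int(batch), float(loss)

# Record from "Epoch 30, batch 13800" above, rejoined onto one line:
line = ("Epoch 30, batch 13800, loss[loss=0.1057, beats_loss=0.008653, "
        "ecapa_loss=0.0001261, whisper_loss=0.09577, over 19792.00 frames. ], "
        "tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001418, "
        "whisper_loss=0.08996, over 3894495.80 frames. ]")
assert parse_tot_loss(line) == (30, 13800, 0.1018)
```

Note that in the raw log these records wrap across lines, so a real script would need to rejoin wrapped lines before matching.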
limit=6.0 2024-08-19 10:58:30,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4439390.0, ans=0.125 2024-08-19 10:58:32,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4439390.0, ans=0.2 2024-08-19 10:58:37,328 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 10:58:40,805 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 13850, loss[loss=0.08925, beats_loss=0.009447, ecapa_loss=0.0001524, whisper_loss=0.07828, over 16001.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001422, whisper_loss=0.09006, over 3901982.25 frames. ], batch size: 66, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:58:40,951 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-19 10:58:52,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4439490.0, ans=0.0 2024-08-19 10:58:57,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4439590.0, ans=0.0 2024-08-19 10:59:03,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4439590.0, ans=0.125 2024-08-19 10:59:04,588 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-19 10:59:05,545 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. 
limit=15.0 2024-08-19 10:59:08,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4439590.0, ans=0.0 2024-08-19 10:59:12,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4439690.0, ans=0.125 2024-08-19 10:59:23,769 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.02 vs. limit=15.0 2024-08-19 11:00:05,831 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 13900, loss[loss=0.11, beats_loss=0.01204, ecapa_loss=0.0001089, whisper_loss=0.09687, over 23548.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01044, ecapa_loss=0.0001424, whisper_loss=0.08981, over 3882539.28 frames. ], batch size: 93, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:00:22,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4439990.0, ans=0.125 2024-08-19 11:00:46,520 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 21 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-19 11:01:03,246 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.454e+01 2.690e+01 3.083e+01 4.559e+01, threshold=5.380e+01, percent-clipped=0.0 2024-08-19 11:01:03,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4440290.0, ans=0.1 2024-08-19 11:01:10,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4440290.0, ans=0.0 2024-08-19 11:01:20,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4440390.0, ans=0.0 2024-08-19 11:01:29,885 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
22 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-19 11:01:31,135 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 13950, loss[loss=0.1065, beats_loss=0.01088, ecapa_loss=0.0001329, whisper_loss=0.09431, over 17776.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01034, ecapa_loss=0.0001425, whisper_loss=0.09065, over 3901044.13 frames. ], batch size: 66, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:01:42,681 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 13 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 11:02:05,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4440690.0, ans=0.125 2024-08-19 11:02:50,963 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 14000, loss[loss=0.08418, beats_loss=0.009615, ecapa_loss=0.0001374, whisper_loss=0.07319, over 15154.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01042, ecapa_loss=0.0001411, whisper_loss=0.09015, over 3910153.21 frames. ], batch size: 62, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:02:57,878 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-19 11:03:22,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4441090.0, ans=0.0 2024-08-19 11:03:49,621 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.282e+01 2.482e+01 2.793e+01 5.797e+01, threshold=4.965e+01, percent-clipped=1.0 2024-08-19 11:04:20,283 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.48 vs. limit=15.0 2024-08-19 11:04:25,769 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 14050, loss[loss=0.09924, beats_loss=0.01142, ecapa_loss=0.0001267, whisper_loss=0.08655, over 19328.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.0104, ecapa_loss=0.0001392, whisper_loss=0.09084, over 3903079.04 frames. ], batch size: 77, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:04:30,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4441490.0, ans=0.125 2024-08-19 11:04:44,183 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 19 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 11:04:47,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4441590.0, ans=0.2 2024-08-19 11:04:47,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4441590.0, ans=0.125 2024-08-19 11:04:55,091 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-19 11:05:12,870 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.26 vs. limit=15.0 2024-08-19 11:05:16,598 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-19 11:05:26,921 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.75 vs. limit=15.0 2024-08-19 11:05:34,211 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 14 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-19 11:05:48,417 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 11:05:49,015 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.35 vs. 
limit=15.0 2024-08-19 11:05:49,885 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 14100, loss[loss=0.09446, beats_loss=0.01237, ecapa_loss=0.0001339, whisper_loss=0.08076, over 19064.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.0001406, whisper_loss=0.09016, over 3834173.81 frames. ], batch size: 75, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:05:57,576 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-19 11:06:01,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4441990.0, ans=0.05 2024-08-19 11:06:06,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4442090.0, ans=0.0 2024-08-19 11:06:12,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4442090.0, ans=0.0 2024-08-19 11:06:22,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4442090.0, ans=0.025 2024-08-19 11:06:30,661 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.45 vs. limit=10.0 2024-08-19 11:06:32,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4442190.0, ans=0.125 2024-08-19 11:06:34,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4442190.0, ans=0.0 2024-08-19 11:06:47,653 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.310e+01 2.577e+01 2.983e+01 3.825e+01, threshold=5.154e+01, percent-clipped=0.0 2024-08-19 11:07:13,069 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-19 11:07:25,250 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 14150, loss[loss=0.0937, beats_loss=0.01266, ecapa_loss=0.0001452, whisper_loss=0.07959, over 21898.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01056, ecapa_loss=0.0001397, whisper_loss=0.08969, over 3891648.38 frames. ], batch size: 90, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:07:32,405 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-19 11:07:50,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4442590.0, ans=0.125 2024-08-19 11:07:54,416 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-19 11:08:49,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4442890.0, ans=0.0 2024-08-19 11:08:52,550 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 14200, loss[loss=0.1127, beats_loss=0.01062, ecapa_loss=0.0001511, whisper_loss=0.1005, over 21325.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001405, whisper_loss=0.08997, over 3894600.16 frames. ], batch size: 86, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:08:55,159 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.72 vs. limit=22.5 2024-08-19 11:09:01,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=4442990.0, ans=0.02 2024-08-19 11:09:43,680 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
18 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-19 11:09:50,399 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.637e+01 2.398e+01 2.635e+01 3.009e+01 4.372e+01, threshold=5.270e+01, percent-clipped=0.0 2024-08-19 11:10:27,894 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 14250, loss[loss=0.1041, beats_loss=0.01017, ecapa_loss=0.0001487, whisper_loss=0.09247, over 23362.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.0001396, whisper_loss=0.09034, over 3887946.01 frames. ], batch size: 93, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:10:59,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4443590.0, ans=0.2 2024-08-19 11:11:07,086 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 37 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-19 11:11:37,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4443790.0, ans=0.1 2024-08-19 11:11:39,050 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-19 11:11:57,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4443990.0, ans=0.0 2024-08-19 11:11:58,572 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 14300, loss[loss=0.112, beats_loss=0.006354, ecapa_loss=0.0001161, whisper_loss=0.1045, over 17717.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01043, ecapa_loss=0.0001383, whisper_loss=0.08994, over 3879842.06 frames. ], batch size: 65, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:12:12,436 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
15 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-19 11:12:14,527 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 11:12:18,554 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 22 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-19 11:12:28,627 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 11:12:49,998 WARNING [optim.py:496] (2/4) Scaling gradients by 0.05855522304773331, model_norm_threshold=52.698848724365234 2024-08-19 11:12:50,161 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.25, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.023e+05, grad_sumsq=2.023e+05, orig_rms_sq=1.000e+00 2024-08-19 11:12:56,610 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.271e+01 2.635e+01 2.896e+01 9.000e+02, threshold=5.270e+01, percent-clipped=1.0 2024-08-19 11:12:58,437 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-19 11:13:19,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4444390.0, ans=0.0 2024-08-19 11:13:25,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4444390.0, ans=0.125 2024-08-19 11:13:29,807 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 14350, loss[loss=0.1097, beats_loss=0.01255, ecapa_loss=0.0001123, whisper_loss=0.09605, over 23481.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01044, ecapa_loss=0.0001378, whisper_loss=0.09018, over 3874570.83 frames. 
], batch size: 90, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:13:35,027 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 11:13:39,151 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 33 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-19 11:13:39,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4444490.0, ans=0.5 2024-08-19 11:13:49,648 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.55 vs. limit=15.0 2024-08-19 11:13:53,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4444590.0, ans=0.0 2024-08-19 11:13:53,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4444590.0, ans=0.125 2024-08-19 11:14:06,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4444690.0, ans=0.0 2024-08-19 11:14:07,282 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.92 vs. limit=10.0 2024-08-19 11:14:50,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4444890.0, ans=0.125 2024-08-19 11:14:57,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4444890.0, ans=0.2 2024-08-19 11:15:04,209 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 14400, loss[loss=0.09471, beats_loss=0.01153, ecapa_loss=0.0001555, whisper_loss=0.08162, over 22916.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.00014, whisper_loss=0.0901, over 3868417.02 frames. 
], batch size: 96, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:15:09,004 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 28 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-19 11:15:11,140 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 26 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-19 11:15:25,487 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 24 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-19 11:15:29,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4445090.0, ans=0.125 2024-08-19 11:15:47,521 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-19 11:15:52,779 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 28 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-19 11:15:57,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4445290.0, ans=0.125 2024-08-19 11:16:00,040 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-19 11:16:01,339 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.334e+01 2.545e+01 2.906e+01 1.418e+02, threshold=5.090e+01, percent-clipped=1.0 2024-08-19 11:16:09,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4445290.0, ans=0.125 2024-08-19 11:16:18,876 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-19 11:16:27,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4445390.0, ans=0.1 2024-08-19 11:16:35,729 INFO [train_multi_KD3.py:1116] (2/4) Epoch 30, batch 14450, loss[loss=0.09555, beats_loss=0.01147, ecapa_loss=0.0001672, whisper_loss=0.08241, over 22330.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001404, whisper_loss=0.09012, over 3891736.67 frames. ], batch size: 92, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:16:52,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=4445590.0, ans=0.05 2024-08-19 11:16:55,403 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 28 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 11:17:10,530 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.50 vs. limit=12.0 2024-08-19 11:17:14,068 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-19 11:17:18,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4445690.0, ans=0.0 2024-08-19 11:17:42,625 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 24 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 11:18:30,849 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 0, loss[loss=0.09595, beats_loss=0.00976, ecapa_loss=0.000143, whisper_loss=0.08476, over 16820.00 frames. ], tot_loss[loss=0.09595, beats_loss=0.00976, ecapa_loss=0.000143, whisper_loss=0.08476, over 16820.00 frames. ], batch size: 66, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:18:30,849 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-19 11:19:11,974 INFO [train_multi_KD3.py:1149] (2/4) Epoch 31, validation on ASR_libri: loss=0.2529, beats_loss=0, ecapa_loss=0.0005129, whisper_loss=0.2478, over 922467.00 frames. 2024-08-19 11:19:31,597 INFO [train_multi_KD3.py:1149] (2/4) Epoch 31, validation on SV_voxceleb1: loss=0.003975, beats_loss=0, ecapa_loss=0.0003975, whisper_loss=0, over 939242.00 frames. 
2024-08-19 11:20:32,651 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.1541, 2.0532, 2.0284, 2.3909], device='cuda:2') 2024-08-19 11:20:34,947 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.5488, 2.6449, 2.6650, 2.4229], device='cuda:2') 2024-08-19 11:20:58,126 INFO [train_multi_KD3.py:1149] (2/4) Epoch 31, validation on AT_audioset: loss=0.02297, beats_loss=0.02297, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 11:20:58,129 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-19 11:21:22,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4445890.0, ans=0.0 2024-08-19 11:21:58,044 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.88 vs. limit=15.0 2024-08-19 11:22:22,386 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 29 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-19 11:22:53,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4446090.0, ans=0.125 2024-08-19 11:23:16,857 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 17 from LS+wenet, 25 from Vox, 49 fro AS 2024-08-19 11:23:27,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4446190.0, ans=0.125 2024-08-19 11:23:39,434 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
24 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-19 11:24:18,263 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.403e+01 2.773e+01 3.093e+01 8.282e+01, threshold=5.547e+01, percent-clipped=1.0 2024-08-19 11:24:55,634 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 50, loss[loss=0.08969, beats_loss=0.009256, ecapa_loss=0.0001563, whisper_loss=0.07887, over 16640.00 frames. ], tot_loss[loss=0.09831, beats_loss=0.00967, ecapa_loss=0.0001471, whisper_loss=0.08717, over 879325.21 frames. ], batch size: 66, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:26:17,184 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=15.0 2024-08-19 11:26:59,162 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-08-19 11:26:59,308 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.44 vs. limit=15.0 2024-08-19 11:27:29,208 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 11:27:38,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4446690.0, ans=0.125 2024-08-19 11:28:04,801 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.11 vs. 
limit=15.0 2024-08-19 11:28:32,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4446890.0, ans=0.125 2024-08-19 11:28:34,182 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 100, loss[loss=0.1145, beats_loss=0.007371, ecapa_loss=0.0001312, whisper_loss=0.1058, over 20136.00 frames. ], tot_loss[loss=0.09721, beats_loss=0.00968, ecapa_loss=0.0001436, whisper_loss=0.08609, over 1525515.66 frames. ], batch size: 73, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:28:39,994 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.524e-01 2024-08-19 11:28:47,762 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-19 11:28:52,029 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 11:29:00,171 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 38 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-19 11:29:28,074 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-19 11:29:44,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4447090.0, ans=0.0 2024-08-19 11:30:10,234 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-19 11:30:21,155 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.553e+01 2.874e+01 3.336e+01 1.667e+02, threshold=5.748e+01, percent-clipped=2.0 2024-08-19 11:30:31,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4447290.0, ans=0.0 2024-08-19 11:30:39,349 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 150, loss[loss=0.09574, beats_loss=0.009985, ecapa_loss=0.0001621, whisper_loss=0.08413, over 21204.00 frames. 
], tot_loss[loss=0.09853, beats_loss=0.009515, ecapa_loss=0.0001444, whisper_loss=0.08757, over 2028583.31 frames. ], batch size: 89, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:30:43,821 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-19 11:30:59,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4447490.0, ans=0.125 2024-08-19 11:31:09,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4447490.0, ans=0.1 2024-08-19 11:31:18,436 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 35 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-19 11:31:32,306 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-19 11:31:46,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4447690.0, ans=0.0 2024-08-19 11:32:17,627 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 200, loss[loss=0.09084, beats_loss=0.01015, ecapa_loss=0.0001953, whisper_loss=0.07874, over 19902.00 frames. ], tot_loss[loss=0.09991, beats_loss=0.009608, ecapa_loss=0.0001437, whisper_loss=0.08886, over 2424975.29 frames. ], batch size: 89, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:32:33,234 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 11:32:36,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4447990.0, ans=0.0 2024-08-19 11:32:37,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=4447990.0, ans=0.05 2024-08-19 11:33:09,244 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
31 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-19 11:33:21,941 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-19 11:33:24,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4448190.0, ans=0.125 2024-08-19 11:33:32,443 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.387e+01 2.633e+01 3.004e+01 1.700e+02, threshold=5.266e+01, percent-clipped=1.0 2024-08-19 11:33:37,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4448290.0, ans=0.125 2024-08-19 11:33:48,396 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 250, loss[loss=0.1201, beats_loss=0.008475, ecapa_loss=0.0001378, whisper_loss=0.1102, over 23441.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.009745, ecapa_loss=0.0001443, whisper_loss=0.08945, over 2730375.40 frames. ], batch size: 88, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:34:11,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4448490.0, ans=0.1 2024-08-19 11:34:24,103 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 12 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 11:34:44,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4448690.0, ans=0.125 2024-08-19 11:34:45,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4448690.0, ans=0.125 2024-08-19 11:34:49,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4448690.0, ans=0.125 2024-08-19 11:35:05,580 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
29 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-19 11:35:07,758 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 15 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-19 11:35:17,458 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 300, loss[loss=0.101, beats_loss=0.009079, ecapa_loss=0.0001538, whisper_loss=0.09035, over 22562.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01004, ecapa_loss=0.0001429, whisper_loss=0.08865, over 2970756.99 frames. ], batch size: 88, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:35:35,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4448990.0, ans=0.025 2024-08-19 11:35:45,988 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-19 11:35:56,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4449090.0, ans=0.125 2024-08-19 11:36:18,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4449190.0, ans=0.0 2024-08-19 11:36:23,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4449190.0, ans=0.125 2024-08-19 11:36:24,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4449190.0, ans=0.125 2024-08-19 11:36:32,050 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.162e+01 2.418e+01 2.637e+01 1.048e+02, threshold=4.837e+01, percent-clipped=1.0 2024-08-19 11:36:45,203 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 350, loss[loss=0.09873, beats_loss=0.0108, ecapa_loss=0.0001423, whisper_loss=0.08651, over 21152.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01008, ecapa_loss=0.0001433, whisper_loss=0.08895, over 3134481.38 frames. 
], batch size: 84, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:36:45,363 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 30 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-19 11:36:47,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4449390.0, ans=0.125 2024-08-19 11:36:56,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4449390.0, ans=0.0 2024-08-19 11:36:57,796 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-19 11:37:19,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4449590.0, ans=0.125 2024-08-19 11:37:23,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4449590.0, ans=0.2 2024-08-19 11:37:37,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4449690.0, ans=0.0 2024-08-19 11:37:58,454 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.27 vs. limit=22.5 2024-08-19 11:38:15,409 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 400, loss[loss=0.08054, beats_loss=0.01353, ecapa_loss=9.706e-05, whisper_loss=0.06603, over 23510.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.0101, ecapa_loss=0.0001428, whisper_loss=0.0892, over 3275813.47 frames. ], batch size: 90, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:38:31,661 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 11:38:37,136 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
20 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-19 11:38:47,478 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 21 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-19 11:38:51,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4450090.0, ans=0.0 2024-08-19 11:39:02,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4450090.0, ans=0.05 2024-08-19 11:39:06,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4450090.0, ans=0.0 2024-08-19 11:39:13,109 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.90 vs. limit=15.0 2024-08-19 11:39:24,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4450190.0, ans=0.1 2024-08-19 11:39:27,758 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-19 11:39:32,656 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.597e+01 2.340e+01 2.668e+01 2.900e+01 1.466e+02, threshold=5.335e+01, percent-clipped=2.0 2024-08-19 11:39:33,333 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-19 11:39:46,328 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 450, loss[loss=0.1013, beats_loss=0.009914, ecapa_loss=0.0001395, whisper_loss=0.09003, over 21950.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01008, ecapa_loss=0.0001439, whisper_loss=0.08976, over 3395620.55 frames. ], batch size: 85, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:40:31,191 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
21 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-19 11:40:35,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4450590.0, ans=0.125 2024-08-19 11:40:44,054 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-19 11:41:03,082 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-19 11:41:09,688 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.63 vs. limit=10.0 2024-08-19 11:41:14,687 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 500, loss[loss=0.08409, beats_loss=0.01171, ecapa_loss=0.0001358, whisper_loss=0.07102, over 22167.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01015, ecapa_loss=0.0001419, whisper_loss=0.08902, over 3497402.52 frames. ], batch size: 88, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:41:16,701 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-19 11:41:34,326 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 11:41:39,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4450990.0, ans=0.125 2024-08-19 11:41:43,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4450990.0, ans=0.125 2024-08-19 11:41:45,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4450990.0, ans=0.0 2024-08-19 11:41:48,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4451090.0, ans=0.05 2024-08-19 11:41:56,896 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
25 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-19 11:42:31,924 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.295e+01 2.524e+01 2.813e+01 3.428e+01, threshold=5.047e+01, percent-clipped=0.0 2024-08-19 11:42:37,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4451290.0, ans=0.1 2024-08-19 11:42:40,149 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-19 11:42:41,486 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-19 11:42:45,228 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 550, loss[loss=0.1297, beats_loss=0.00819, ecapa_loss=0.0001298, whisper_loss=0.1202, over 21264.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01013, ecapa_loss=0.0001424, whisper_loss=0.08929, over 3580123.77 frames. ], batch size: 79, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:42:46,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4451390.0, ans=0.1 2024-08-19 11:42:46,287 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.76 vs. limit=15.0 2024-08-19 11:42:51,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4451390.0, ans=0.0 2024-08-19 11:42:56,741 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-19 11:43:05,285 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
18 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-19 11:43:07,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4451490.0, ans=0.1 2024-08-19 11:43:14,708 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-19 11:43:18,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4451490.0, ans=0.125 2024-08-19 11:43:30,757 WARNING [optim.py:496] (2/4) Scaling gradients by 0.03087170422077179, model_norm_threshold=50.47096252441406 2024-08-19 11:43:30,920 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.400e+05, grad_sumsq=1.036e+05, orig_rms_sq=3.283e+00 2024-08-19 11:43:37,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4451690.0, ans=0.125 2024-08-19 11:43:43,408 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-19 11:43:52,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4451690.0, ans=0.0 2024-08-19 11:43:55,131 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.29 vs. limit=15.0 2024-08-19 11:43:55,775 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-19 11:44:15,140 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 600, loss[loss=0.09801, beats_loss=0.01068, ecapa_loss=0.0001638, whisper_loss=0.08569, over 23260.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01014, ecapa_loss=0.0001426, whisper_loss=0.08948, over 3634044.79 frames. 
], batch size: 91, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:44:29,585 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 16 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-19 11:44:32,771 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 11:44:35,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4451990.0, ans=0.125 2024-08-19 11:44:52,595 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2024-08-19 11:45:02,643 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2024-08-19 11:45:11,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4452190.0, ans=0.125 2024-08-19 11:45:24,842 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 11:45:26,788 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.353e+01 2.580e+01 2.899e+01 1.635e+03, threshold=5.160e+01, percent-clipped=1.0 2024-08-19 11:45:31,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4452290.0, ans=0.2 2024-08-19 11:45:40,113 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 650, loss[loss=0.1027, beats_loss=0.009866, ecapa_loss=0.0001376, whisper_loss=0.09144, over 20021.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01022, ecapa_loss=0.0001415, whisper_loss=0.08927, over 3677140.92 frames. 
], batch size: 80, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:46:34,727 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.93 vs. limit=15.0 2024-08-19 11:46:41,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4452690.0, ans=0.125 2024-08-19 11:46:45,829 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-19 11:46:47,230 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.32 vs. limit=22.5 2024-08-19 11:46:49,315 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-19 11:46:51,120 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 11 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-19 11:47:00,503 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-19 11:47:08,473 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 700, loss[loss=0.0992, beats_loss=0.01104, ecapa_loss=0.0001247, whisper_loss=0.08692, over 15359.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01022, ecapa_loss=0.0001414, whisper_loss=0.08919, over 3703737.00 frames. ], batch size: 59, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:47:09,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4452890.0, ans=0.0 2024-08-19 11:47:09,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4452890.0, ans=0.2 2024-08-19 11:47:29,605 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.91 vs. 
limit=12.0 2024-08-19 11:47:33,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=4452990.0, ans=0.05 2024-08-19 11:47:44,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4453090.0, ans=0.125 2024-08-19 11:48:02,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4453090.0, ans=0.2 2024-08-19 11:48:02,916 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0 2024-08-19 11:48:20,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4453190.0, ans=0.0 2024-08-19 11:48:28,043 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.237e+01 2.442e+01 2.764e+01 3.734e+01, threshold=4.884e+01, percent-clipped=0.0 2024-08-19 11:48:30,071 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 28 from Vox, 38 from AS 2024-08-19 11:48:31,570 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 from AS 2024-08-19 11:48:35,766 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 25 from Vox, 30 from AS 2024-08-19 11:48:41,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4453390.0, ans=0.125 2024-08-19 11:48:42,457 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 750, loss[loss=0.1011, beats_loss=0.008669, ecapa_loss=0.0001561, whisper_loss=0.09088, over 21251.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01021, ecapa_loss=0.0001408, whisper_loss=0.08958, over 3734557.28 frames. 
], batch size: 87, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:48:56,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4453390.0, ans=0.125 2024-08-19 11:49:00,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4453490.0, ans=0.125 2024-08-19 11:49:03,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4453490.0, ans=0.125 2024-08-19 11:49:14,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4453590.0, ans=0.2 2024-08-19 11:49:16,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4453590.0, ans=0.0 2024-08-19 11:49:27,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4453590.0, ans=0.0 2024-08-19 11:49:42,379 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 15 from LS+wenet, 21 from Vox, 24 from AS 2024-08-19 11:49:46,934 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 from AS 2024-08-19 11:50:09,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4453890.0, ans=0.0 2024-08-19 11:50:10,544 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 800, loss[loss=0.09305, beats_loss=0.008784, ecapa_loss=0.0001264, whisper_loss=0.083, over 14400.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01019, ecapa_loss=0.0001415, whisper_loss=0.08923, over 3740034.33 frames. 
], batch size: 53, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:50:23,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4453890.0, ans=10.0 2024-08-19 11:50:23,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4453890.0, ans=0.125 2024-08-19 11:50:29,166 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 29 from Vox, 31 from AS 2024-08-19 11:50:29,514 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.262e+05 2024-08-19 11:50:35,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4453990.0, ans=0.1 2024-08-19 11:50:51,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4454090.0, ans=0.0 2024-08-19 11:51:06,047 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 14 from LS+wenet, 19 from Vox, 27 from AS 2024-08-19 11:51:07,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4454190.0, ans=0.0 2024-08-19 11:51:09,638 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2024-08-19 11:51:25,760 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 23 from Vox, 41 from AS 2024-08-19 11:51:28,462 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.213e+01 2.446e+01 2.635e+01 4.034e+01, threshold=4.891e+01, percent-clipped=0.0 2024-08-19 11:51:40,097 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.15 vs. 
limit=15.0 2024-08-19 11:51:43,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4454390.0, ans=0.125 2024-08-19 11:51:45,072 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 850, loss[loss=0.09996, beats_loss=0.009416, ecapa_loss=0.0001667, whisper_loss=0.08887, over 17275.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01024, ecapa_loss=0.0001411, whisper_loss=0.08862, over 3768238.71 frames. ], batch size: 72, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:51:45,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4454390.0, ans=0.125 2024-08-19 11:51:56,068 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 27 from Vox, 44 from AS 2024-08-19 11:52:17,101 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 12 from Vox, 26 from AS 2024-08-19 11:52:31,436 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 21 from LS+wenet, 18 from Vox, 40 from AS 2024-08-19 11:52:41,537 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 11:52:51,171 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 from AS 2024-08-19 11:53:03,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4454790.0, ans=0.0 2024-08-19 11:53:11,092 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 22 from LS+wenet, 13 from Vox, 22 from AS 2024-08-19 11:53:15,943 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 900, loss[loss=0.1045, beats_loss=0.01029, ecapa_loss=0.0001319, whisper_loss=0.09288, over 23160.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01017, ecapa_loss=0.0001405, whisper_loss=0.08887, over 3745816.05 frames. 
], batch size: 91, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:53:27,539 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 13 from Vox, 43 from AS 2024-08-19 11:53:29,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4454890.0, ans=0.0 2024-08-19 11:53:33,305 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 20 from Vox, 35 from AS 2024-08-19 11:53:47,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4454990.0, ans=0.0 2024-08-19 11:54:11,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4455190.0, ans=0.125 2024-08-19 11:54:20,954 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 13 from LS+wenet, 21 from Vox, 26 from AS 2024-08-19 11:54:33,152 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.411e+01 2.627e+01 3.077e+01 2.229e+02, threshold=5.255e+01, percent-clipped=1.0 2024-08-19 11:54:34,678 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 16 from Vox, 25 from AS 2024-08-19 11:54:49,759 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 950, loss[loss=0.1082, beats_loss=0.01043, ecapa_loss=0.0001286, whisper_loss=0.09645, over 16139.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01019, ecapa_loss=0.00014, whisper_loss=0.08913, over 3780857.14 frames. ], batch size: 62, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:55:04,310 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 from AS 2024-08-19 11:55:07,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4455490.0, ans=0.95 2024-08-19 11:55:10,796 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
21 from LS+wenet, 16 from Vox, 17 from AS 2024-08-19 11:55:12,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4455490.0, ans=0.125 2024-08-19 11:55:25,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4455590.0, ans=0.07 2024-08-19 11:55:28,827 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.68 vs. limit=10.0 2024-08-19 11:55:32,886 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 26 from LS+wenet, 14 from Vox, 21 from AS 2024-08-19 11:55:39,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4455590.0, ans=0.0 2024-08-19 11:55:56,073 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 31 from LS+wenet, 21 from Vox, 31 from AS 2024-08-19 11:56:11,723 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 19 from Vox, 21 from AS 2024-08-19 11:56:12,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4455790.0, ans=0.0 2024-08-19 11:56:22,902 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 1000, loss[loss=0.1044, beats_loss=0.01175, ecapa_loss=0.0001135, whisper_loss=0.09155, over 23398.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01018, ecapa_loss=0.0001401, whisper_loss=0.08968, over 3817376.65 frames. 
], batch size: 92, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:56:41,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4455990.0, ans=0.1 2024-08-19 11:56:53,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4455990.0, ans=0.04949747468305833 2024-08-19 11:57:05,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4456090.0, ans=0.125 2024-08-19 11:57:05,823 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.63 vs. limit=15.0 2024-08-19 11:57:16,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4456190.0, ans=0.125 2024-08-19 11:57:23,313 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.15 vs. limit=22.5 2024-08-19 11:57:37,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4456290.0, ans=0.2 2024-08-19 11:57:40,062 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.280e+01 2.488e+01 2.674e+01 4.382e+01, threshold=4.976e+01, percent-clipped=0.0 2024-08-19 11:57:45,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4456290.0, ans=0.1 2024-08-19 11:57:54,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4456290.0, ans=0.125 2024-08-19 11:57:57,845 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 1050, loss[loss=0.08704, beats_loss=0.01056, ecapa_loss=0.0001521, whisper_loss=0.07496, over 22727.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01021, ecapa_loss=0.0001395, whisper_loss=0.09002, over 3825612.59 frames. ], batch size: 93, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:58:01,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4456390.0, ans=0.95 2024-08-19 11:58:05,469 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 31 from LS+wenet, 17 from Vox, 29 from AS 2024-08-19 11:58:10,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4456390.0, ans=0.125 2024-08-19 11:58:15,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4456490.0, ans=0.015 2024-08-19 11:58:21,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4456490.0, ans=0.0 2024-08-19 11:58:25,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4456490.0, ans=0.125 2024-08-19 11:58:33,567 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 19 from Vox, 31 from AS 2024-08-19 11:58:37,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4456590.0, ans=0.125 2024-08-19 11:58:52,982 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.60 vs. limit=15.0 2024-08-19 11:58:54,228 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 21 from Vox, 37 from AS 2024-08-19 11:58:56,333 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
23 from LS+wenet, 12 from Vox, 31 from AS 2024-08-19 11:59:09,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4456790.0, ans=0.125 2024-08-19 11:59:10,163 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 20 from Vox, 24 from AS 2024-08-19 11:59:15,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4456790.0, ans=0.09899494936611666 2024-08-19 11:59:20,650 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 21 from Vox, 48 from AS 2024-08-19 11:59:29,752 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 1100, loss[loss=0.09809, beats_loss=0.01156, ecapa_loss=0.000102, whisper_loss=0.08552, over 21972.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01026, ecapa_loss=0.000139, whisper_loss=0.08987, over 3830235.68 frames. ], batch size: 84, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:59:30,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4456890.0, ans=0.125 2024-08-19 11:59:32,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4456890.0, ans=0.125 2024-08-19 11:59:46,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4456890.0, ans=0.125 2024-08-19 12:00:02,794 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
24 from LS+wenet, 13 from Vox, 20 from AS 2024-08-19 12:00:09,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4457090.0, ans=0.125 2024-08-19 12:00:39,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4457190.0, ans=0.125 2024-08-19 12:00:54,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4457290.0, ans=0.2 2024-08-19 12:00:59,218 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.392e+01 2.653e+01 2.944e+01 8.213e+01, threshold=5.307e+01, percent-clipped=1.0 2024-08-19 12:01:10,438 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 1150, loss[loss=0.08012, beats_loss=0.01058, ecapa_loss=0.0001587, whisper_loss=0.06796, over 14601.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01028, ecapa_loss=0.0001388, whisper_loss=0.08993, over 3816126.63 frames. 
], batch size: 58, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:01:17,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4457390.0, ans=0.07 2024-08-19 12:01:19,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4457390.0, ans=0.125 2024-08-19 12:01:28,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4457490.0, ans=0.2 2024-08-19 12:01:47,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4457590.0, ans=0.125 2024-08-19 12:01:50,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4457590.0, ans=0.0 2024-08-19 12:01:55,250 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.23 vs. limit=15.0 2024-08-19 12:01:55,406 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.71 vs. limit=15.0 2024-08-19 12:01:58,042 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 from AS 2024-08-19 12:02:08,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4457690.0, ans=0.125 2024-08-19 12:02:28,814 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 19 from Vox, 32 from AS 2024-08-19 12:02:43,281 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 1200, loss[loss=0.1131, beats_loss=0.009261, ecapa_loss=0.0001297, whisper_loss=0.1026, over 22191.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01026, ecapa_loss=0.0001386, whisper_loss=0.09009, over 3813056.66 frames. 
], batch size: 86, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:02:47,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4457890.0, ans=0.5 2024-08-19 12:03:09,734 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 30 from Vox, 29 from AS 2024-08-19 12:03:38,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4458190.0, ans=0.125 2024-08-19 12:03:46,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4458190.0, ans=0.2 2024-08-19 12:04:04,454 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.704e+01 2.388e+01 2.700e+01 3.012e+01 6.792e+01, threshold=5.400e+01, percent-clipped=1.0 2024-08-19 12:04:18,792 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 1250, loss[loss=0.09788, beats_loss=0.01105, ecapa_loss=0.0001538, whisper_loss=0.08528, over 22462.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01031, ecapa_loss=0.0001373, whisper_loss=0.08977, over 3831737.65 frames. ], batch size: 91, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:05:07,637 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 15 from LS+wenet, 20 from Vox, 31 from AS 2024-08-19 12:05:28,579 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.70 vs. limit=22.5 2024-08-19 12:05:32,553 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 25 from Vox, 22 from AS 2024-08-19 12:05:40,963 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
31 from LS+wenet, 19 from Vox, 41 from AS 2024-08-19 12:05:53,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4458790.0, ans=0.125 2024-08-19 12:05:56,295 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 1300, loss[loss=0.08412, beats_loss=0.01206, ecapa_loss=0.0001322, whisper_loss=0.07074, over 19551.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01039, ecapa_loss=0.0001379, whisper_loss=0.08904, over 3821541.13 frames. ], batch size: 82, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:06:17,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4458990.0, ans=0.0 2024-08-19 12:06:22,143 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 24 from Vox, 28 from AS 2024-08-19 12:06:22,912 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.22 vs. limit=15.0 2024-08-19 12:06:33,604 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.96 vs. limit=15.0 2024-08-19 12:06:37,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4459090.0, ans=0.125 2024-08-19 12:07:07,702 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.63 vs. 
limit=15.0 2024-08-19 12:07:10,982 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 12:07:17,537 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.691e+01 2.249e+01 2.463e+01 2.703e+01 4.319e+01, threshold=4.927e+01, percent-clipped=0.0 2024-08-19 12:07:19,522 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 12:07:21,235 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 26 from LS+wenet, 16 from Vox, 27 from AS 2024-08-19 12:07:32,289 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 1350, loss[loss=0.1154, beats_loss=0.008771, ecapa_loss=0.000147, whisper_loss=0.1052, over 20934.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01036, ecapa_loss=0.0001378, whisper_loss=0.08908, over 3817258.47 frames. ], batch size: 82, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:07:34,708 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
26 from LS+wenet, 16 from Vox, 35 from AS 2024-08-19 12:07:40,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4459390.0, ans=0.0 2024-08-19 12:07:56,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4459490.0, ans=0.0 2024-08-19 12:07:58,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4459490.0, ans=0.125 2024-08-19 12:08:15,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4459590.0, ans=0.125 2024-08-19 12:08:42,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4459690.0, ans=0.1 2024-08-19 12:08:55,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4459790.0, ans=0.1 2024-08-19 12:08:58,203 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 14 from LS+wenet, 19 from Vox, 30 from AS 2024-08-19 12:09:04,126 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 1400, loss[loss=0.07462, beats_loss=0.01249, ecapa_loss=0.0001219, whisper_loss=0.06091, over 21417.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01036, ecapa_loss=0.0001384, whisper_loss=0.08869, over 3809309.35 frames. ], batch size: 88, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:09:13,400 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
20 from LS+wenet, 23 from Vox, 38 from AS 2024-08-19 12:09:13,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4459890.0, ans=0.125 2024-08-19 12:09:50,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4460190.0, ans=0.125 2024-08-19 12:09:57,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4460190.0, ans=0.125 2024-08-19 12:10:11,813 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.232e+01 2.433e+01 2.753e+01 5.485e+01, threshold=4.866e+01, percent-clipped=1.0 2024-08-19 12:10:14,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4460290.0, ans=0.125 2024-08-19 12:10:26,659 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 1450, loss[loss=0.07255, beats_loss=0.01359, ecapa_loss=0.0001131, whisper_loss=0.05783, over 17040.00 frames. ], tot_loss[loss=0.09996, beats_loss=0.01043, ecapa_loss=0.0001378, whisper_loss=0.08815, over 3795128.72 frames. ], batch size: 69, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:10:54,652 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.64 vs. 
limit=15.0 2024-08-19 12:11:08,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4460490.0, ans=0.0 2024-08-19 12:11:09,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4460490.0, ans=0.04949747468305833 2024-08-19 12:11:16,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4460490.0, ans=0.2 2024-08-19 12:11:16,713 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.12 vs. limit=15.0 2024-08-19 12:11:29,224 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 from AS 2024-08-19 12:12:11,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4460790.0, ans=0.2 2024-08-19 12:12:22,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4460790.0, ans=0.125 2024-08-19 12:12:26,462 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 1500, loss[loss=0.07767, beats_loss=0.0112, ecapa_loss=0.0001105, whisper_loss=0.06537, over 14682.00 frames. ], tot_loss[loss=0.09979, beats_loss=0.01046, ecapa_loss=0.0001368, whisper_loss=0.08796, over 3811132.84 frames. ], batch size: 55, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:12:37,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4460890.0, ans=0.125 2024-08-19 12:13:01,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4461090.0, ans=0.125 2024-08-19 12:13:19,155 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
29 from LS+wenet, 13 from Vox, 32 from AS 2024-08-19 12:13:28,268 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 18 from Vox, 27 from AS 2024-08-19 12:13:35,246 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 17 from Vox, 40 from AS 2024-08-19 12:13:40,935 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.81 vs. limit=12.0 2024-08-19 12:13:41,564 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.290e+01 2.508e+01 2.829e+01 3.950e+01, threshold=5.015e+01, percent-clipped=0.0 2024-08-19 12:13:48,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4461290.0, ans=0.125 2024-08-19 12:13:56,718 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 1550, loss[loss=0.0816, beats_loss=0.009906, ecapa_loss=0.0001318, whisper_loss=0.07038, over 17816.00 frames. ], tot_loss[loss=0.09968, beats_loss=0.01047, ecapa_loss=0.0001365, whisper_loss=0.08784, over 3800716.00 frames. ], batch size: 69, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:14:03,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4461390.0, ans=0.1 2024-08-19 12:14:04,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4461390.0, ans=0.0 2024-08-19 12:14:27,559 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
33 from LS+wenet, 15 from Vox, 40 from AS 2024-08-19 12:14:38,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4461590.0, ans=0.0 2024-08-19 12:15:07,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4461690.0, ans=0.04949747468305833 2024-08-19 12:15:10,681 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 22 from Vox, 41 from AS 2024-08-19 12:15:17,246 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 13 from Vox, 22 from AS 2024-08-19 12:15:17,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4461790.0, ans=0.1 2024-08-19 12:15:23,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4461790.0, ans=0.125 2024-08-19 12:15:29,692 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 1600, loss[loss=0.07396, beats_loss=0.01105, ecapa_loss=0.0001566, whisper_loss=0.06134, over 21817.00 frames. ], tot_loss[loss=0.09966, beats_loss=0.01047, ecapa_loss=0.0001355, whisper_loss=0.08783, over 3820142.64 frames. ], batch size: 93, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:15:40,307 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.28 vs. limit=22.5 2024-08-19 12:16:03,593 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
15 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-19 12:16:45,903 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.361e+01 2.575e+01 2.927e+01 3.831e+01, threshold=5.149e+01, percent-clipped=0.0 2024-08-19 12:16:57,820 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 1650, loss[loss=0.09808, beats_loss=0.009208, ecapa_loss=9.605e-05, whisper_loss=0.08791, over 16501.00 frames. ], tot_loss[loss=0.09989, beats_loss=0.01049, ecapa_loss=0.000135, whisper_loss=0.08804, over 3825414.02 frames. ], batch size: 60, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:16:59,413 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-19 12:16:59,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4462390.0, ans=0.1 2024-08-19 12:17:07,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4462390.0, ans=0.125 2024-08-19 12:17:15,268 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-19 12:17:35,689 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-19 12:18:15,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4462790.0, ans=0.0 2024-08-19 12:18:19,541 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-19 12:18:34,760 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0 2024-08-19 12:18:35,151 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 1700, loss[loss=0.1192, beats_loss=0.007214, ecapa_loss=0.0001603, whisper_loss=0.1104, over 22938.00 frames. 
], tot_loss[loss=0.1002, beats_loss=0.01042, ecapa_loss=0.0001352, whisper_loss=0.08838, over 3840506.16 frames. ], batch size: 91, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:18:38,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4462890.0, ans=0.04949747468305833 2024-08-19 12:18:40,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4462890.0, ans=0.0 2024-08-19 12:19:26,264 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 32 from Vox, 28 fro AS 2024-08-19 12:19:26,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4463190.0, ans=0.1 2024-08-19 12:19:38,941 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.84 vs. limit=12.0 2024-08-19 12:19:44,234 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.732e+01 2.356e+01 2.588e+01 2.791e+01 7.851e+01, threshold=5.177e+01, percent-clipped=4.0 2024-08-19 12:19:47,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4463290.0, ans=0.5 2024-08-19 12:19:58,196 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 12:20:00,997 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 1750, loss[loss=0.1057, beats_loss=0.01114, ecapa_loss=0.0001393, whisper_loss=0.09321, over 23135.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01031, ecapa_loss=0.0001377, whisper_loss=0.08888, over 3855983.74 frames. 
], batch size: 91, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:20:01,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4463390.0, ans=0.125 2024-08-19 12:20:30,721 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 27 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-19 12:20:36,971 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-19 12:20:39,341 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-19 12:20:55,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4463590.0, ans=0.125 2024-08-19 12:21:07,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4463690.0, ans=0.125 2024-08-19 12:21:07,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4463690.0, ans=0.125 2024-08-19 12:21:07,703 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.55 vs. limit=10.0 2024-08-19 12:21:09,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4463690.0, ans=0.09899494936611666 2024-08-19 12:21:25,419 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.65 vs. 
limit=15.0 2024-08-19 12:21:34,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4463790.0, ans=0.125 2024-08-19 12:21:34,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4463790.0, ans=0.125 2024-08-19 12:21:36,332 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 12:21:45,351 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 1800, loss[loss=0.09661, beats_loss=0.007675, ecapa_loss=0.0001492, whisper_loss=0.08744, over 14374.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01027, ecapa_loss=0.0001376, whisper_loss=0.08962, over 3863770.82 frames. ], batch size: 57, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:22:01,456 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 12:22:08,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=4463990.0, ans=0.2 2024-08-19 12:22:26,673 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 30 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-19 12:22:42,718 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 12:22:52,732 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.40 vs. limit=15.0 2024-08-19 12:22:57,446 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 25 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-19 12:23:13,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4464290.0, ans=0.125 2024-08-19 12:23:17,111 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.58 vs. 
limit=15.0 2024-08-19 12:23:21,352 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.692e+01 2.194e+01 2.455e+01 2.727e+01 5.354e+01, threshold=4.909e+01, percent-clipped=1.0 2024-08-19 12:23:30,259 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.90 vs. limit=22.5 2024-08-19 12:23:42,028 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 1850, loss[loss=0.08198, beats_loss=0.01148, ecapa_loss=0.0001215, whisper_loss=0.06928, over 18931.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01025, ecapa_loss=0.000137, whisper_loss=0.08953, over 3842842.41 frames. ], batch size: 76, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:24:08,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4464490.0, ans=0.125 2024-08-19 12:24:19,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4464490.0, ans=0.1 2024-08-19 12:24:51,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4464590.0, ans=0.0 2024-08-19 12:24:51,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4464590.0, ans=0.2 2024-08-19 12:24:56,359 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.57 vs. limit=22.5 2024-08-19 12:25:20,342 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
28 from LS+wenet, 23 from Vox, 26 from AS 2024-08-19 12:25:32,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4464790.0, ans=0.1 2024-08-19 12:25:44,914 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 1900, loss[loss=0.1241, beats_loss=0.008232, ecapa_loss=0.0001567, whisper_loss=0.1143, over 23373.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01018, ecapa_loss=0.0001372, whisper_loss=0.08974, over 3825468.95 frames. ], batch size: 94, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:26:33,871 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 17 from LS+wenet, 17 from Vox, 40 from AS 2024-08-19 12:26:37,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4465090.0, ans=0.0 2024-08-19 12:26:59,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4465090.0, ans=0.125 2024-08-19 12:27:06,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4465190.0, ans=0.125 2024-08-19 12:27:16,948 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 17 from Vox, 41 from AS 2024-08-19 12:27:33,014 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.271e+01 2.463e+01 2.812e+01 5.945e+01, threshold=4.926e+01, percent-clipped=2.0 2024-08-19 12:27:54,063 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 25 from Vox, 42 from AS 2024-08-19 12:27:55,046 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 1950, loss[loss=0.08847, beats_loss=0.01263, ecapa_loss=0.0001588, whisper_loss=0.07425, over 21199.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01023, ecapa_loss=0.0001361, whisper_loss=0.08948, over 3810060.41 frames. 
], batch size: 93, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:27:56,241 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.48 vs. limit=15.0 2024-08-19 12:28:05,726 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.46 vs. limit=15.0 2024-08-19 12:28:32,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4465490.0, ans=0.2 2024-08-19 12:28:34,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4465490.0, ans=0.05 2024-08-19 12:28:54,150 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 21 from Vox, 33 from AS 2024-08-19 12:28:54,562 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.80 vs. limit=22.5 2024-08-19 12:28:59,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=4465590.0, ans=0.05 2024-08-19 12:29:23,821 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 23 from Vox, 23 from AS 2024-08-19 12:29:43,404 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 2000, loss[loss=0.09201, beats_loss=0.01354, ecapa_loss=0.0001172, whisper_loss=0.07729, over 16923.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01032, ecapa_loss=0.0001358, whisper_loss=0.08947, over 3849372.47 frames. 
], batch size: 71, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:29:56,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4465890.0, ans=0.125 2024-08-19 12:30:33,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4466090.0, ans=0.0 2024-08-19 12:30:51,025 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 27 from Vox, 34 from AS 2024-08-19 12:30:56,276 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.73 vs. limit=12.0 2024-08-19 12:30:58,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4466290.0, ans=0.04949747468305833 2024-08-19 12:31:01,173 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.320e+01 2.587e+01 3.185e+01 3.788e+02, threshold=5.175e+01, percent-clipped=4.0 2024-08-19 12:31:02,616 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 19 from Vox, 31 from AS 2024-08-19 12:31:11,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4466290.0, ans=0.05 2024-08-19 12:31:14,350 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 2050, loss[loss=0.09547, beats_loss=0.01054, ecapa_loss=0.0001405, whisper_loss=0.08352, over 19229.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01035, ecapa_loss=0.0001368, whisper_loss=0.08995, over 3850963.91 frames. ], batch size: 77, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:31:17,551 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.49 vs. 
limit=15.0 2024-08-19 12:31:29,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4466390.0, ans=0.125 2024-08-19 12:31:38,275 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 13 from Vox, 32 from AS 2024-08-19 12:31:59,263 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.518e-03 2024-08-19 12:32:11,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4466690.0, ans=0.0 2024-08-19 12:32:16,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4466690.0, ans=0.125 2024-08-19 12:32:21,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4466690.0, ans=0.125 2024-08-19 12:32:26,949 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.86 vs. limit=22.5 2024-08-19 12:32:48,698 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 2100, loss[loss=0.1054, beats_loss=0.009233, ecapa_loss=0.0001394, whisper_loss=0.09472, over 16330.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0104, ecapa_loss=0.0001373, whisper_loss=0.08921, over 3825903.11 frames. ], batch size: 63, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:32:57,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4466890.0, ans=0.125 2024-08-19 12:33:12,172 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
31 from LS+wenet, 16 from Vox, 39 from AS 2024-08-19 12:33:14,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4466990.0, ans=0.0 2024-08-19 12:33:21,992 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 26 from LS+wenet, 18 from Vox, 25 from AS 2024-08-19 12:33:23,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4466990.0, ans=0.125 2024-08-19 12:33:47,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4467190.0, ans=0.0 2024-08-19 12:34:11,801 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.305e+01 2.500e+01 2.818e+01 4.309e+01, threshold=5.000e+01, percent-clipped=0.0 2024-08-19 12:34:19,462 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 23 from Vox, 34 from AS 2024-08-19 12:34:24,118 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 10 from LS+wenet, 22 from Vox, 23 from AS 2024-08-19 12:34:29,976 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 2150, loss[loss=0.09907, beats_loss=0.01035, ecapa_loss=0.000137, whisper_loss=0.08735, over 19703.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01033, ecapa_loss=0.0001373, whisper_loss=0.08968, over 3828584.80 frames. ], batch size: 77, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:34:41,693 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=15.0 2024-08-19 12:34:45,980 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 14 from LS+wenet, 16 from Vox, 23 from AS 2024-08-19 12:35:03,785 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
21 from LS+wenet, 17 from Vox, 32 from AS 2024-08-19 12:35:08,225 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.173e-03 2024-08-19 12:35:14,923 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 21 from Vox, 41 from AS 2024-08-19 12:35:21,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4467590.0, ans=0.04949747468305833 2024-08-19 12:35:44,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4467790.0, ans=0.07 2024-08-19 12:35:46,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4467790.0, ans=0.125 2024-08-19 12:36:01,844 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 2200, loss[loss=0.1113, beats_loss=0.01025, ecapa_loss=9.354e-05, whisper_loss=0.1002, over 17653.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01044, ecapa_loss=0.0001354, whisper_loss=0.09033, over 3858503.90 frames. ], batch size: 63, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:36:12,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4467890.0, ans=0.0 2024-08-19 12:36:18,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4467890.0, ans=0.2 2024-08-19 12:36:22,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4467990.0, ans=0.0 2024-08-19 12:36:23,444 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
23 from LS+wenet, 15 from Vox, 26 from AS 2024-08-19 12:36:51,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4468090.0, ans=0.2 2024-08-19 12:36:52,966 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 19 from LS+wenet, 19 from Vox, 39 from AS 2024-08-19 12:37:10,395 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 27 from LS+wenet, 17 from Vox, 34 from AS 2024-08-19 12:37:21,679 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.215e+01 2.622e+01 2.935e+01 2.277e+02, threshold=5.244e+01, percent-clipped=1.0 2024-08-19 12:37:34,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4468290.0, ans=0.125 2024-08-19 12:37:34,734 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.39 vs. limit=12.0 2024-08-19 12:37:34,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=4468290.0, ans=15.0 2024-08-19 12:37:38,371 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 2250, loss[loss=0.1148, beats_loss=0.009179, ecapa_loss=0.0001578, whisper_loss=0.104, over 22336.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01046, ecapa_loss=0.0001356, whisper_loss=0.0909, over 3839997.50 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:37:52,773 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
25 from LS+wenet, 24 from Vox, 39 from AS 2024-08-19 12:38:04,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4468490.0, ans=0.125 2024-08-19 12:38:06,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4468490.0, ans=0.0 2024-08-19 12:38:08,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4468490.0, ans=0.125 2024-08-19 12:38:16,349 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.96 vs. limit=5.0 2024-08-19 12:38:39,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4468590.0, ans=0.2 2024-08-19 12:38:40,785 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 31 from LS+wenet, 17 from Vox, 30 from AS 2024-08-19 12:39:09,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4468790.0, ans=0.125 2024-08-19 12:39:17,508 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 16 from Vox, 36 from AS 2024-08-19 12:39:24,235 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 2300, loss[loss=0.1058, beats_loss=0.009963, ecapa_loss=0.0001491, whisper_loss=0.09433, over 23603.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01046, ecapa_loss=0.0001351, whisper_loss=0.09126, over 3854356.79 frames. ], batch size: 96, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:40:13,292 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 32 from Vox, 28 from AS 2024-08-19 12:40:35,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4469290.0, ans=0.125 2024-08-19 12:40:36,383 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
23 from LS+wenet, 31 from Vox, 30 from AS 2024-08-19 12:40:40,224 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.339e+01 2.578e+01 2.838e+01 4.503e+01, threshold=5.155e+01, percent-clipped=1.0 2024-08-19 12:40:54,648 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 2350, loss[loss=0.08599, beats_loss=0.009047, ecapa_loss=0.0001633, whisper_loss=0.0753, over 21176.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01037, ecapa_loss=0.0001372, whisper_loss=0.09035, over 3827020.52 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:41:00,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4469390.0, ans=0.0 2024-08-19 12:41:22,645 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 16 from Vox, 31 from AS 2024-08-19 12:41:34,632 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 18 from Vox, 43 from AS 2024-08-19 12:41:49,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4469590.0, ans=10.0 2024-08-19 12:41:54,322 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 35 from Vox, 29 from AS 2024-08-19 12:41:58,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4469690.0, ans=0.125 2024-08-19 12:42:14,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4469790.0, ans=0.2 2024-08-19 12:42:20,473 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.12 vs. limit=15.0 2024-08-19 12:42:21,569 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
15 from LS+wenet, 20 from Vox, 32 from AS 2024-08-19 12:42:26,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4469790.0, ans=0.125 2024-08-19 12:42:28,526 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 2400, loss[loss=0.1124, beats_loss=0.00981, ecapa_loss=0.000149, whisper_loss=0.1011, over 22439.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01032, ecapa_loss=0.0001382, whisper_loss=0.09073, over 3857013.30 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:42:46,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4469990.0, ans=0.125 2024-08-19 12:42:50,025 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.32 vs. limit=15.0 2024-08-19 12:42:56,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4469990.0, ans=0.125 2024-08-19 12:42:59,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4469990.0, ans=0.2 2024-08-19 12:43:11,104 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 from AS 2024-08-19 12:43:34,862 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.83 vs. limit=10.0 2024-08-19 12:43:37,775 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
28 from LS+wenet, 19 from Vox, 42 from AS 2024-08-19 12:43:39,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=4470190.0, ans=22.5 2024-08-19 12:43:43,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4470290.0, ans=0.125 2024-08-19 12:43:45,930 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.014e+01 2.427e+01 2.678e+01 2.913e+01 4.859e+01, threshold=5.356e+01, percent-clipped=0.0 2024-08-19 12:43:57,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4470390.0, ans=0.2 2024-08-19 12:43:57,766 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 2450, loss[loss=0.1257, beats_loss=0.009004, ecapa_loss=0.0001574, whisper_loss=0.1152, over 21039.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0103, ecapa_loss=0.0001384, whisper_loss=0.09133, over 3869048.31 frames. ], batch size: 81, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:44:07,213 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 from AS 2024-08-19 12:44:08,876 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 27 from Vox, 40 from AS 2024-08-19 12:44:11,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4470390.0, ans=0.0 2024-08-19 12:44:35,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4470590.0, ans=0.2 2024-08-19 12:44:37,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4470590.0, ans=0.125 2024-08-19 12:44:48,804 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.62 vs. 
limit=8.0 2024-08-19 12:45:12,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4470790.0, ans=0.125 2024-08-19 12:45:25,560 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 2500, loss[loss=0.09026, beats_loss=0.01201, ecapa_loss=0.0001117, whisper_loss=0.07713, over 22408.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01034, ecapa_loss=0.0001378, whisper_loss=0.09095, over 3871842.82 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:45:41,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4470890.0, ans=0.125 2024-08-19 12:45:49,876 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 19 from Vox, 41 from AS 2024-08-19 12:46:12,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4471090.0, ans=0.125 2024-08-19 12:46:17,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4471090.0, ans=0.125 2024-08-19 12:46:21,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4471190.0, ans=0.0 2024-08-19 12:46:44,876 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.332e+01 2.578e+01 2.868e+01 6.708e+01, threshold=5.156e+01, percent-clipped=1.0 2024-08-19 12:46:52,600 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 21 from Vox, 48 from AS 2024-08-19 12:46:59,034 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 2550, loss[loss=0.09202, beats_loss=0.01068, ecapa_loss=0.0001475, whisper_loss=0.07987, over 19809.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01029, ecapa_loss=0.0001392, whisper_loss=0.0918, over 3877848.50 frames. 
], batch size: 81, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:46:59,255 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 13 from Vox, 29 from AS 2024-08-19 12:47:25,451 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 20 from Vox, 35 from AS 2024-08-19 12:47:26,214 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.44 vs. limit=15.0 2024-08-19 12:47:51,846 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 from AS 2024-08-19 12:48:02,965 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 20 from Vox, 43 from AS 2024-08-19 12:48:06,658 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.89 vs. limit=15.0 2024-08-19 12:48:23,856 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 2600, loss[loss=0.08831, beats_loss=0.01026, ecapa_loss=0.0001334, whisper_loss=0.07672, over 19767.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01035, ecapa_loss=0.0001392, whisper_loss=0.09095, over 3880051.17 frames. ], batch size: 77, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:48:43,935 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 17 from Vox, 29 from AS 2024-08-19 12:48:56,074 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.09 vs. limit=15.0 2024-08-19 12:48:59,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4472090.0, ans=0.05 2024-08-19 12:49:00,767 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.05 vs. 
limit=10.0 2024-08-19 12:49:05,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4472090.0, ans=0.2 2024-08-19 12:49:39,402 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.420e+01 2.627e+01 2.917e+01 8.623e+01, threshold=5.255e+01, percent-clipped=2.0 2024-08-19 12:49:40,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4472290.0, ans=0.1 2024-08-19 12:49:45,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4472290.0, ans=0.025 2024-08-19 12:49:51,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4472290.0, ans=0.125 2024-08-19 12:49:54,076 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 2650, loss[loss=0.09646, beats_loss=0.01315, ecapa_loss=0.0001085, whisper_loss=0.08222, over 15766.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0104, ecapa_loss=0.0001389, whisper_loss=0.09019, over 3872873.49 frames. ], batch size: 64, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:50:14,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4472490.0, ans=0.2 2024-08-19 12:50:38,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4472590.0, ans=0.125 2024-08-19 12:50:54,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4472690.0, ans=0.035 2024-08-19 12:51:23,211 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 2700, loss[loss=0.08334, beats_loss=0.0126, ecapa_loss=0.0001171, whisper_loss=0.06957, over 14613.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001396, whisper_loss=0.09029, over 3896497.96 frames. ], batch size: 58, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:51:36,930 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 12:51:50,715 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 17 from LS+wenet, 31 from Vox, 42 fro AS 2024-08-19 12:52:14,937 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.69 vs. limit=15.0 2024-08-19 12:52:27,850 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-19 12:52:34,861 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.658e+01 2.264e+01 2.469e+01 2.717e+01 4.475e+01, threshold=4.939e+01, percent-clipped=0.0 2024-08-19 12:52:39,786 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 12:52:43,005 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-19 12:52:44,562 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-19 12:52:48,107 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 2750, loss[loss=0.1024, beats_loss=0.01248, ecapa_loss=0.0001239, whisper_loss=0.08866, over 22972.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.0001392, whisper_loss=0.09052, over 3912463.85 frames. ], batch size: 93, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:52:48,243 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
20 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-19 12:53:00,691 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 12:53:42,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4473690.0, ans=0.09899494936611666 2024-08-19 12:53:54,446 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=7.486e+00 2024-08-19 12:53:57,579 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 31 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-19 12:54:15,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4473790.0, ans=0.125 2024-08-19 12:54:17,673 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 2800, loss[loss=0.1236, beats_loss=0.009987, ecapa_loss=0.0001264, whisper_loss=0.1123, over 22924.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01037, ecapa_loss=0.0001393, whisper_loss=0.09081, over 3899768.70 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:54:38,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4473990.0, ans=0.1 2024-08-19 12:54:51,111 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-19 12:54:53,181 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
20 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-19 12:54:58,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4474090.0, ans=0.2 2024-08-19 12:55:07,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4474090.0, ans=0.125 2024-08-19 12:55:09,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4474090.0, ans=0.125 2024-08-19 12:55:09,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4474090.0, ans=0.125 2024-08-19 12:55:18,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4474190.0, ans=0.2 2024-08-19 12:55:33,886 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.412e+01 2.633e+01 2.866e+01 1.596e+02, threshold=5.265e+01, percent-clipped=3.0 2024-08-19 12:55:50,094 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 2850, loss[loss=0.1055, beats_loss=0.01031, ecapa_loss=0.0001286, whisper_loss=0.09387, over 22119.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01033, ecapa_loss=0.0001396, whisper_loss=0.09147, over 3897308.92 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:56:02,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4474390.0, ans=0.125 2024-08-19 12:56:05,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4474490.0, ans=0.2 2024-08-19 12:56:08,999 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-19 12:56:26,158 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
19 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-19 12:56:31,117 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 15 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-19 12:56:38,009 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 24 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-19 12:56:40,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4474690.0, ans=0.1 2024-08-19 12:56:49,013 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 12:56:59,003 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.78 vs. limit=15.0 2024-08-19 12:57:14,194 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 2900, loss[loss=0.08797, beats_loss=0.01082, ecapa_loss=0.0001418, whisper_loss=0.07573, over 20771.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01041, ecapa_loss=0.0001412, whisper_loss=0.09073, over 3906805.40 frames. ], batch size: 86, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:57:29,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4474890.0, ans=0.1 2024-08-19 12:57:36,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4474990.0, ans=0.0 2024-08-19 12:57:38,054 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 24 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 12:57:40,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4474990.0, ans=0.125 2024-08-19 12:57:47,373 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
27 from LS+wenet, 33 from Vox, 30 fro AS 2024-08-19 12:58:14,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4475190.0, ans=0.0 2024-08-19 12:58:26,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4475290.0, ans=0.125 2024-08-19 12:58:29,197 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-19 12:58:30,157 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.344e+01 2.665e+01 2.999e+01 6.021e+01, threshold=5.330e+01, percent-clipped=1.0 2024-08-19 12:58:44,963 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 2950, loss[loss=0.1277, beats_loss=0.009567, ecapa_loss=0.0001316, whisper_loss=0.1168, over 23400.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01033, ecapa_loss=0.000142, whisper_loss=0.09163, over 3938883.12 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:59:21,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4475590.0, ans=0.1 2024-08-19 12:59:22,986 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-19 12:59:25,010 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=15.0 2024-08-19 12:59:30,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4475590.0, ans=0.125 2024-08-19 12:59:56,415 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
25 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-19 12:59:56,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4475790.0, ans=0.1 2024-08-19 13:00:09,948 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-19 13:00:13,956 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 3000, loss[loss=0.1177, beats_loss=0.01003, ecapa_loss=0.000115, whisper_loss=0.1065, over 13982.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01029, ecapa_loss=0.0001405, whisper_loss=0.09212, over 3928796.28 frames. ], batch size: 53, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 13:00:13,956 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-19 13:00:59,239 INFO [train_multi_KD3.py:1149] (2/4) Epoch 31, validation on ASR_libri: loss=0.2548, beats_loss=0, ecapa_loss=0.0005195, whisper_loss=0.2496, over 922467.00 frames. 2024-08-19 13:01:17,439 INFO [train_multi_KD3.py:1149] (2/4) Epoch 31, validation on SV_voxceleb1: loss=0.003921, beats_loss=0, ecapa_loss=0.0003921, whisper_loss=0, over 939242.00 frames. 2024-08-19 13:03:07,088 INFO [train_multi_KD3.py:1149] (2/4) Epoch 31, validation on AT_audioset: loss=0.02304, beats_loss=0.02304, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 13:03:07,092 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-19 13:03:17,500 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
29 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-19 13:03:23,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4475990.0, ans=0.0 2024-08-19 13:03:24,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4475990.0, ans=0.125 2024-08-19 13:03:29,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4475990.0, ans=0.2 2024-08-19 13:03:30,227 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 13:03:32,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4475990.0, ans=0.0 2024-08-19 13:04:16,073 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.421e+01 2.691e+01 3.119e+01 4.810e+01, threshold=5.383e+01, percent-clipped=0.0 2024-08-19 13:04:17,115 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.565e-02 2024-08-19 13:04:19,616 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 13:04:30,496 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 3050, loss[loss=0.09936, beats_loss=0.009902, ecapa_loss=0.0001618, whisper_loss=0.08784, over 17772.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01036, ecapa_loss=0.0001412, whisper_loss=0.09199, over 3920399.36 frames. ], batch size: 73, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 13:04:34,345 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-19 13:04:48,427 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 13:04:55,292 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.10 vs. limit=15.0 2024-08-19 13:04:59,291 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 27 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-19 13:05:01,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4476490.0, ans=0.125 2024-08-19 13:05:04,518 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-19 13:05:06,283 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-19 13:05:08,460 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 13:05:10,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4476590.0, ans=0.125 2024-08-19 13:05:13,812 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.69 vs. limit=15.0 2024-08-19 13:05:32,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4476690.0, ans=0.0 2024-08-19 13:05:50,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4476790.0, ans=0.125 2024-08-19 13:06:00,815 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 3100, loss[loss=0.1202, beats_loss=0.009029, ecapa_loss=0.0001464, whisper_loss=0.1097, over 22368.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01041, ecapa_loss=0.0001415, whisper_loss=0.09157, over 3937846.08 frames. 
], batch size: 87, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 13:06:04,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4476890.0, ans=0.2 2024-08-19 13:06:06,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4476890.0, ans=0.0 2024-08-19 13:06:06,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4476890.0, ans=0.1 2024-08-19 13:06:32,367 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 36 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-19 13:06:41,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4477090.0, ans=0.0 2024-08-19 13:06:43,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4477090.0, ans=0.125 2024-08-19 13:07:13,692 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-19 13:07:14,612 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.300e+01 2.518e+01 2.909e+01 4.343e+01, threshold=5.036e+01, percent-clipped=0.0 2024-08-19 13:07:24,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4477290.0, ans=0.1 2024-08-19 13:07:27,348 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 3150, loss[loss=0.1012, beats_loss=0.01053, ecapa_loss=0.0001624, whisper_loss=0.08904, over 21444.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01037, ecapa_loss=0.0001426, whisper_loss=0.09172, over 3912058.37 frames. ], batch size: 85, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 13:07:28,879 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
27 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 13:07:44,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4477490.0, ans=0.125 2024-08-19 13:08:14,662 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 44 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-19 13:08:30,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4477690.0, ans=0.0 2024-08-19 13:08:50,884 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 3200, loss[loss=0.1214, beats_loss=0.01145, ecapa_loss=0.0001225, whisper_loss=0.1088, over 16949.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01045, ecapa_loss=0.0001415, whisper_loss=0.09186, over 3917897.14 frames. ], batch size: 65, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 13:08:52,561 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-19 13:08:58,183 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-19 13:09:00,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4477890.0, ans=0.0 2024-08-19 13:09:13,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4477990.0, ans=0.2 2024-08-19 13:09:15,339 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.95 vs. 
limit=15.0 2024-08-19 13:09:29,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4478090.0, ans=0.125 2024-08-19 13:09:30,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4478090.0, ans=0.125 2024-08-19 13:09:32,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4478090.0, ans=0.04949747468305833 2024-08-19 13:09:42,876 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-19 13:09:57,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4478190.0, ans=0.0 2024-08-19 13:10:01,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4478290.0, ans=0.0 2024-08-19 13:10:02,219 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-19 13:10:05,688 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.281e+01 2.603e+01 2.850e+01 4.454e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-19 13:10:06,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=4478290.0, ans=0.2 2024-08-19 13:10:19,922 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 3250, loss[loss=0.09984, beats_loss=0.009518, ecapa_loss=0.0001438, whisper_loss=0.08888, over 19313.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01045, ecapa_loss=0.0001415, whisper_loss=0.09243, over 3908336.42 frames. 
], batch size: 79, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 13:10:21,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4478390.0, ans=0.125 2024-08-19 13:10:28,745 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 24 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 13:10:38,370 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.67 vs. limit=10.0 2024-08-19 13:10:43,603 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 13:10:45,836 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-19 13:10:58,692 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.52 vs. limit=15.0 2024-08-19 13:11:05,231 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 13:11:06,506 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-19 13:11:08,590 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4478590.0, ans=0.1 2024-08-19 13:11:18,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4478690.0, ans=0.1 2024-08-19 13:11:27,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4478790.0, ans=0.0 2024-08-19 13:11:32,824 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
28 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-19 13:11:42,629 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 3300, loss[loss=0.1062, beats_loss=0.008514, ecapa_loss=0.0001602, whisper_loss=0.0961, over 13863.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01046, ecapa_loss=0.0001415, whisper_loss=0.09204, over 3891115.73 frames. ], batch size: 55, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:11:46,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4478890.0, ans=0.125 2024-08-19 13:11:56,113 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.14 vs. limit=22.5 2024-08-19 13:11:59,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4478990.0, ans=0.125 2024-08-19 13:11:59,347 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2024-08-19 13:12:02,952 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-19 13:12:06,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=4478990.0, ans=0.025 2024-08-19 13:12:17,334 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.41 vs. 
limit=15.0 2024-08-19 13:12:20,884 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09731415659189224, model_norm_threshold=52.06379318237305 2024-08-19 13:12:21,051 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.62, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.783e+05, grad_sumsq=1.704e+07, orig_rms_sq=1.046e-02 2024-08-19 13:12:41,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4479190.0, ans=0.125 2024-08-19 13:12:46,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4479290.0, ans=0.2 2024-08-19 13:12:50,636 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.318e+01 2.677e+01 3.024e+01 5.350e+02, threshold=5.354e+01, percent-clipped=4.0 2024-08-19 13:12:55,335 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09374216943979263, model_norm_threshold=53.53926467895508 2024-08-19 13:12:55,507 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.out_combiner.bypass_scale with proportion 0.08, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.490e+04, grad_sumsq=4.301e+04, orig_rms_sq=5.788e-01 2024-08-19 13:12:55,659 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-19 13:13:01,550 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 3350, loss[loss=0.08649, beats_loss=0.01147, ecapa_loss=0.0001345, whisper_loss=0.07367, over 15918.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01043, ecapa_loss=0.0001418, whisper_loss=0.09132, over 3859557.07 frames. ], batch size: 63, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:13:02,245 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.86 vs. 
limit=6.0 2024-08-19 13:13:17,229 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-19 13:13:20,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4479490.0, ans=0.125 2024-08-19 13:13:36,693 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 14 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-19 13:13:57,803 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 17 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 13:14:04,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4479790.0, ans=0.1 2024-08-19 13:14:16,254 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 3400, loss[loss=0.06708, beats_loss=0.0113, ecapa_loss=0.000169, whisper_loss=0.05409, over 18655.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01045, ecapa_loss=0.000142, whisper_loss=0.09085, over 3869385.57 frames. ], batch size: 82, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:14:28,130 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.04 vs. limit=15.0 2024-08-19 13:14:30,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4479990.0, ans=0.125 2024-08-19 13:14:40,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4479990.0, ans=0.125 2024-08-19 13:14:41,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4479990.0, ans=0.1 2024-08-19 13:14:44,008 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
25 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-19 13:14:48,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4480090.0, ans=0.0 2024-08-19 13:14:48,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4480090.0, ans=0.125 2024-08-19 13:14:53,975 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-19 13:15:05,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4480190.0, ans=0.125 2024-08-19 13:15:14,736 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-19 13:15:22,759 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.735e+01 2.253e+01 2.575e+01 3.055e+01 5.711e+02, threshold=5.150e+01, percent-clipped=3.0 2024-08-19 13:15:23,038 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-19 13:15:33,756 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 3450, loss[loss=0.1202, beats_loss=0.01035, ecapa_loss=0.0001334, whisper_loss=0.1085, over 22121.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0104, ecapa_loss=0.000142, whisper_loss=0.09035, over 3873097.31 frames. ], batch size: 86, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:15:35,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4480390.0, ans=0.0 2024-08-19 13:15:43,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4480390.0, ans=0.1 2024-08-19 13:16:00,296 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
21 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-19 13:16:27,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4480690.0, ans=0.1 2024-08-19 13:16:28,712 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 24 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 13:16:47,962 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-19 13:16:50,980 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 3500, loss[loss=0.09816, beats_loss=0.01179, ecapa_loss=0.000113, whisper_loss=0.08523, over 17876.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01038, ecapa_loss=0.0001416, whisper_loss=0.08973, over 3829838.58 frames. ], batch size: 68, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:17:06,094 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-19 13:17:10,885 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-19 13:17:18,687 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-19 13:17:20,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4481090.0, ans=0.05 2024-08-19 13:17:38,377 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.01 vs. limit=10.0 2024-08-19 13:17:57,048 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.595e+01 2.291e+01 2.477e+01 2.753e+01 3.837e+01, threshold=4.954e+01, percent-clipped=0.0 2024-08-19 13:18:01,452 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
32 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-19 13:18:06,613 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 3550, loss[loss=0.1078, beats_loss=0.0105, ecapa_loss=0.0001397, whisper_loss=0.09586, over 17348.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0104, ecapa_loss=0.000141, whisper_loss=0.08945, over 3821553.86 frames. ], batch size: 67, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:18:13,298 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-19 13:18:16,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4481390.0, ans=0.5 2024-08-19 13:18:24,010 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0 2024-08-19 13:18:26,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4481490.0, ans=0.04949747468305833 2024-08-19 13:18:52,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4481690.0, ans=0.125 2024-08-19 13:18:55,844 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.22 vs. limit=15.0 2024-08-19 13:18:58,347 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-19 13:19:13,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4481790.0, ans=0.09899494936611666 2024-08-19 13:19:14,978 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.53 vs. 
limit=15.0 2024-08-19 13:19:19,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4481790.0, ans=0.05 2024-08-19 13:19:25,142 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 3600, loss[loss=0.09991, beats_loss=0.008456, ecapa_loss=0.0001328, whisper_loss=0.09012, over 18699.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01035, ecapa_loss=0.0001414, whisper_loss=0.08954, over 3834140.06 frames. ], batch size: 71, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:19:28,411 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 31 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-19 13:19:33,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4481890.0, ans=0.1 2024-08-19 13:20:00,686 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-19 13:20:13,487 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-19 13:20:18,345 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-19 13:20:34,079 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.650e+01 2.290e+01 2.482e+01 2.749e+01 4.098e+01, threshold=4.965e+01, percent-clipped=0.0 2024-08-19 13:20:43,915 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.37 vs. limit=15.0 2024-08-19 13:20:44,709 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 3650, loss[loss=0.08935, beats_loss=0.01201, ecapa_loss=0.000135, whisper_loss=0.07598, over 16846.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01036, ecapa_loss=0.000142, whisper_loss=0.0896, over 3830846.05 frames. 
], batch size: 69, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:20:55,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4482390.0, ans=0.125 2024-08-19 13:21:24,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4482590.0, ans=0.2 2024-08-19 13:21:28,917 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 13:21:54,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4482790.0, ans=0.0 2024-08-19 13:22:00,028 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 3700, loss[loss=0.0948, beats_loss=0.01122, ecapa_loss=0.0001167, whisper_loss=0.08241, over 23560.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01039, ecapa_loss=0.0001431, whisper_loss=0.0896, over 3839642.15 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:22:17,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4482990.0, ans=0.0 2024-08-19 13:22:19,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4482990.0, ans=0.2 2024-08-19 13:22:24,147 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.44 vs. limit=22.5 2024-08-19 13:22:32,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4483090.0, ans=0.1 2024-08-19 13:22:39,972 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
21 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-19 13:22:52,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4483190.0, ans=0.125 2024-08-19 13:22:53,846 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 22 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-19 13:22:55,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4483190.0, ans=0.125 2024-08-19 13:23:06,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4483290.0, ans=0.125 2024-08-19 13:23:09,230 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.337e+01 2.650e+01 3.046e+01 1.477e+02, threshold=5.301e+01, percent-clipped=3.0 2024-08-19 13:23:17,477 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-19 13:23:20,800 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 3750, loss[loss=0.1328, beats_loss=0.008727, ecapa_loss=0.0001356, whisper_loss=0.1227, over 22848.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0104, ecapa_loss=0.0001431, whisper_loss=0.08996, over 3861162.72 frames. ], batch size: 84, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:23:40,197 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 30 from LS+wenet, 26 from Vox, 22 fro AS 2024-08-19 13:23:40,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4483490.0, ans=0.1 2024-08-19 13:23:42,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4483490.0, ans=0.07 2024-08-19 13:23:42,496 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.56 vs. 
limit=22.5 2024-08-19 13:23:48,811 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.97 vs. limit=6.0 2024-08-19 13:23:52,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4483590.0, ans=0.125 2024-08-19 13:24:02,003 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 13:24:04,602 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08629204332828522, model_norm_threshold=53.00765609741211 2024-08-19 13:24:04,769 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.31, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.164e+05, grad_sumsq=1.113e+07, orig_rms_sq=1.046e-02 2024-08-19 13:24:09,686 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0 2024-08-19 13:24:11,753 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-19 13:24:21,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4483790.0, ans=0.125 2024-08-19 13:24:33,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4483790.0, ans=0.125 2024-08-19 13:24:35,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4483890.0, ans=0.09899494936611666 2024-08-19 13:24:36,464 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 3800, loss[loss=0.1161, beats_loss=0.009216, ecapa_loss=0.0001394, whisper_loss=0.1055, over 15103.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01046, ecapa_loss=0.0001429, whisper_loss=0.08987, over 3833482.59 frames. 
], batch size: 60, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:24:43,106 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 13:24:47,748 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 13 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-19 13:25:12,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=4484090.0, ans=0.1 2024-08-19 13:25:22,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4484090.0, ans=0.0 2024-08-19 13:25:26,732 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-19 13:25:30,559 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0 2024-08-19 13:25:33,224 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-19 13:25:33,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4484190.0, ans=0.1 2024-08-19 13:25:45,330 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.415e+01 2.611e+01 2.947e+01 6.143e+02, threshold=5.223e+01, percent-clipped=2.0 2024-08-19 13:25:56,282 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 3850, loss[loss=0.1128, beats_loss=0.01005, ecapa_loss=9.515e-05, whisper_loss=0.1018, over 21414.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001418, whisper_loss=0.09042, over 3848445.76 frames. 
], batch size: 79, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:26:01,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4484390.0, ans=0.125 2024-08-19 13:26:03,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4484390.0, ans=0.125 2024-08-19 13:26:09,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4484390.0, ans=0.0 2024-08-19 13:26:14,114 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 34 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 13:26:39,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4484590.0, ans=0.125 2024-08-19 13:27:00,754 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 13:27:00,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4484790.0, ans=0.125 2024-08-19 13:27:02,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4484790.0, ans=0.2 2024-08-19 13:27:08,029 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=15.0 2024-08-19 13:27:11,944 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 3900, loss[loss=0.096, beats_loss=0.009997, ecapa_loss=0.0001221, whisper_loss=0.08478, over 17256.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01047, ecapa_loss=0.0001419, whisper_loss=0.09081, over 3852357.89 frames. 
], batch size: 67, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:27:26,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4484990.0, ans=0.125 2024-08-19 13:27:36,595 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-19 13:27:36,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4484990.0, ans=0.07 2024-08-19 13:27:44,993 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.269e+05 2024-08-19 13:27:48,364 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.04 vs. limit=15.0 2024-08-19 13:28:12,109 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.09 vs. limit=10.0 2024-08-19 13:28:18,588 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.352e+01 2.565e+01 2.860e+01 1.728e+02, threshold=5.131e+01, percent-clipped=1.0 2024-08-19 13:28:26,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4485290.0, ans=0.125 2024-08-19 13:28:30,338 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 3950, loss[loss=0.1006, beats_loss=0.009745, ecapa_loss=0.0001497, whisper_loss=0.08932, over 22495.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01038, ecapa_loss=0.0001433, whisper_loss=0.09108, over 3867185.14 frames. ], batch size: 93, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:28:36,780 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
25 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 13:28:42,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4485390.0, ans=0.0 2024-08-19 13:28:47,946 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-19 13:28:55,773 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-19 13:29:15,513 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.73 vs. limit=22.5 2024-08-19 13:29:25,658 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-19 13:29:40,203 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 13:29:48,655 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 4000, loss[loss=0.09217, beats_loss=0.01107, ecapa_loss=0.0001616, whisper_loss=0.07948, over 21640.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01037, ecapa_loss=0.0001428, whisper_loss=0.09122, over 3850184.03 frames. ], batch size: 88, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:29:57,424 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 33 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-19 13:30:09,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4485990.0, ans=0.1 2024-08-19 13:30:34,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4486190.0, ans=0.0 2024-08-19 13:30:48,152 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
28 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-19 13:30:54,625 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.618e+01 2.333e+01 2.515e+01 2.857e+01 4.559e+01, threshold=5.029e+01, percent-clipped=0.0 2024-08-19 13:31:05,651 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 4050, loss[loss=0.1092, beats_loss=0.00948, ecapa_loss=0.0001814, whisper_loss=0.09788, over 22240.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01038, ecapa_loss=0.0001428, whisper_loss=0.0911, over 3821710.82 frames. ], batch size: 93, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:31:10,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4486390.0, ans=0.95 2024-08-19 13:31:18,442 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-19 13:31:29,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4486490.0, ans=0.0 2024-08-19 13:31:30,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4486490.0, ans=0.0 2024-08-19 13:31:54,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4486690.0, ans=0.125 2024-08-19 13:31:57,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4486690.0, ans=0.125 2024-08-19 13:32:01,704 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 25 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-19 13:32:06,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4486790.0, ans=0.1 2024-08-19 13:32:19,127 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
21 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-19 13:32:21,484 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.78 vs. limit=15.0 2024-08-19 13:32:23,966 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 4100, loss[loss=0.09526, beats_loss=0.01019, ecapa_loss=0.0001477, whisper_loss=0.08359, over 22482.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01039, ecapa_loss=0.0001415, whisper_loss=0.09135, over 3853377.31 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:32:42,499 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.19 vs. limit=15.0 2024-08-19 13:33:04,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4487090.0, ans=0.125 2024-08-19 13:33:06,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4487090.0, ans=0.125 2024-08-19 13:33:35,991 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.243e+01 2.583e+01 2.871e+01 4.368e+01, threshold=5.166e+01, percent-clipped=0.0 2024-08-19 13:33:36,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4487290.0, ans=0.1 2024-08-19 13:33:38,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4487290.0, ans=0.0 2024-08-19 13:33:46,109 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 4150, loss[loss=0.122, beats_loss=0.008687, ecapa_loss=0.0001439, whisper_loss=0.1119, over 14461.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01041, ecapa_loss=0.0001423, whisper_loss=0.091, over 3850015.62 frames. 
], batch size: 54, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:33:50,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=4487390.0, ans=0.05 2024-08-19 13:33:58,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4487390.0, ans=0.125 2024-08-19 13:34:06,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4487490.0, ans=0.0 2024-08-19 13:34:10,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4487490.0, ans=0.125 2024-08-19 13:34:21,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4487490.0, ans=0.125 2024-08-19 13:34:28,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4487590.0, ans=0.125 2024-08-19 13:34:30,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4487590.0, ans=0.125 2024-08-19 13:34:30,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4487590.0, ans=0.2 2024-08-19 13:34:34,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4487590.0, ans=0.125 2024-08-19 13:34:37,930 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 23 from LS+wenet, 9 from Vox, 22 fro AS 2024-08-19 13:34:40,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4487590.0, ans=0.1 2024-08-19 13:34:42,181 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
18 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-19 13:34:48,352 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-19 13:34:53,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4487690.0, ans=0.0 2024-08-19 13:35:12,556 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-19 13:35:27,034 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 4200, loss[loss=0.09957, beats_loss=0.01064, ecapa_loss=0.0001506, whisper_loss=0.08742, over 15803.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01037, ecapa_loss=0.0001417, whisper_loss=0.09154, over 3849237.83 frames. ], batch size: 65, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:35:45,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4487990.0, ans=0.2 2024-08-19 13:35:59,069 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.984e+01 2024-08-19 13:36:36,523 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-19 13:36:43,502 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.376e+01 2.579e+01 2.870e+01 3.813e+01, threshold=5.158e+01, percent-clipped=0.0 2024-08-19 13:36:53,588 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 4250, loss[loss=0.09058, beats_loss=0.01039, ecapa_loss=0.0001031, whisper_loss=0.07916, over 15371.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01045, ecapa_loss=0.0001413, whisper_loss=0.09088, over 3859651.86 frames. 
], batch size: 58, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:37:19,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4488490.0, ans=0.035 2024-08-19 13:37:22,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4488490.0, ans=0.125 2024-08-19 13:37:23,748 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-19 13:37:50,990 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.88 vs. limit=22.5 2024-08-19 13:38:06,248 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.89 vs. limit=6.0 2024-08-19 13:38:15,204 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 4300, loss[loss=0.09144, beats_loss=0.01158, ecapa_loss=0.0001319, whisper_loss=0.07854, over 21742.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01045, ecapa_loss=0.0001426, whisper_loss=0.09038, over 3878605.53 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:38:16,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4488890.0, ans=0.125 2024-08-19 13:38:24,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4488890.0, ans=0.1 2024-08-19 13:38:56,146 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
25 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-19 13:39:00,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4489090.0, ans=0.2 2024-08-19 13:39:04,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4489090.0, ans=0.125 2024-08-19 13:39:07,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4489090.0, ans=0.125 2024-08-19 13:39:08,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4489090.0, ans=0.0 2024-08-19 13:39:27,215 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 14 from Vox, 48 fro AS 2024-08-19 13:39:36,973 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.711e+01 2.321e+01 2.560e+01 2.840e+01 5.054e+01, threshold=5.120e+01, percent-clipped=0.0 2024-08-19 13:39:37,915 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 28 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 13:39:48,868 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2024-08-19 13:39:49,754 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 4350, loss[loss=0.1152, beats_loss=0.007309, ecapa_loss=0.0001443, whisper_loss=0.1065, over 19092.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01034, ecapa_loss=0.0001435, whisper_loss=0.0917, over 3897555.00 frames. ], batch size: 74, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:40:00,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4489390.0, ans=0.125 2024-08-19 13:40:01,399 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
20 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 13:40:06,196 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 13:40:26,597 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-19 13:40:29,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4489590.0, ans=0.125 2024-08-19 13:40:54,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4489690.0, ans=0.125 2024-08-19 13:41:03,236 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 22 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 13:41:07,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4489790.0, ans=0.125 2024-08-19 13:41:19,876 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.63 vs. limit=15.0 2024-08-19 13:41:22,322 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 4400, loss[loss=0.09299, beats_loss=0.01147, ecapa_loss=0.000137, whisper_loss=0.08016, over 16435.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01044, ecapa_loss=0.0001423, whisper_loss=0.0909, over 3879372.28 frames. ], batch size: 67, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:41:26,525 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 13:41:32,324 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
16 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-19 13:42:01,488 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 13:42:18,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4490190.0, ans=0.0 2024-08-19 13:42:45,476 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.324e+01 2.485e+01 2.810e+01 3.864e+01, threshold=4.971e+01, percent-clipped=0.0 2024-08-19 13:42:46,471 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 13:43:00,443 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 4450, loss[loss=0.09078, beats_loss=0.01054, ecapa_loss=0.0001284, whisper_loss=0.07895, over 16368.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01045, ecapa_loss=0.000141, whisper_loss=0.09022, over 3867664.09 frames. ], batch size: 64, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:43:05,986 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.61 vs. limit=10.0 2024-08-19 13:43:13,815 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-19 13:43:15,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4490390.0, ans=0.125 2024-08-19 13:43:20,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4490490.0, ans=0.125 2024-08-19 13:43:21,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4490490.0, ans=0.1 2024-08-19 13:43:22,708 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
22 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-19 13:43:28,948 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 13 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-19 13:43:50,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4490590.0, ans=0.0 2024-08-19 13:43:50,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4490590.0, ans=0.125 2024-08-19 13:43:52,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4490590.0, ans=0.125 2024-08-19 13:43:54,546 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-19 13:44:05,145 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 31 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-19 13:44:15,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4490690.0, ans=0.2 2024-08-19 13:44:53,119 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 4500, loss[loss=0.1078, beats_loss=0.01008, ecapa_loss=0.0001375, whisper_loss=0.09632, over 18553.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01042, ecapa_loss=0.0001411, whisper_loss=0.09026, over 3851439.91 frames. ], batch size: 76, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:45:02,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4490890.0, ans=0.1 2024-08-19 13:45:30,349 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-19 13:46:22,437 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.269e+01 2.513e+01 2.857e+01 4.975e+01, threshold=5.026e+01, percent-clipped=1.0 2024-08-19 13:46:29,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4491290.0, ans=0.125 2024-08-19 13:46:33,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4491290.0, ans=0.125 2024-08-19 13:46:37,956 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 4550, loss[loss=0.08961, beats_loss=0.01206, ecapa_loss=0.0001679, whisper_loss=0.07587, over 20922.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01039, ecapa_loss=0.0001418, whisper_loss=0.09052, over 3871249.76 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:46:38,655 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.13 vs. 
limit=22.5 2024-08-19 13:46:45,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4491390.0, ans=0.0 2024-08-19 13:46:53,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4491390.0, ans=0.2 2024-08-19 13:47:11,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4491490.0, ans=0.0 2024-08-19 13:47:19,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4491590.0, ans=0.125 2024-08-19 13:47:25,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4491590.0, ans=0.125 2024-08-19 13:47:45,453 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.49 vs. limit=15.0 2024-08-19 13:47:48,751 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-19 13:47:52,174 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-19 13:48:00,124 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-19 13:48:11,589 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 4600, loss[loss=0.08634, beats_loss=0.01313, ecapa_loss=0.0001125, whisper_loss=0.07209, over 15211.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01045, ecapa_loss=0.0001407, whisper_loss=0.09023, over 3856473.96 frames. 
], batch size: 61, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:48:15,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4491890.0, ans=0.125 2024-08-19 13:48:43,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4491990.0, ans=0.0 2024-08-19 13:49:02,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4492090.0, ans=0.125 2024-08-19 13:49:03,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4492090.0, ans=0.125 2024-08-19 13:49:03,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4492090.0, ans=0.1 2024-08-19 13:49:07,105 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-19 13:49:16,237 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
26 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-19 13:49:21,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4492190.0, ans=0.0 2024-08-19 13:49:25,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4492290.0, ans=0.125 2024-08-19 13:49:25,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4492290.0, ans=0.125 2024-08-19 13:49:31,931 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.304e+01 2.544e+01 2.901e+01 6.121e+01, threshold=5.088e+01, percent-clipped=1.0 2024-08-19 13:49:34,852 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.23 vs. limit=15.0 2024-08-19 13:49:41,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4492290.0, ans=0.125 2024-08-19 13:49:43,898 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 4650, loss[loss=0.1001, beats_loss=0.01263, ecapa_loss=0.0001299, whisper_loss=0.08619, over 21483.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01054, ecapa_loss=0.0001413, whisper_loss=0.08994, over 3877404.06 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:50:15,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4492490.0, ans=0.1 2024-08-19 13:50:32,639 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
24 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-19 13:50:32,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4492590.0, ans=0.0 2024-08-19 13:50:47,783 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-19 13:50:48,348 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0 2024-08-19 13:50:49,363 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 31 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-19 13:50:57,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4492790.0, ans=0.0 2024-08-19 13:51:00,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4492790.0, ans=0.125 2024-08-19 13:51:11,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4492790.0, ans=0.0 2024-08-19 13:51:11,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4492790.0, ans=0.125 2024-08-19 13:51:14,857 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 4700, loss[loss=0.1142, beats_loss=0.007936, ecapa_loss=0.0001183, whisper_loss=0.105, over 20407.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01055, ecapa_loss=0.0001417, whisper_loss=0.0896, over 3892215.18 frames. 
], batch size: 75, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:51:18,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4492890.0, ans=0.125 2024-08-19 13:51:33,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4492990.0, ans=0.0 2024-08-19 13:52:02,355 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-19 13:52:20,762 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.02 vs. limit=15.0 2024-08-19 13:52:24,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4493290.0, ans=0.2 2024-08-19 13:52:27,983 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.95 vs. limit=22.5 2024-08-19 13:52:30,135 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.412e+01 2.639e+01 2.968e+01 4.627e+01, threshold=5.277e+01, percent-clipped=0.0 2024-08-19 13:52:34,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4493290.0, ans=0.0 2024-08-19 13:52:39,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4493290.0, ans=0.125 2024-08-19 13:52:42,730 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 4750, loss[loss=0.1265, beats_loss=0.009151, ecapa_loss=0.0001702, whisper_loss=0.1156, over 18168.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001425, whisper_loss=0.0901, over 3886853.66 frames. 
], batch size: 76, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:52:59,477 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0 2024-08-19 13:53:00,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4493490.0, ans=0.125 2024-08-19 13:53:31,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4493590.0, ans=0.125 2024-08-19 13:53:41,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4493690.0, ans=0.125 2024-08-19 13:53:43,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4493690.0, ans=0.125 2024-08-19 13:53:49,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4493690.0, ans=0.2 2024-08-19 13:53:58,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4493790.0, ans=0.125 2024-08-19 13:54:14,323 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 4800, loss[loss=0.1063, beats_loss=0.009196, ecapa_loss=0.0001417, whisper_loss=0.09572, over 19876.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01038, ecapa_loss=0.0001435, whisper_loss=0.09004, over 3902893.92 frames. ], batch size: 80, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 13:54:22,325 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.65 vs. limit=15.0 2024-08-19 13:54:58,445 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
22 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-19 13:55:26,745 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.328e+01 2.534e+01 2.852e+01 3.872e+01, threshold=5.068e+01, percent-clipped=0.0 2024-08-19 13:55:31,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4494290.0, ans=0.125 2024-08-19 13:55:31,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4494290.0, ans=0.0 2024-08-19 13:55:38,094 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 4850, loss[loss=0.1088, beats_loss=0.01049, ecapa_loss=0.000112, whisper_loss=0.09719, over 16588.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01043, ecapa_loss=0.0001425, whisper_loss=0.09023, over 3917261.53 frames. ], batch size: 63, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 13:55:38,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4494390.0, ans=0.035 2024-08-19 13:55:51,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4494390.0, ans=0.125 2024-08-19 13:55:51,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4494390.0, ans=0.2 2024-08-19 13:55:54,919 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-19 13:56:08,285 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 13:56:35,187 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
16 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-19 13:56:35,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4494690.0, ans=0.125 2024-08-19 13:56:35,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4494690.0, ans=0.0 2024-08-19 13:56:40,915 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.68 vs. limit=12.0 2024-08-19 13:56:44,476 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.28 vs. limit=12.0 2024-08-19 13:57:03,361 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 4900, loss[loss=0.07354, beats_loss=0.01195, ecapa_loss=0.0001114, whisper_loss=0.06048, over 13884.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01043, ecapa_loss=0.0001424, whisper_loss=0.09065, over 3894494.05 frames. ], batch size: 54, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 13:57:22,108 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-19 13:57:25,463 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-19 13:57:25,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4494990.0, ans=10.0 2024-08-19 13:57:32,525 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.18 vs. 
limit=15.0 2024-08-19 13:57:33,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4494990.0, ans=0.125 2024-08-19 13:57:44,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4495090.0, ans=0.125 2024-08-19 13:57:45,933 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-19 13:57:49,477 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-19 13:58:01,135 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-19 13:58:15,437 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.360e+01 2.527e+01 2.756e+01 4.531e+01, threshold=5.055e+01, percent-clipped=0.0 2024-08-19 13:58:25,266 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 4950, loss[loss=0.08574, beats_loss=0.01214, ecapa_loss=0.0001369, whisper_loss=0.07223, over 21879.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001404, whisper_loss=0.08997, over 3889668.11 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 13:58:53,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4495490.0, ans=0.2 2024-08-19 13:59:04,266 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
25 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-19 13:59:15,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4495690.0, ans=0.0 2024-08-19 13:59:15,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4495690.0, ans=0.125 2024-08-19 13:59:31,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4495790.0, ans=0.015 2024-08-19 13:59:32,806 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-19 13:59:40,945 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 13:59:48,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4495890.0, ans=0.125 2024-08-19 13:59:49,056 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 5000, loss[loss=0.1013, beats_loss=0.01326, ecapa_loss=0.0001209, whisper_loss=0.08684, over 21594.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.0001405, whisper_loss=0.08964, over 3882283.15 frames. ], batch size: 87, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 13:59:54,759 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-19 14:00:10,296 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0 2024-08-19 14:00:13,494 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-19 14:00:33,649 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 28 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-19 14:00:44,723 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
36 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-19 14:01:00,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4496290.0, ans=0.125 2024-08-19 14:01:07,622 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.293e+01 2.592e+01 2.875e+01 4.425e+01, threshold=5.184e+01, percent-clipped=0.0 2024-08-19 14:01:11,649 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-19 14:01:13,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4496290.0, ans=0.125 2024-08-19 14:01:15,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4496290.0, ans=0.04949747468305833 2024-08-19 14:01:17,806 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 5050, loss[loss=0.1115, beats_loss=0.00828, ecapa_loss=0.0001651, whisper_loss=0.1016, over 21593.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001411, whisper_loss=0.09001, over 3889667.34 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:01:32,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4496490.0, ans=0.07 2024-08-19 14:01:34,387 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-19 14:01:58,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4496590.0, ans=0.1 2024-08-19 14:01:59,317 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.86 vs. 
limit=15.0 2024-08-19 14:02:23,700 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.72 vs. limit=15.0 2024-08-19 14:02:28,935 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0 2024-08-19 14:02:33,224 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-19 14:02:41,955 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 5100, loss[loss=0.09872, beats_loss=0.01176, ecapa_loss=0.0001449, whisper_loss=0.08552, over 22253.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0105, ecapa_loss=0.0001403, whisper_loss=0.09077, over 3917737.35 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:03:00,206 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-19 14:03:42,607 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 14:03:49,735 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-19 14:03:53,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4497290.0, ans=0.125 2024-08-19 14:03:55,037 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 14:03:56,383 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.306e+01 2.543e+01 2.887e+01 4.105e+01, threshold=5.087e+01, percent-clipped=0.0 2024-08-19 14:04:01,240 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
33 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-19 14:04:06,697 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 5150, loss[loss=0.1052, beats_loss=0.01132, ecapa_loss=0.0001495, whisper_loss=0.09238, over 21696.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01054, ecapa_loss=0.0001402, whisper_loss=0.09123, over 3941040.12 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:04:08,722 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-19 14:04:18,159 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-19 14:04:20,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4497390.0, ans=0.04949747468305833 2024-08-19 14:04:34,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4497490.0, ans=0.125 2024-08-19 14:04:47,717 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-19 14:04:54,543 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.94 vs. limit=10.0 2024-08-19 14:04:56,496 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-19 14:05:01,710 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.10 vs. limit=15.0 2024-08-19 14:05:02,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4497690.0, ans=0.125 2024-08-19 14:05:06,462 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
28 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 14:05:06,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4497690.0, ans=0.0 2024-08-19 14:05:27,346 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-19 14:05:28,790 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 5200, loss[loss=0.09861, beats_loss=0.01162, ecapa_loss=0.0001208, whisper_loss=0.08579, over 23677.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01049, ecapa_loss=0.0001406, whisper_loss=0.09137, over 3953399.48 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:05:35,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4497890.0, ans=0.2 2024-08-19 14:05:40,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4497890.0, ans=0.125 2024-08-19 14:05:58,575 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 20 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-19 14:06:05,472 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-19 14:06:11,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4498090.0, ans=0.125 2024-08-19 14:06:22,866 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 31 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-19 14:06:24,189 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.71 vs. limit=10.0 2024-08-19 14:06:29,493 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
19 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-19 14:06:41,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4498290.0, ans=0.0 2024-08-19 14:06:42,501 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.701e+01 2.299e+01 2.552e+01 2.822e+01 3.690e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-19 14:06:47,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4498290.0, ans=0.1 2024-08-19 14:06:52,380 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 5250, loss[loss=0.1222, beats_loss=0.01157, ecapa_loss=0.0001253, whisper_loss=0.1094, over 21719.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01051, ecapa_loss=0.0001409, whisper_loss=0.09079, over 3941690.74 frames. ], batch size: 86, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:07:24,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4498490.0, ans=0.0 2024-08-19 14:07:32,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4498590.0, ans=0.125 2024-08-19 14:07:38,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4498590.0, ans=0.125 2024-08-19 14:07:46,359 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 29 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-19 14:07:50,303 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
22 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-19 14:08:10,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4498790.0, ans=0.0 2024-08-19 14:08:19,132 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 5300, loss[loss=0.08628, beats_loss=0.0107, ecapa_loss=0.0001397, whisper_loss=0.07418, over 16006.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01044, ecapa_loss=0.0001413, whisper_loss=0.09087, over 3930076.28 frames. ], batch size: 67, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:08:23,809 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 21 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-19 14:08:25,531 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 14:08:25,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4498890.0, ans=0.125 2024-08-19 14:08:32,590 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 14:08:45,339 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 31 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-19 14:09:02,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4499090.0, ans=0.125 2024-08-19 14:09:16,328 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.46 vs. limit=22.5 2024-08-19 14:09:31,258 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.343e+01 2.651e+01 2.972e+01 3.624e+01, threshold=5.303e+01, percent-clipped=0.0 2024-08-19 14:09:40,835 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 5350, loss[loss=0.0855, beats_loss=0.009876, ecapa_loss=0.0001925, whisper_loss=0.0737, over 14583.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.00014, whisper_loss=0.09049, over 3900559.06 frames. ], batch size: 64, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:09:47,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4499390.0, ans=0.125 2024-08-19 14:10:12,906 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.79 vs. limit=15.0 2024-08-19 14:10:25,505 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-19 14:10:34,631 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 14:10:48,951 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.93 vs. limit=10.0 2024-08-19 14:10:51,252 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.61 vs. limit=10.0 2024-08-19 14:10:57,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4499790.0, ans=0.125 2024-08-19 14:10:59,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4499790.0, ans=0.125 2024-08-19 14:11:09,962 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 5400, loss[loss=0.1276, beats_loss=0.01109, ecapa_loss=0.0001214, whisper_loss=0.1153, over 23282.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01041, ecapa_loss=0.0001394, whisper_loss=0.09105, over 3915124.85 frames. 
], batch size: 91, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:11:10,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4499890.0, ans=0.125 2024-08-19 14:11:24,199 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 14:11:41,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4499990.0, ans=0.125 2024-08-19 14:11:59,053 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 31 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-19 14:12:02,206 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-19 14:12:09,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4500190.0, ans=0.125 2024-08-19 14:12:10,581 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-19 14:12:13,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4500190.0, ans=0.125 2024-08-19 14:12:20,107 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.87 vs. limit=6.0 2024-08-19 14:12:26,263 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.275e+01 2.501e+01 2.873e+01 4.130e+01, threshold=5.001e+01, percent-clipped=0.0 2024-08-19 14:12:31,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=4500290.0, ans=0.5 2024-08-19 14:12:36,751 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 5450, loss[loss=0.0953, beats_loss=0.01009, ecapa_loss=0.0001447, whisper_loss=0.08376, over 19459.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01034, ecapa_loss=0.0001408, whisper_loss=0.09163, over 3916652.89 frames. ], batch size: 78, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:12:58,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4500490.0, ans=0.1 2024-08-19 14:12:59,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4500490.0, ans=0.125 2024-08-19 14:13:02,929 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-19 14:13:20,614 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 37 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-19 14:14:07,495 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 5500, loss[loss=0.1137, beats_loss=0.01022, ecapa_loss=0.0001223, whisper_loss=0.1023, over 22753.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01041, ecapa_loss=0.0001401, whisper_loss=0.09077, over 3914970.46 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:14:18,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4500890.0, ans=0.125 2024-08-19 14:14:29,764 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-19 14:14:37,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4500990.0, ans=0.2 2024-08-19 14:15:29,751 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.304e+01 2.520e+01 2.871e+01 1.187e+02, threshold=5.040e+01, percent-clipped=1.0 2024-08-19 14:15:42,780 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 5550, loss[loss=0.08547, beats_loss=0.01214, ecapa_loss=0.0001306, whisper_loss=0.07202, over 20229.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.0001396, whisper_loss=0.09018, over 3908374.38 frames. ], batch size: 85, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:15:56,052 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=22.5 2024-08-19 14:15:58,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4501390.0, ans=0.0 2024-08-19 14:16:15,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4501490.0, ans=0.0 2024-08-19 14:16:19,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4501590.0, ans=0.125 2024-08-19 14:17:05,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4501790.0, ans=0.125 2024-08-19 14:17:05,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=4501790.0, ans=0.1 2024-08-19 14:17:10,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4501790.0, ans=0.125 2024-08-19 14:17:15,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4501790.0, ans=0.0 2024-08-19 14:17:17,872 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 5600, loss[loss=0.1088, beats_loss=0.01141, ecapa_loss=0.0001318, whisper_loss=0.09603, over 22537.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01053, ecapa_loss=0.0001397, whisper_loss=0.08985, over 3908358.71 frames. 
], batch size: 90, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:17:35,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4501990.0, ans=0.125 2024-08-19 14:17:37,596 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-19 14:17:37,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4501990.0, ans=0.125 2024-08-19 14:17:48,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4501990.0, ans=0.125 2024-08-19 14:18:02,176 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-19 14:18:05,445 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-19 14:18:07,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4502090.0, ans=0.0 2024-08-19 14:18:39,441 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.342e+01 2.515e+01 2.771e+01 5.960e+01, threshold=5.030e+01, percent-clipped=1.0 2024-08-19 14:18:44,398 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 13 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 14:18:46,640 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0 2024-08-19 14:18:49,834 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 5650, loss[loss=0.09932, beats_loss=0.009016, ecapa_loss=0.0001443, whisper_loss=0.08886, over 22860.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01053, ecapa_loss=0.0001405, whisper_loss=0.08938, over 3909748.71 frames. 
], batch size: 90, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:18:55,625 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.48 vs. limit=12.0 2024-08-19 14:18:55,692 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.18 vs. limit=15.0 2024-08-19 14:18:56,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4502390.0, ans=0.0 2024-08-19 14:18:59,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4502390.0, ans=0.125 2024-08-19 14:19:30,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4502590.0, ans=0.0 2024-08-19 14:20:48,578 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 5700, loss[loss=0.1115, beats_loss=0.009776, ecapa_loss=0.0001401, whisper_loss=0.1004, over 23416.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01049, ecapa_loss=0.0001421, whisper_loss=0.08952, over 3883092.15 frames. ], batch size: 94, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:21:13,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4502890.0, ans=0.0 2024-08-19 14:21:23,878 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0 2024-08-19 14:21:56,032 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
15 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-19 14:22:27,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4503190.0, ans=10.0 2024-08-19 14:22:35,301 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.08 vs. limit=12.0 2024-08-19 14:22:47,177 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.393e+01 2.644e+01 2.920e+01 4.310e+01, threshold=5.288e+01, percent-clipped=0.0 2024-08-19 14:22:51,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4503290.0, ans=0.125 2024-08-19 14:22:51,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4503290.0, ans=0.125 2024-08-19 14:23:03,905 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 5750, loss[loss=0.09576, beats_loss=0.011, ecapa_loss=0.0001802, whisper_loss=0.08296, over 17979.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01055, ecapa_loss=0.0001424, whisper_loss=0.08967, over 3904844.36 frames. ], batch size: 75, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:23:13,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4503390.0, ans=0.125 2024-08-19 14:23:32,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4503490.0, ans=0.125 2024-08-19 14:24:12,984 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-19 14:24:50,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4503790.0, ans=0.2 2024-08-19 14:24:56,356 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
20 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-19 14:25:15,867 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 5800, loss[loss=0.07625, beats_loss=0.01153, ecapa_loss=0.0001519, whisper_loss=0.0632, over 22562.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01054, ecapa_loss=0.0001422, whisper_loss=0.08935, over 3851380.83 frames. ], batch size: 96, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:25:33,678 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-19 14:25:47,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4503990.0, ans=0.125 2024-08-19 14:25:53,500 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.99 vs. limit=12.0 2024-08-19 14:25:56,430 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 14:26:35,461 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.385e+01 2.563e+01 2.878e+01 9.355e+01, threshold=5.127e+01, percent-clipped=2.0 2024-08-19 14:26:41,641 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 15 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-19 14:26:47,230 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 5850, loss[loss=0.07497, beats_loss=0.01288, ecapa_loss=0.0001409, whisper_loss=0.06068, over 19318.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01053, ecapa_loss=0.0001418, whisper_loss=0.08921, over 3847780.27 frames. ], batch size: 80, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:26:47,392 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 14:27:09,168 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
26 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-19 14:27:10,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4504490.0, ans=0.0 2024-08-19 14:27:21,943 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2024-08-19 14:27:26,799 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0 2024-08-19 14:27:28,793 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 14:27:45,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4504590.0, ans=0.0 2024-08-19 14:27:58,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4504690.0, ans=0.125 2024-08-19 14:28:02,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4504690.0, ans=0.125 2024-08-19 14:28:02,587 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0 2024-08-19 14:28:18,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4504790.0, ans=0.0 2024-08-19 14:28:22,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4504790.0, ans=0.1 2024-08-19 14:28:26,517 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
23 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-19 14:28:28,252 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 5900, loss[loss=0.09546, beats_loss=0.01016, ecapa_loss=0.0001809, whisper_loss=0.08349, over 19200.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01049, ecapa_loss=0.0001418, whisper_loss=0.08924, over 3864798.16 frames. ], batch size: 81, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:28:28,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4504890.0, ans=0.2 2024-08-19 14:28:31,488 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2024-08-19 14:28:54,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4504990.0, ans=0.0 2024-08-19 14:29:24,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4505190.0, ans=0.1 2024-08-19 14:29:29,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4505190.0, ans=0.0 2024-08-19 14:29:51,847 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.307e+01 2.493e+01 2.853e+01 3.698e+01, threshold=4.986e+01, percent-clipped=0.0 2024-08-19 14:30:02,197 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 5950, loss[loss=0.09527, beats_loss=0.01039, ecapa_loss=0.0001567, whisper_loss=0.08332, over 14519.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01052, ecapa_loss=0.0001412, whisper_loss=0.08911, over 3854192.50 frames. 
], batch size: 59, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:30:10,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4505390.0, ans=0.1 2024-08-19 14:30:12,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4505390.0, ans=0.0 2024-08-19 14:30:16,253 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-19 14:30:18,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4505390.0, ans=0.0 2024-08-19 14:30:32,021 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-19 14:30:52,726 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 14:31:14,506 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-19 14:31:42,947 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 6000, loss[loss=0.1123, beats_loss=0.009994, ecapa_loss=0.0001278, whisper_loss=0.1011, over 23221.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01063, ecapa_loss=0.0001405, whisper_loss=0.08869, over 3864563.24 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:31:42,947 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-19 14:32:32,835 INFO [train_multi_KD3.py:1149] (2/4) Epoch 31, validation on ASR_libri: loss=0.2517, beats_loss=0, ecapa_loss=0.0005139, whisper_loss=0.2466, over 922467.00 frames. 2024-08-19 14:32:50,724 INFO [train_multi_KD3.py:1149] (2/4) Epoch 31, validation on SV_voxceleb1: loss=0.003959, beats_loss=0, ecapa_loss=0.0003959, whisper_loss=0, over 939242.00 frames. 
2024-08-19 14:34:28,042 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.8094, 2.1227, 2.5172, 1.2316], device='cuda:2') 2024-08-19 14:34:38,991 INFO [train_multi_KD3.py:1149] (2/4) Epoch 31, validation on AT_audioset: loss=0.02302, beats_loss=0.02302, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 14:34:38,994 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-19 14:34:51,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4505890.0, ans=0.125 2024-08-19 14:35:16,117 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.21 vs. limit=15.0 2024-08-19 14:35:52,299 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 14:35:55,824 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.316e+01 2.553e+01 2.863e+01 1.087e+02, threshold=5.107e+01, percent-clipped=1.0 2024-08-19 14:35:58,103 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-19 14:36:03,916 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-19 14:36:07,621 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 6050, loss[loss=0.1225, beats_loss=0.009269, ecapa_loss=0.0001604, whisper_loss=0.1116, over 22667.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01055, ecapa_loss=0.0001412, whisper_loss=0.08924, over 3878092.97 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:36:27,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4506490.0, ans=0.125 2024-08-19 14:36:38,486 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
24 from LS+wenet, 28 from Vox, 23 fro AS 2024-08-19 14:36:42,411 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 14:36:45,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4506590.0, ans=0.125 2024-08-19 14:36:51,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4506590.0, ans=0.015 2024-08-19 14:36:53,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4506590.0, ans=0.125 2024-08-19 14:37:01,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4506590.0, ans=0.1 2024-08-19 14:37:04,844 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 35 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-19 14:37:07,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4506690.0, ans=0.125 2024-08-19 14:37:09,320 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 14:37:23,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4506790.0, ans=0.0 2024-08-19 14:37:35,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4506790.0, ans=0.025 2024-08-19 14:37:42,762 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 6100, loss[loss=0.08775, beats_loss=0.008182, ecapa_loss=0.0001117, whisper_loss=0.07846, over 14719.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01058, ecapa_loss=0.0001413, whisper_loss=0.08881, over 3871179.82 frames. ], batch size: 53, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:37:46,257 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
24 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-19 14:37:51,203 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-19 14:38:12,163 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 23 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-19 14:38:13,749 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-19 14:38:15,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4506990.0, ans=0.0 2024-08-19 14:38:17,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4507090.0, ans=0.125 2024-08-19 14:38:20,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4507090.0, ans=0.125 2024-08-19 14:38:32,025 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-19 14:38:34,331 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 14:38:34,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4507090.0, ans=0.09899494936611666 2024-08-19 14:38:35,370 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-19 14:38:41,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4507190.0, ans=0.125 2024-08-19 14:38:42,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4507190.0, ans=0.0 2024-08-19 14:38:45,683 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
32 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-19 14:38:58,026 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.243e+01 2.514e+01 2.857e+01 4.099e+01, threshold=5.029e+01, percent-clipped=0.0 2024-08-19 14:39:07,564 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 6150, loss[loss=0.1009, beats_loss=0.01069, ecapa_loss=0.0001243, whisper_loss=0.08895, over 13719.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01059, ecapa_loss=0.0001408, whisper_loss=0.08889, over 3881484.68 frames. ], batch size: 54, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:39:22,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=4507390.0, ans=0.05 2024-08-19 14:39:40,835 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 14:39:47,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4507590.0, ans=0.0 2024-08-19 14:39:47,467 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.60 vs. limit=15.0 2024-08-19 14:40:01,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4507690.0, ans=0.0 2024-08-19 14:40:04,607 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 14:40:07,940 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-19 14:40:24,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4507790.0, ans=0.1 2024-08-19 14:40:29,559 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
24 from LS+wenet, 28 from Vox, 24 fro AS 2024-08-19 14:40:29,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4507790.0, ans=0.125 2024-08-19 14:40:29,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4507790.0, ans=0.125 2024-08-19 14:40:34,787 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-19 14:40:35,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4507790.0, ans=0.2 2024-08-19 14:40:38,437 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 6200, loss[loss=0.1035, beats_loss=0.01115, ecapa_loss=0.000111, whisper_loss=0.09122, over 19707.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01052, ecapa_loss=0.0001417, whisper_loss=0.08884, over 3901551.74 frames. ], batch size: 76, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:40:41,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4507890.0, ans=0.09899494936611666 2024-08-19 14:41:13,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4508090.0, ans=0.2 2024-08-19 14:41:15,224 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-19 14:41:20,515 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
23 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-19 14:41:37,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4508090.0, ans=0.125 2024-08-19 14:42:00,965 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.259e+01 2.437e+01 2.769e+01 4.793e+01, threshold=4.873e+01, percent-clipped=0.0 2024-08-19 14:42:10,026 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 6250, loss[loss=0.09423, beats_loss=0.01194, ecapa_loss=0.0001555, whisper_loss=0.08074, over 21036.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01056, ecapa_loss=0.0001416, whisper_loss=0.08888, over 3900713.76 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:42:13,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4508390.0, ans=0.125 2024-08-19 14:42:18,929 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-19 14:42:40,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4508590.0, ans=0.0 2024-08-19 14:42:47,303 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.65 vs. limit=22.5 2024-08-19 14:43:07,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4508790.0, ans=0.125 2024-08-19 14:43:08,647 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
20 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-19 14:43:13,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4508790.0, ans=0.125 2024-08-19 14:43:20,367 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 6300, loss[loss=0.09839, beats_loss=0.00872, ecapa_loss=0.0001298, whisper_loss=0.08837, over 16399.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.0105, ecapa_loss=0.0001426, whisper_loss=0.08872, over 3826681.96 frames. ], batch size: 63, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:44:04,862 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-19 14:44:12,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4509190.0, ans=0.125 2024-08-19 14:44:19,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4509290.0, ans=0.09899494936611666 2024-08-19 14:44:21,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4509290.0, ans=0.2 2024-08-19 14:44:23,700 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.387e+01 2.658e+01 3.235e+01 4.854e+01, threshold=5.315e+01, percent-clipped=0.0 2024-08-19 14:44:30,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4509290.0, ans=0.2 2024-08-19 14:44:32,907 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 6350, loss[loss=0.09318, beats_loss=0.01217, ecapa_loss=0.0001444, whisper_loss=0.07956, over 21108.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01057, ecapa_loss=0.0001408, whisper_loss=0.08897, over 3881118.37 frames. 
], batch size: 86, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:44:41,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4509390.0, ans=0.0 2024-08-19 14:44:42,944 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.76 vs. limit=22.5 2024-08-19 14:44:49,537 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-19 14:45:03,254 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=15.0 2024-08-19 14:45:06,973 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 14:45:08,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4509590.0, ans=0.2 2024-08-19 14:45:08,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4509590.0, ans=0.125 2024-08-19 14:45:15,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4509690.0, ans=0.1 2024-08-19 14:45:18,715 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.76 vs. limit=6.0 2024-08-19 14:45:29,592 WARNING [optim.py:496] (2/4) Scaling gradients by 0.010832761414349079, model_norm_threshold=53.15283203125 2024-08-19 14:45:29,756 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.45, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.092e+07, grad_sumsq=1.044e+09, orig_rms_sq=1.046e-02 2024-08-19 14:45:38,690 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
20 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-19 14:45:43,928 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 6400, loss[loss=0.102, beats_loss=0.007653, ecapa_loss=0.0001509, whisper_loss=0.0928, over 15329.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01065, ecapa_loss=0.0001407, whisper_loss=0.08882, over 3901499.22 frames. ], batch size: 58, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:45:44,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4509890.0, ans=0.2 2024-08-19 14:45:57,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4509990.0, ans=0.1 2024-08-19 14:46:01,418 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-19 14:46:20,638 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.53 vs. limit=22.5 2024-08-19 14:46:27,100 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 15 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 14:46:33,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4510190.0, ans=0.1 2024-08-19 14:46:37,069 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 27 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-19 14:46:38,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4510190.0, ans=0.0 2024-08-19 14:46:46,824 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.363e+01 2.610e+01 2.887e+01 4.907e+03, threshold=5.221e+01, percent-clipped=2.0 2024-08-19 14:46:47,489 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.99 vs. 
limit=15.0
2024-08-19 14:46:55,554 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 6450, loss[loss=0.1175, beats_loss=0.009863, ecapa_loss=0.0001465, whisper_loss=0.1062, over 23277.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01071, ecapa_loss=0.0001392, whisper_loss=0.08893, over 3901009.20 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 14:46:57,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4510390.0, ans=0.0
2024-08-19 14:47:09,825 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 14 from Vox, 25 from AS
2024-08-19 14:47:25,055 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.39 vs. limit=22.5
2024-08-19 14:47:35,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4510590.0, ans=0.125
2024-08-19 14:47:37,232 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 29 from LS+wenet, 27 from Vox, 19 from AS
2024-08-19 14:47:48,622 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 32 from LS+wenet, 20 from Vox, 30 from AS
2024-08-19 14:48:10,045 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 6500, loss[loss=0.1077, beats_loss=0.01083, ecapa_loss=0.0001497, whisper_loss=0.09535, over 22639.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01057, ecapa_loss=0.0001393, whisper_loss=0.09012, over 3908943.11 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 14:48:10,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4510890.0, ans=0.125
2024-08-19 14:48:13,112 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts.
17 from LS+wenet, 18 from Vox, 19 from AS
2024-08-19 14:48:14,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4510890.0, ans=0.0
2024-08-19 14:48:14,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4510890.0, ans=0.125
2024-08-19 14:48:15,953 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 17 from LS+wenet, 11 from Vox, 43 from AS
2024-08-19 14:48:23,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4510990.0, ans=0.125
2024-08-19 14:48:43,807 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 16 from Vox, 29 from AS
2024-08-19 14:48:45,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4511090.0, ans=0.2
2024-08-19 14:48:46,235 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 24 from LS+wenet, 20 from Vox, 51 from AS
2024-08-19 14:48:47,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4511090.0, ans=0.0
2024-08-19 14:48:50,216 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 21 from LS+wenet, 33 from Vox, 28 from AS
2024-08-19 14:49:01,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4511190.0, ans=0.125
2024-08-19 14:49:07,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4511290.0, ans=0.1
2024-08-19 14:49:13,653 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.374e+01 2.606e+01 2.807e+01 1.094e+02, threshold=5.213e+01, percent-clipped=1.0
2024-08-19 14:49:14,338 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.59 vs.
limit=15.0
2024-08-19 14:49:23,907 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 6550, loss[loss=0.1073, beats_loss=0.01055, ecapa_loss=0.0001383, whisper_loss=0.0954, over 20870.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001408, whisper_loss=0.09, over 3915817.64 frames. ], batch size: 81, lr: 1.97e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 14:49:28,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4511390.0, ans=0.125
2024-08-19 14:49:31,433 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 18 from Vox, 36 from AS
2024-08-19 14:49:43,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4511490.0, ans=0.125
2024-08-19 14:49:50,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4511490.0, ans=0.2
2024-08-19 14:50:01,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4511590.0, ans=0.125
2024-08-19 14:50:09,251 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 21 from Vox, 40 from AS
2024-08-19 14:50:16,675 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 21 from Vox, 36 from AS
2024-08-19 14:50:26,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4511790.0, ans=0.0
2024-08-19 14:50:38,606 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 6600, loss[loss=0.102, beats_loss=0.009527, ecapa_loss=0.0001189, whisper_loss=0.09133, over 21510.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01049, ecapa_loss=0.0001414, whisper_loss=0.09001, over 3913020.55 frames.
], batch size: 81, lr: 1.97e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 14:50:44,939 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 16 from Vox, 24 from AS
2024-08-19 14:50:50,928 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 18 from Vox, 27 from AS
2024-08-19 14:50:55,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4511990.0, ans=0.0
2024-08-19 14:51:01,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4511990.0, ans=0.125
2024-08-19 14:51:03,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4511990.0, ans=0.2
2024-08-19 14:51:23,719 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 16 from Vox, 31 from AS
2024-08-19 14:51:24,089 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-19 14:51:41,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4512290.0, ans=0.0
2024-08-19 14:51:44,339 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.291e+01 2.479e+01 2.799e+01 1.189e+02, threshold=4.958e+01, percent-clipped=2.0
2024-08-19 14:51:53,302 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 6650, loss[loss=0.108, beats_loss=0.009531, ecapa_loss=0.0001562, whisper_loss=0.09688, over 21371.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01045, ecapa_loss=0.0001416, whisper_loss=0.09077, over 3904683.44 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 14:51:54,605 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts.
24 from LS+wenet, 29 from Vox, 39 from AS
2024-08-19 14:51:55,198 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=15.0
2024-08-19 14:52:05,421 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 19 from Vox, 29 from AS
2024-08-19 14:52:14,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4512490.0, ans=0.125
2024-08-19 14:52:28,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4512590.0, ans=0.125
2024-08-19 14:52:30,029 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 from AS
2024-08-19 14:53:04,789 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 6700, loss[loss=0.09167, beats_loss=0.009227, ecapa_loss=0.0001384, whisper_loss=0.08106, over 16210.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01043, ecapa_loss=0.0001426, whisper_loss=0.0907, over 3901622.91 frames. ], batch size: 61, lr: 1.97e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 14:53:06,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4512890.0, ans=0.0
2024-08-19 14:53:08,132 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts.
33 from LS+wenet, 22 from Vox, 35 from AS
2024-08-19 14:53:15,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4512890.0, ans=0.125
2024-08-19 14:53:20,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4512990.0, ans=0.09899494936611666
2024-08-19 14:53:28,665 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.20 vs. limit=10.0
2024-08-19 14:53:29,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4512990.0, ans=0.125
2024-08-19 14:53:37,628 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 9 from LS+wenet, 18 from Vox, 29 from AS
2024-08-19 14:54:01,025 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0
2024-08-19 14:54:14,030 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.375e+01 2.659e+01 3.001e+01 3.941e+01, threshold=5.319e+01, percent-clipped=0.0
2024-08-19 14:54:20,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4513290.0, ans=0.2
2024-08-19 14:54:20,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4513290.0, ans=0.125
2024-08-19 14:54:23,299 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 6750, loss[loss=0.08952, beats_loss=0.01286, ecapa_loss=0.0001186, whisper_loss=0.07547, over 22482.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0104, ecapa_loss=0.0001423, whisper_loss=0.0908, over 3906936.63 frames.
], batch size: 92, lr: 1.97e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 14:54:29,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4513390.0, ans=0.0
2024-08-19 14:54:29,993 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.44 vs. limit=15.0
2024-08-19 14:54:32,130 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 15 from LS+wenet, 10 from Vox, 28 from AS
2024-08-19 14:54:45,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4513490.0, ans=0.125
2024-08-19 14:54:49,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4513490.0, ans=0.125
2024-08-19 14:54:53,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4513590.0, ans=0.0
2024-08-19 14:54:55,346 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 19 from LS+wenet, 18 from Vox, 35 from AS
2024-08-19 14:54:57,901 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 18 from Vox, 31 from AS
2024-08-19 14:55:19,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4513790.0, ans=0.125
2024-08-19 14:55:28,869 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 6800, loss[loss=0.09286, beats_loss=0.01076, ecapa_loss=0.0001622, whisper_loss=0.08048, over 22005.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01041, ecapa_loss=0.0001427, whisper_loss=0.09107, over 3888116.16 frames.
], batch size: 92, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 14:56:01,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4514090.0, ans=0.125
2024-08-19 14:56:08,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4514190.0, ans=0.1
2024-08-19 14:56:23,586 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.299e+01 2.607e+01 2.944e+01 3.489e+02, threshold=5.214e+01, percent-clipped=2.0
2024-08-19 14:56:26,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4514290.0, ans=0.125
2024-08-19 14:56:31,348 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 6850, loss[loss=0.0951, beats_loss=0.0115, ecapa_loss=0.0001358, whisper_loss=0.08224, over 16935.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01042, ecapa_loss=0.0001424, whisper_loss=0.0909, over 3870850.94 frames. ], batch size: 68, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 14:56:35,144 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 23 from Vox, 44 from AS
2024-08-19 14:56:36,253 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 23 from Vox, 45 from AS
2024-08-19 14:56:39,464 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0
2024-08-19 14:56:49,219 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.77 vs.
limit=15.0
2024-08-19 14:56:51,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4514490.0, ans=0.125
2024-08-19 14:56:58,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4514590.0, ans=0.2
2024-08-19 14:57:03,643 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 24 from Vox, 33 from AS
2024-08-19 14:57:07,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4514690.0, ans=0.125
2024-08-19 14:57:11,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4514690.0, ans=0.1
2024-08-19 14:57:19,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4514690.0, ans=0.0
2024-08-19 14:57:22,379 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 27 from Vox, 30 from AS
2024-08-19 14:57:25,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4514790.0, ans=0.125
2024-08-19 14:57:25,525 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.06 vs. limit=15.0
2024-08-19 14:57:29,749 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 18 from LS+wenet, 20 from Vox, 37 from AS
2024-08-19 14:57:33,382 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 6900, loss[loss=0.09857, beats_loss=0.01137, ecapa_loss=0.0001414, whisper_loss=0.08578, over 18070.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01042, ecapa_loss=0.0001418, whisper_loss=0.09065, over 3889527.62 frames.
], batch size: 71, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 14:57:36,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4514890.0, ans=0.0
2024-08-19 14:57:41,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4514890.0, ans=0.125
2024-08-19 14:57:53,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4514990.0, ans=0.0
2024-08-19 14:58:13,570 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.80 vs. limit=15.0
2024-08-19 14:58:22,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4515290.0, ans=0.2
2024-08-19 14:58:27,594 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.255e+01 2.566e+01 2.840e+01 4.154e+01, threshold=5.133e+01, percent-clipped=0.0
2024-08-19 14:58:33,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4515290.0, ans=0.125
2024-08-19 14:58:34,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4515390.0, ans=0.1
2024-08-19 14:58:35,200 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 6950, loss[loss=0.08849, beats_loss=0.01027, ecapa_loss=0.0001006, whisper_loss=0.07722, over 16172.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0105, ecapa_loss=0.0001412, whisper_loss=0.09044, over 3905964.44 frames.
], batch size: 60, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 14:58:43,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4515390.0, ans=0.125
2024-08-19 14:58:53,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=4515490.0, ans=10.0
2024-08-19 14:58:57,203 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=15.0
2024-08-19 14:58:58,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4515490.0, ans=0.1
2024-08-19 14:59:06,204 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 24 from LS+wenet, 34 from Vox, 38 from AS
2024-08-19 14:59:13,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4515690.0, ans=0.125
2024-08-19 14:59:20,331 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 39 from LS+wenet, 16 from Vox, 34 from AS
2024-08-19 14:59:20,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4515690.0, ans=0.125
2024-08-19 14:59:37,385 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 7000, loss[loss=0.1071, beats_loss=0.009888, ecapa_loss=0.0001287, whisper_loss=0.09597, over 22870.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01056, ecapa_loss=0.0001407, whisper_loss=0.08984, over 3902151.49 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 14:59:43,463 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts.
33 from LS+wenet, 18 from Vox, 37 from AS
2024-08-19 14:59:55,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4515990.0, ans=0.09899494936611666
2024-08-19 15:00:01,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4516090.0, ans=0.125
2024-08-19 15:00:01,409 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0
2024-08-19 15:00:11,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4516090.0, ans=0.125
2024-08-19 15:00:14,303 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 from AS
2024-08-19 15:00:31,228 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.441e+01 2.633e+01 3.064e+01 9.212e+01, threshold=5.267e+01, percent-clipped=3.0
2024-08-19 15:00:38,229 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.02 vs. limit=22.5
2024-08-19 15:00:38,727 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 7050, loss[loss=0.06765, beats_loss=0.01252, ecapa_loss=0.0001434, whisper_loss=0.0537, over 15347.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0105, ecapa_loss=0.0001417, whisper_loss=0.09058, over 3899811.96 frames. ], batch size: 63, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:00:46,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4516390.0, ans=0.0
2024-08-19 15:00:48,269 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.63 vs.
limit=15.0
2024-08-19 15:01:05,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=4516590.0, ans=12.0
2024-08-19 15:01:06,198 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 12 from Vox, 26 from AS
2024-08-19 15:01:29,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4516790.0, ans=0.1
2024-08-19 15:01:30,888 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.41 vs. limit=15.0
2024-08-19 15:01:38,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4516790.0, ans=0.2
2024-08-19 15:01:40,653 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 7100, loss[loss=0.09711, beats_loss=0.01078, ecapa_loss=0.0001304, whisper_loss=0.08503, over 17609.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.0001411, whisper_loss=0.09037, over 3885505.33 frames. ], batch size: 70, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:01:45,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4516890.0, ans=0.125
2024-08-19 15:01:49,084 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.74 vs.
limit=6.0
2024-08-19 15:01:49,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4516890.0, ans=0.0
2024-08-19 15:01:51,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4516890.0, ans=0.125
2024-08-19 15:01:53,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4516990.0, ans=0.04949747468305833
2024-08-19 15:01:55,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4516990.0, ans=0.1
2024-08-19 15:01:56,641 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 40 from LS+wenet, 19 from Vox, 31 from AS
2024-08-19 15:01:57,737 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 26 from Vox, 36 from AS
2024-08-19 15:02:01,480 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 30 from Vox, 35 from AS
2024-08-19 15:02:14,529 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 16 from Vox, 42 from AS
2024-08-19 15:02:16,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4517090.0, ans=0.2
2024-08-19 15:02:19,104 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.32 vs. limit=10.0
2024-08-19 15:02:32,788 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.17 vs.
limit=10.0
2024-08-19 15:02:35,776 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.673e+01 2.230e+01 2.578e+01 2.810e+01 3.581e+01, threshold=5.156e+01, percent-clipped=0.0
2024-08-19 15:02:43,323 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 7150, loss[loss=0.08523, beats_loss=0.01024, ecapa_loss=0.0001727, whisper_loss=0.07327, over 16962.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.0001408, whisper_loss=0.09042, over 3871622.72 frames. ], batch size: 74, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:02:43,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4517390.0, ans=0.125
2024-08-19 15:02:53,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4517390.0, ans=0.125
2024-08-19 15:02:58,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4517490.0, ans=0.125
2024-08-19 15:03:08,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4517590.0, ans=0.0
2024-08-19 15:03:16,445 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 21 from LS+wenet, 15 from Vox, 18 from AS
2024-08-19 15:03:26,440 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 19 from LS+wenet, 17 from Vox, 54 from AS
2024-08-19 15:03:44,476 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.25 vs. limit=15.0
2024-08-19 15:03:45,992 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 7200, loss[loss=0.1357, beats_loss=0.007914, ecapa_loss=0.0001308, whisper_loss=0.1264, over 15074.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01038, ecapa_loss=0.0001415, whisper_loss=0.09096, over 3836831.15 frames.
], batch size: 55, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:04:02,191 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 18 from Vox, 46 from AS
2024-08-19 15:04:04,672 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 from AS
2024-08-19 15:04:07,578 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.61 vs. limit=12.0
2024-08-19 15:04:14,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4518090.0, ans=0.125
2024-08-19 15:04:20,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4518090.0, ans=0.07
2024-08-19 15:04:30,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4518190.0, ans=0.04949747468305833
2024-08-19 15:04:39,776 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.333e+01 2.621e+01 2.974e+01 6.907e+01, threshold=5.242e+01, percent-clipped=0.0
2024-08-19 15:04:46,968 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 7250, loss[loss=0.1078, beats_loss=0.01103, ecapa_loss=0.0001493, whisper_loss=0.09525, over 19464.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01045, ecapa_loss=0.0001409, whisper_loss=0.09082, over 3889455.77 frames. ], batch size: 83, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:05:02,948 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 20 from Vox, 43 from AS
2024-08-19 15:05:09,091 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts.
24 from LS+wenet, 14 from Vox, 27 from AS
2024-08-19 15:05:09,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4518490.0, ans=0.0
2024-08-19 15:05:10,241 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 31 from LS+wenet, 20 from Vox, 35 from AS
2024-08-19 15:05:24,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4518690.0, ans=0.125
2024-08-19 15:05:30,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4518690.0, ans=0.125
2024-08-19 15:05:32,590 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.61 vs. limit=22.5
2024-08-19 15:05:40,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4518790.0, ans=0.125
2024-08-19 15:05:44,166 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 21 from Vox, 34 from AS
2024-08-19 15:05:47,602 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 7300, loss[loss=0.09037, beats_loss=0.00977, ecapa_loss=0.000154, whisper_loss=0.07906, over 22665.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0104, ecapa_loss=0.0001425, whisper_loss=0.09094, over 3871992.38 frames.
], batch size: 91, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:06:01,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4518990.0, ans=0.125
2024-08-19 15:06:08,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4518990.0, ans=0.2
2024-08-19 15:06:12,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4519090.0, ans=0.125
2024-08-19 15:06:33,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4519190.0, ans=0.125
2024-08-19 15:06:41,771 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.312e+01 2.518e+01 2.847e+01 5.686e+01, threshold=5.035e+01, percent-clipped=2.0
2024-08-19 15:06:49,085 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 7350, loss[loss=0.1136, beats_loss=0.009908, ecapa_loss=0.0001467, whisper_loss=0.1022, over 18078.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01038, ecapa_loss=0.0001437, whisper_loss=0.09095, over 3882522.39 frames.
], batch size: 71, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:06:53,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4519390.0, ans=0.0
2024-08-19 15:07:01,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4519490.0, ans=0.2
2024-08-19 15:07:03,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4519490.0, ans=0.1
2024-08-19 15:07:03,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4519490.0, ans=0.125
2024-08-19 15:07:25,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4519690.0, ans=0.0
2024-08-19 15:07:30,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4519690.0, ans=0.125
2024-08-19 15:07:30,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4519690.0, ans=0.125
2024-08-19 15:07:33,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4519690.0, ans=0.0
2024-08-19 15:07:50,035 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 7400, loss[loss=0.09266, beats_loss=0.01216, ecapa_loss=0.0001398, whisper_loss=0.0791, over 17515.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001418, whisper_loss=0.09015, over 3906580.89 frames.
], batch size: 69, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:07:51,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4519890.0, ans=0.1 2024-08-19 15:07:51,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4519890.0, ans=0.125 2024-08-19 15:07:58,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4519890.0, ans=0.125 2024-08-19 15:08:00,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4519890.0, ans=0.1 2024-08-19 15:08:16,205 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 18 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-19 15:08:47,147 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.604e+01 2.272e+01 2.510e+01 2.684e+01 4.218e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-19 15:08:53,987 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 7450, loss[loss=0.1017, beats_loss=0.01161, ecapa_loss=0.0001425, whisper_loss=0.0887, over 22927.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01049, ecapa_loss=0.0001417, whisper_loss=0.09018, over 3894928.61 frames. ], batch size: 94, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:09:04,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4520390.0, ans=0.2 2024-08-19 15:09:13,242 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-19 15:09:13,811 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.03 vs. 
limit=22.5 2024-08-19 15:09:23,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4520590.0, ans=0.0 2024-08-19 15:09:41,041 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.96 vs. limit=6.0 2024-08-19 15:09:51,633 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 29 from Vox, 27 fro AS 2024-08-19 15:09:55,856 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0 2024-08-19 15:09:56,178 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 7500, loss[loss=0.1211, beats_loss=0.01113, ecapa_loss=0.0001158, whisper_loss=0.1088, over 20159.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01052, ecapa_loss=0.0001421, whisper_loss=0.08985, over 3893562.11 frames. ], batch size: 78, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:10:04,956 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 30 from LS+wenet, 16 from Vox, 49 fro AS 2024-08-19 15:10:14,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4520990.0, ans=0.125 2024-08-19 15:10:19,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4520990.0, ans=0.1 2024-08-19 15:10:21,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4521090.0, ans=0.0 2024-08-19 15:10:32,244 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-19 15:10:38,780 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
22 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-19 15:10:39,455 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.47 vs. limit=10.0 2024-08-19 15:10:47,518 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 21 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 15:10:51,087 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.277e+01 2.526e+01 2.958e+01 5.169e+01, threshold=5.052e+01, percent-clipped=1.0 2024-08-19 15:10:58,355 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 7550, loss[loss=0.08904, beats_loss=0.01033, ecapa_loss=0.0001472, whisper_loss=0.07723, over 21092.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01054, ecapa_loss=0.0001418, whisper_loss=0.08955, over 3856056.20 frames. ], batch size: 87, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:11:21,622 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-19 15:11:22,099 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0 2024-08-19 15:11:27,671 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-19 15:11:50,762 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 25 from Vox, 20 fro AS 2024-08-19 15:11:54,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4521790.0, ans=0.125 2024-08-19 15:11:56,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4521790.0, ans=0.1 2024-08-19 15:11:59,534 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 7600, loss[loss=0.08713, beats_loss=0.009625, ecapa_loss=0.000114, whisper_loss=0.07636, over 15973.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.01051, ecapa_loss=0.000142, whisper_loss=0.08932, over 3824557.44 frames. ], batch size: 59, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:12:11,533 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 15:12:29,081 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 25 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-19 15:12:33,042 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-19 15:12:38,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4522190.0, ans=0.05 2024-08-19 15:12:49,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4522290.0, ans=0.1 2024-08-19 15:12:54,041 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.291e+01 2.555e+01 2.938e+01 1.089e+02, threshold=5.110e+01, percent-clipped=2.0 2024-08-19 15:12:54,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4522290.0, ans=0.125 2024-08-19 15:12:59,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4522290.0, ans=0.125 2024-08-19 15:13:01,257 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 7650, loss[loss=0.1014, beats_loss=0.00979, ecapa_loss=0.0001511, whisper_loss=0.09007, over 20785.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0104, ecapa_loss=0.000142, whisper_loss=0.09061, over 3879760.53 frames. 
], batch size: 83, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:13:01,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4522390.0, ans=0.125 2024-08-19 15:13:26,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4522590.0, ans=0.125 2024-08-19 15:13:28,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4522590.0, ans=0.0 2024-08-19 15:13:30,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4522590.0, ans=0.2 2024-08-19 15:13:31,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4522590.0, ans=0.125 2024-08-19 15:13:38,443 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-19 15:13:42,041 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-19 15:13:49,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4522790.0, ans=0.125 2024-08-19 15:13:54,116 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-19 15:13:57,120 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.62 vs. limit=15.0 2024-08-19 15:14:02,331 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 7700, loss[loss=0.07777, beats_loss=0.01246, ecapa_loss=0.0001194, whisper_loss=0.06411, over 16471.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.000141, whisper_loss=0.09023, over 3891697.61 frames. 
], batch size: 65, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:14:20,114 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-19 15:14:20,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4522990.0, ans=0.125 2024-08-19 15:14:21,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4522990.0, ans=0.0 2024-08-19 15:14:29,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4523090.0, ans=0.1 2024-08-19 15:14:29,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4523090.0, ans=0.125 2024-08-19 15:14:29,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4523090.0, ans=0.125 2024-08-19 15:14:30,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4523090.0, ans=0.125 2024-08-19 15:14:31,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4523090.0, ans=0.0 2024-08-19 15:14:33,245 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0 2024-08-19 15:14:40,292 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.80 vs. 
limit=15.0 2024-08-19 15:14:42,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4523190.0, ans=0.125 2024-08-19 15:14:49,007 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.67 vs. limit=10.0 2024-08-19 15:14:49,736 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 30 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-19 15:14:55,509 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.405e+01 2.630e+01 2.865e+01 8.039e+01, threshold=5.260e+01, percent-clipped=1.0 2024-08-19 15:15:01,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4523290.0, ans=0.1 2024-08-19 15:15:03,004 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 7750, loss[loss=0.1089, beats_loss=0.01016, ecapa_loss=0.0001459, whisper_loss=0.09728, over 22327.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01036, ecapa_loss=0.0001411, whisper_loss=0.09054, over 3900832.74 frames. 
], batch size: 91, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:15:32,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4523590.0, ans=0.0 2024-08-19 15:15:34,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4523590.0, ans=0.0 2024-08-19 15:15:34,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4523590.0, ans=0.0 2024-08-19 15:15:38,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4523690.0, ans=0.1 2024-08-19 15:15:43,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4523690.0, ans=0.125 2024-08-19 15:15:44,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4523690.0, ans=0.1 2024-08-19 15:15:51,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4523790.0, ans=0.125 2024-08-19 15:15:52,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4523790.0, ans=0.0 2024-08-19 15:16:00,741 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.14 vs. limit=15.0 2024-08-19 15:16:03,690 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 7800, loss[loss=0.08398, beats_loss=0.01216, ecapa_loss=0.0001281, whisper_loss=0.07054, over 20906.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01039, ecapa_loss=0.0001412, whisper_loss=0.08978, over 3869990.86 frames. 
], batch size: 88, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:16:05,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4523890.0, ans=0.2 2024-08-19 15:16:09,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4523890.0, ans=0.0 2024-08-19 15:16:33,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4524090.0, ans=0.125 2024-08-19 15:16:41,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4524190.0, ans=0.0 2024-08-19 15:16:47,060 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-19 15:16:49,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4524190.0, ans=0.0 2024-08-19 15:16:56,150 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-08-19 15:16:56,327 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.665e+01 2.331e+01 2.591e+01 2.941e+01 6.755e+01, threshold=5.181e+01, percent-clipped=1.0 2024-08-19 15:17:03,386 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 7850, loss[loss=0.1104, beats_loss=0.01161, ecapa_loss=0.0001196, whisper_loss=0.09757, over 22176.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01042, ecapa_loss=0.0001402, whisper_loss=0.09075, over 3890738.81 frames. 
], batch size: 84, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:17:38,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4524690.0, ans=0.125 2024-08-19 15:17:42,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4524690.0, ans=0.2 2024-08-19 15:17:46,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4524690.0, ans=0.1 2024-08-19 15:17:51,603 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-19 15:18:03,661 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 7900, loss[loss=0.1, beats_loss=0.01251, ecapa_loss=0.000119, whisper_loss=0.08635, over 21723.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0105, ecapa_loss=0.00014, whisper_loss=0.09032, over 3874097.39 frames. ], batch size: 87, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:18:13,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4524890.0, ans=0.0 2024-08-19 15:18:18,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4524990.0, ans=0.0 2024-08-19 15:18:30,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4525090.0, ans=0.5 2024-08-19 15:18:36,869 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-19 15:18:37,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4525090.0, ans=0.2 2024-08-19 15:18:37,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4525090.0, ans=0.0 2024-08-19 15:18:40,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4525190.0, ans=0.125 2024-08-19 15:18:43,082 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 15:18:45,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4525190.0, ans=0.1 2024-08-19 15:18:46,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4525190.0, ans=0.09899494936611666 2024-08-19 15:18:51,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4525290.0, ans=0.1 2024-08-19 15:18:56,258 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.850e+01 2.323e+01 2.530e+01 2.924e+01 2.485e+02, threshold=5.060e+01, percent-clipped=4.0 2024-08-19 15:19:03,378 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 7950, loss[loss=0.1295, beats_loss=0.008284, ecapa_loss=0.0001603, whisper_loss=0.1196, over 23353.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01055, ecapa_loss=0.0001411, whisper_loss=0.09068, over 3895243.96 frames. 
], batch size: 93, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:19:03,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4525390.0, ans=0.0 2024-08-19 15:19:05,196 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.11 vs. limit=12.0 2024-08-19 15:19:10,621 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 25 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-19 15:19:14,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4525490.0, ans=0.2 2024-08-19 15:19:50,452 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 17 from LS+wenet, 31 from Vox, 46 fro AS 2024-08-19 15:20:00,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4525790.0, ans=0.0 2024-08-19 15:20:02,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4525890.0, ans=0.0 2024-08-19 15:20:03,580 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 8000, loss[loss=0.1107, beats_loss=0.008917, ecapa_loss=0.000144, whisper_loss=0.1003, over 19030.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01056, ecapa_loss=0.0001403, whisper_loss=0.09015, over 3883464.58 frames. ], batch size: 70, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:20:04,194 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.25 vs. limit=15.0 2024-08-19 15:20:09,333 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
23 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-19 15:20:10,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4525890.0, ans=0.125 2024-08-19 15:20:17,969 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 20 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-19 15:20:24,075 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-19 15:20:30,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4526090.0, ans=0.5 2024-08-19 15:20:32,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4526090.0, ans=0.125 2024-08-19 15:20:40,915 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 15:20:56,405 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.249e+01 2.539e+01 2.836e+01 4.576e+01, threshold=5.079e+01, percent-clipped=0.0 2024-08-19 15:21:03,676 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 8050, loss[loss=0.08882, beats_loss=0.008852, ecapa_loss=0.0001671, whisper_loss=0.07829, over 14751.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01061, ecapa_loss=0.0001396, whisper_loss=0.08985, over 3908122.34 frames. ], batch size: 58, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:21:05,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4526390.0, ans=0.1 2024-08-19 15:21:06,037 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-19 15:21:20,687 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
29 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 15:21:20,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4526490.0, ans=0.125 2024-08-19 15:21:41,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4526690.0, ans=10.0 2024-08-19 15:21:47,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4526690.0, ans=0.125 2024-08-19 15:21:49,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4526690.0, ans=0.1 2024-08-19 15:21:58,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4526790.0, ans=0.125 2024-08-19 15:22:01,036 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.37 vs. limit=10.0 2024-08-19 15:22:03,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4526890.0, ans=0.125 2024-08-19 15:22:03,879 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 8100, loss[loss=0.1217, beats_loss=0.008816, ecapa_loss=0.0001527, whisper_loss=0.1114, over 22214.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0105, ecapa_loss=0.0001404, whisper_loss=0.0905, over 3931633.66 frames. 
], batch size: 88, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:22:14,085 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.583e+00 2024-08-19 15:22:22,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4526990.0, ans=0.125 2024-08-19 15:22:24,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4526990.0, ans=0.125 2024-08-19 15:22:25,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4526990.0, ans=0.0 2024-08-19 15:22:44,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4527190.0, ans=0.125 2024-08-19 15:22:44,202 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2024-08-19 15:22:51,313 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0 2024-08-19 15:22:56,203 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.370e+01 2.531e+01 2.810e+01 1.337e+02, threshold=5.062e+01, percent-clipped=2.0 2024-08-19 15:23:03,535 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 8150, loss[loss=0.1218, beats_loss=0.009647, ecapa_loss=0.0001334, whisper_loss=0.1108, over 22355.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01042, ecapa_loss=0.0001408, whisper_loss=0.09075, over 3922512.38 frames. 
], batch size: 89, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:23:22,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4527490.0, ans=0.0 2024-08-19 15:23:27,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4527590.0, ans=0.1 2024-08-19 15:23:39,576 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 24 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 15:23:44,367 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 15:23:45,528 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 15:23:48,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4527690.0, ans=0.05 2024-08-19 15:23:48,338 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.59 vs. limit=15.0 2024-08-19 15:23:56,216 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2024-08-19 15:23:58,192 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-19 15:24:03,097 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 8200, loss[loss=0.07089, beats_loss=0.01231, ecapa_loss=0.0001224, whisper_loss=0.05736, over 16887.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001422, whisper_loss=0.0904, over 3919963.80 frames. ], batch size: 68, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:24:11,676 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 15:24:11,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4527890.0, ans=0.5 2024-08-19 15:24:14,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4527990.0, ans=0.125 2024-08-19 15:24:20,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4527990.0, ans=0.125 2024-08-19 15:24:31,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4528090.0, ans=0.0 2024-08-19 15:24:38,626 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.07 vs. limit=12.0 2024-08-19 15:24:39,826 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.97 vs. limit=15.0 2024-08-19 15:24:44,071 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 17 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-19 15:24:51,472 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 34 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-19 15:24:55,842 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.279e+01 2.474e+01 2.838e+01 8.024e+01, threshold=4.948e+01, percent-clipped=1.0 2024-08-19 15:25:03,234 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 8250, loss[loss=0.103, beats_loss=0.009769, ecapa_loss=0.0001425, whisper_loss=0.09177, over 21112.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01036, ecapa_loss=0.0001419, whisper_loss=0.09023, over 3887430.01 frames. ], batch size: 84, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:25:26,304 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
31 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 15:25:41,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4528690.0, ans=10.0 2024-08-19 15:26:03,568 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 8300, loss[loss=0.1211, beats_loss=0.006542, ecapa_loss=0.0001638, whisper_loss=0.1129, over 14031.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001418, whisper_loss=0.09043, over 3912905.61 frames. ], batch size: 54, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:26:26,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4529090.0, ans=0.125 2024-08-19 15:26:32,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=4529090.0, ans=0.1 2024-08-19 15:26:44,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4529190.0, ans=0.125 2024-08-19 15:26:45,688 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0 2024-08-19 15:26:55,592 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.344e+01 2.567e+01 2.934e+01 1.286e+02, threshold=5.133e+01, percent-clipped=2.0 2024-08-19 15:26:55,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4529290.0, ans=0.125 2024-08-19 15:27:02,881 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 8350, loss[loss=0.1069, beats_loss=0.00902, ecapa_loss=0.0001565, whisper_loss=0.09631, over 15809.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01037, ecapa_loss=0.000143, whisper_loss=0.09049, over 3910143.85 frames. 
], batch size: 63, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:27:04,267 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 15:27:05,478 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-19 15:27:07,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4529390.0, ans=0.5 2024-08-19 15:27:16,254 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-19 15:27:23,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4529490.0, ans=0.0 2024-08-19 15:27:31,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4529590.0, ans=0.0 2024-08-19 15:27:39,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4529690.0, ans=0.125 2024-08-19 15:27:39,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4529690.0, ans=0.125 2024-08-19 15:27:40,099 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 31 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-19 15:27:43,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4529690.0, ans=0.0 2024-08-19 15:27:44,040 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.63 vs. limit=15.0 2024-08-19 15:27:45,874 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
30 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 15:27:59,467 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=15.0 2024-08-19 15:28:02,403 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 8400, loss[loss=0.08462, beats_loss=0.01301, ecapa_loss=0.0001233, whisper_loss=0.07038, over 20925.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01037, ecapa_loss=0.0001425, whisper_loss=0.09104, over 3916397.12 frames. ], batch size: 86, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:28:06,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4529890.0, ans=0.0 2024-08-19 15:28:15,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4529990.0, ans=0.125 2024-08-19 15:28:21,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4529990.0, ans=0.0 2024-08-19 15:28:29,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4530090.0, ans=0.125 2024-08-19 15:28:30,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4530090.0, ans=0.125 2024-08-19 15:28:54,962 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.308e+01 2.547e+01 2.800e+01 8.519e+01, threshold=5.094e+01, percent-clipped=2.0 2024-08-19 15:28:56,888 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.37 vs. limit=10.0 2024-08-19 15:29:01,978 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 8450, loss[loss=0.1188, beats_loss=0.007905, ecapa_loss=0.0001665, whisper_loss=0.1092, over 23771.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01043, ecapa_loss=0.0001412, whisper_loss=0.09032, over 3871721.78 frames. ], batch size: 95, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:29:03,275 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-19 15:29:04,414 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-19 15:29:20,331 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.84 vs. limit=15.0 2024-08-19 15:29:26,076 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 15:29:27,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4530590.0, ans=0.125 2024-08-19 15:29:39,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4530690.0, ans=0.125 2024-08-19 15:29:42,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4530690.0, ans=0.125 2024-08-19 15:29:55,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4530790.0, ans=0.125 2024-08-19 15:30:00,917 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 8500, loss[loss=0.1004, beats_loss=0.01129, ecapa_loss=0.0001416, whisper_loss=0.08773, over 21960.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01045, ecapa_loss=0.0001405, whisper_loss=0.09028, over 3860827.03 frames. 
], batch size: 91, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:30:04,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4530890.0, ans=0.0 2024-08-19 15:30:25,510 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.90 vs. limit=22.5 2024-08-19 15:30:40,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4531190.0, ans=0.1 2024-08-19 15:30:40,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4531190.0, ans=0.125 2024-08-19 15:30:46,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4531190.0, ans=0.0 2024-08-19 15:30:48,953 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.43 vs. limit=10.0 2024-08-19 15:30:50,983 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 15:30:53,226 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.328e+01 2.576e+01 3.014e+01 4.814e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-19 15:31:00,395 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 8550, loss[loss=0.1112, beats_loss=0.009256, ecapa_loss=0.000148, whisper_loss=0.1004, over 20660.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01039, ecapa_loss=0.0001402, whisper_loss=0.09071, over 3863780.84 frames. 
], batch size: 81, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:31:00,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4531390.0, ans=0.125 2024-08-19 15:31:18,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4531490.0, ans=0.05 2024-08-19 15:31:23,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4531590.0, ans=0.09899494936611666 2024-08-19 15:31:28,358 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-19 15:31:35,433 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-19 15:31:36,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4531690.0, ans=0.2 2024-08-19 15:31:36,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4531690.0, ans=0.125 2024-08-19 15:31:43,401 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.75 vs. limit=15.0 2024-08-19 15:31:45,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4531690.0, ans=0.0 2024-08-19 15:31:59,854 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.169e+01 2024-08-19 15:32:00,549 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 8600, loss[loss=0.113, beats_loss=0.008565, ecapa_loss=0.0001307, whisper_loss=0.1032, over 17603.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.0001412, whisper_loss=0.09027, over 3885489.58 frames. 
], batch size: 65, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:32:00,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4531890.0, ans=0.1 2024-08-19 15:32:24,633 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 15:32:27,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4532090.0, ans=0.125 2024-08-19 15:32:28,310 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-19 15:32:30,502 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 15:32:30,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4532090.0, ans=0.125 2024-08-19 15:32:35,263 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-19 15:32:52,880 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.343e+01 2.555e+01 2.873e+01 3.984e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-19 15:33:00,235 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 8650, loss[loss=0.1136, beats_loss=0.008721, ecapa_loss=0.0001416, whisper_loss=0.1034, over 20009.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01044, ecapa_loss=0.0001405, whisper_loss=0.08993, over 3872896.63 frames. 
], batch size: 78, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:33:07,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4532390.0, ans=0.1 2024-08-19 15:33:08,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4532390.0, ans=0.2 2024-08-19 15:33:24,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=4532590.0, ans=0.05 2024-08-19 15:33:28,775 WARNING [optim.py:496] (2/4) Scaling gradients by 0.054981451481580734, model_norm_threshold=51.102230072021484 2024-08-19 15:33:28,931 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.056e+05, grad_sumsq=1.009e+07, orig_rms_sq=1.047e-02 2024-08-19 15:33:29,096 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-19 15:33:30,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4532590.0, ans=0.125 2024-08-19 15:33:41,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4532690.0, ans=0.2 2024-08-19 15:33:49,442 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 15:34:00,472 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 8700, loss[loss=0.1373, beats_loss=0.007404, ecapa_loss=0.0001354, whisper_loss=0.1285, over 24201.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01041, ecapa_loss=0.0001411, whisper_loss=0.09023, over 3904859.04 frames. 
], batch size: 89, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:34:01,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4532890.0, ans=0.0 2024-08-19 15:34:07,762 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 15:34:21,282 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-19 15:34:23,182 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.30 vs. limit=15.0 2024-08-19 15:34:38,628 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0 2024-08-19 15:34:41,777 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 36 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-19 15:34:45,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4533190.0, ans=0.0 2024-08-19 15:34:46,615 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 12 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-19 15:34:53,681 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.310e+01 2.553e+01 2.787e+01 9.294e+02, threshold=5.105e+01, percent-clipped=1.0 2024-08-19 15:35:00,798 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 8750, loss[loss=0.09412, beats_loss=0.009548, ecapa_loss=0.0001287, whisper_loss=0.08328, over 16428.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01035, ecapa_loss=0.0001416, whisper_loss=0.09056, over 3898633.84 frames. 
], batch size: 62, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:35:01,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4533390.0, ans=0.05 2024-08-19 15:35:15,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4533490.0, ans=0.125 2024-08-19 15:35:25,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4533590.0, ans=0.2 2024-08-19 15:35:33,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4533590.0, ans=0.2 2024-08-19 15:35:49,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4533790.0, ans=0.025 2024-08-19 15:35:53,117 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.34 vs. limit=10.0 2024-08-19 15:35:54,684 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0 2024-08-19 15:36:00,675 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 8800, loss[loss=0.112, beats_loss=0.009731, ecapa_loss=0.0001505, whisper_loss=0.1008, over 22504.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.0001425, whisper_loss=0.09012, over 3905530.87 frames. 
], batch size: 90, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:36:03,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4533890.0, ans=0.0 2024-08-19 15:36:11,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4533990.0, ans=0.1 2024-08-19 15:36:24,806 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 25 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-19 15:36:39,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4534190.0, ans=0.0 2024-08-19 15:36:45,348 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-19 15:36:46,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4534190.0, ans=0.0 2024-08-19 15:36:49,897 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-19 15:36:53,351 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.339e+01 2.639e+01 2.921e+01 5.304e+01, threshold=5.278e+01, percent-clipped=1.0 2024-08-19 15:36:54,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4534290.0, ans=0.07 2024-08-19 15:37:00,584 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 8850, loss[loss=0.09944, beats_loss=0.01028, ecapa_loss=0.0001202, whisper_loss=0.08795, over 21236.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01055, ecapa_loss=0.0001411, whisper_loss=0.0893, over 3872412.77 frames. ], batch size: 84, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:37:04,252 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
21 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-19 15:37:06,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4534390.0, ans=0.125 2024-08-19 15:37:06,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4534390.0, ans=0.125 2024-08-19 15:37:11,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4534490.0, ans=0.125 2024-08-19 15:37:12,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4534490.0, ans=0.125 2024-08-19 15:37:25,930 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-19 15:37:26,140 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 15:37:27,112 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 15:37:27,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4534590.0, ans=0.125 2024-08-19 15:37:31,930 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-19 15:37:34,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4534590.0, ans=0.125 2024-08-19 15:37:55,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4534790.0, ans=0.1 2024-08-19 15:38:00,678 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 8900, loss[loss=0.1197, beats_loss=0.008887, ecapa_loss=0.0001585, whisper_loss=0.1092, over 23773.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.01059, ecapa_loss=0.00014, whisper_loss=0.08921, over 3835150.89 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:38:10,653 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 31 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-19 15:38:14,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4534990.0, ans=0.1 2024-08-19 15:38:27,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4535090.0, ans=0.0 2024-08-19 15:38:30,343 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-19 15:38:34,484 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.51 vs. limit=10.0 2024-08-19 15:38:38,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4535190.0, ans=0.125 2024-08-19 15:38:42,353 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
24 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-19 15:38:42,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4535190.0, ans=0.1 2024-08-19 15:38:43,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4535190.0, ans=0.125 2024-08-19 15:38:46,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4535190.0, ans=0.2 2024-08-19 15:38:54,823 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.335e+01 2.657e+01 2.947e+01 4.207e+01, threshold=5.314e+01, percent-clipped=0.0 2024-08-19 15:39:02,157 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 8950, loss[loss=0.1022, beats_loss=0.01135, ecapa_loss=0.0001238, whisper_loss=0.08964, over 21435.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01059, ecapa_loss=0.0001402, whisper_loss=0.08965, over 3814984.65 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:39:06,501 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.48 vs. limit=10.0 2024-08-19 15:39:12,034 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 15 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-19 15:39:20,495 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-19 15:39:35,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4535590.0, ans=0.125 2024-08-19 15:39:36,803 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.66 vs. limit=22.5 2024-08-19 15:39:48,017 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
23 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-19 15:40:02,145 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 9000, loss[loss=0.1252, beats_loss=0.008957, ecapa_loss=0.0001147, whisper_loss=0.1151, over 23367.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01053, ecapa_loss=0.0001402, whisper_loss=0.08947, over 3807065.59 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:40:02,145 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-19 15:40:30,353 INFO [train_multi_KD3.py:1149] (2/4) Epoch 31, validation on ASR_libri: loss=0.2531, beats_loss=0, ecapa_loss=0.0005084, whisper_loss=0.248, over 922467.00 frames. 2024-08-19 15:40:43,467 INFO [train_multi_KD3.py:1149] (2/4) Epoch 31, validation on SV_voxceleb1: loss=0.004046, beats_loss=0, ecapa_loss=0.0004046, whisper_loss=0, over 939242.00 frames. 2024-08-19 15:41:04,126 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([9.4859e-04, 8.0483e-03, 3.1554e-03, 3.7125e+00, 1.4840e-03, 2.6893e-02, 2.8373e-02, 3.5037e-02], device='cuda:2') 2024-08-19 15:42:05,912 INFO [train_multi_KD3.py:1149] (2/4) Epoch 31, validation on AT_audioset: loss=0.02311, beats_loss=0.02311, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 15:42:05,916 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31584MB 2024-08-19 15:42:09,886 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.61 vs. limit=10.0 2024-08-19 15:42:12,991 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
35 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-19 15:42:19,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4535990.0, ans=0.95 2024-08-19 15:42:20,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4535990.0, ans=0.1 2024-08-19 15:42:23,986 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08310459554195404, model_norm_threshold=53.13531494140625 2024-08-19 15:42:24,144 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.161e+04, grad_sumsq=1.575e+04, orig_rms_sq=3.277e+00 2024-08-19 15:42:38,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4536090.0, ans=0.125 2024-08-19 15:42:40,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4536090.0, ans=0.2 2024-08-19 15:42:41,195 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-19 15:42:43,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4536190.0, ans=0.125 2024-08-19 15:42:55,475 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 24 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-19 15:42:58,834 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.296e+01 2.631e+01 2.912e+01 6.394e+02, threshold=5.262e+01, percent-clipped=1.0 2024-08-19 15:43:01,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4536290.0, ans=0.0 2024-08-19 15:43:05,967 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 9050, loss[loss=0.1276, beats_loss=0.009877, ecapa_loss=0.0001338, whisper_loss=0.1164, over 23778.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01054, ecapa_loss=0.0001405, whisper_loss=0.08967, over 3807588.82 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:43:06,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4536390.0, ans=0.125 2024-08-19 15:43:06,309 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4536390.0, ans=0.1 2024-08-19 15:43:06,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4536390.0, ans=0.125 2024-08-19 15:43:06,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4536390.0, ans=0.0 2024-08-19 15:43:06,672 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.20 vs. limit=15.0 2024-08-19 15:43:13,246 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 24 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-19 15:43:21,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4536490.0, ans=0.125 2024-08-19 15:43:21,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=4536490.0, ans=0.05 2024-08-19 15:43:22,197 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.21 vs. limit=15.0 2024-08-19 15:43:29,278 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.13 vs. 
limit=15.0 2024-08-19 15:43:30,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4536590.0, ans=0.1 2024-08-19 15:43:33,238 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.01 vs. limit=12.0 2024-08-19 15:43:47,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4536690.0, ans=0.125 2024-08-19 15:43:50,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4536690.0, ans=0.0 2024-08-19 15:43:51,158 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-19 15:43:52,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4536790.0, ans=10.0 2024-08-19 15:43:56,560 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.33 vs. limit=12.0 2024-08-19 15:44:05,267 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 9100, loss[loss=0.1005, beats_loss=0.01133, ecapa_loss=0.0001315, whisper_loss=0.08786, over 23308.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01052, ecapa_loss=0.0001411, whisper_loss=0.08975, over 3832675.95 frames. 
], batch size: 92, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:44:16,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4536990.0, ans=0.125 2024-08-19 15:44:19,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4536990.0, ans=0.1 2024-08-19 15:44:34,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4537090.0, ans=0.07 2024-08-19 15:44:58,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4537290.0, ans=0.125 2024-08-19 15:45:03,563 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.267e+01 2.524e+01 2.813e+01 7.972e+01, threshold=5.047e+01, percent-clipped=1.0 2024-08-19 15:45:09,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=4537290.0, ans=0.025 2024-08-19 15:45:12,319 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 9150, loss[loss=0.1319, beats_loss=0.007784, ecapa_loss=0.000151, whisper_loss=0.1226, over 20589.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01046, ecapa_loss=0.0001412, whisper_loss=0.09059, over 3865868.34 frames. ], batch size: 79, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:45:12,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4537390.0, ans=0.0 2024-08-19 15:45:15,912 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.89 vs. 
limit=10.0
2024-08-19 15:45:16,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4537390.0, ans=0.125
2024-08-19 15:45:45,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4537590.0, ans=0.125
2024-08-19 15:45:49,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4537590.0, ans=0.1
2024-08-19 15:45:57,978 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 20 from LS+wenet, 28 from Vox, 35 from AS
2024-08-19 15:45:58,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4537690.0, ans=0.125
2024-08-19 15:46:10,846 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0
2024-08-19 15:46:13,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4537790.0, ans=0.025
2024-08-19 15:46:19,732 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 9200, loss[loss=0.09687, beats_loss=0.01148, ecapa_loss=0.000139, whisper_loss=0.084, over 18678.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001417, whisper_loss=0.09034, over 3875524.39 frames. ], batch size: 75, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 15:46:21,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4537890.0, ans=0.125
2024-08-19 15:46:30,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4537890.0, ans=0.2
2024-08-19 15:46:44,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4537990.0, ans=0.0
2024-08-19 15:46:58,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4538190.0, ans=0.2
2024-08-19 15:47:01,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4538190.0, ans=0.125
2024-08-19 15:47:14,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4538290.0, ans=0.125
2024-08-19 15:47:16,757 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.333e+01 2.570e+01 2.855e+01 5.209e+01, threshold=5.141e+01, percent-clipped=1.0
2024-08-19 15:47:23,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4538390.0, ans=0.125
2024-08-19 15:47:24,291 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 9250, loss[loss=0.1102, beats_loss=0.01032, ecapa_loss=0.000167, whisper_loss=0.09816, over 20862.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001414, whisper_loss=0.09047, over 3899458.08 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 15:47:33,284 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 19 from Vox, 18 from AS
2024-08-19 15:47:35,001 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.64 vs. limit=15.0
2024-08-19 15:47:50,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4538590.0, ans=0.125
2024-08-19 15:47:51,430 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.65 vs. limit=15.0
2024-08-19 15:47:56,843 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 13 from Vox, 24 from AS
2024-08-19 15:48:03,045 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 27 from Vox, 37 from AS
2024-08-19 15:48:12,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4538690.0, ans=0.125
2024-08-19 15:48:18,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4538790.0, ans=0.125
2024-08-19 15:48:22,126 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 15 from Vox, 25 from AS
2024-08-19 15:48:29,377 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 9300, loss[loss=0.09326, beats_loss=0.01045, ecapa_loss=0.0001232, whisper_loss=0.08158, over 15667.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01043, ecapa_loss=0.0001403, whisper_loss=0.09072, over 3899720.97 frames. ], batch size: 63, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 15:48:30,866 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 25 from Vox, 32 from AS
2024-08-19 15:48:42,902 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 24 from Vox, 32 from AS
2024-08-19 15:49:02,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4539090.0, ans=0.125
2024-08-19 15:49:23,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4539290.0, ans=0.125
2024-08-19 15:49:25,213 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.419e+01 2.678e+01 2.934e+01 3.690e+01, threshold=5.356e+01, percent-clipped=0.0
2024-08-19 15:49:28,475 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-19 15:49:28,500 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.277e+01
2024-08-19 15:49:29,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4539290.0, ans=0.0
2024-08-19 15:49:33,372 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 9350, loss[loss=0.1229, beats_loss=0.008323, ecapa_loss=0.0001607, whisper_loss=0.1129, over 22572.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01045, ecapa_loss=0.0001402, whisper_loss=0.09021, over 3894451.26 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 15:49:33,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4539390.0, ans=0.5
2024-08-19 15:49:38,186 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 28 from Vox, 27 from AS
2024-08-19 15:49:38,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4539390.0, ans=10.0
2024-08-19 15:49:42,937 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 18 from Vox, 26 from AS
2024-08-19 15:49:53,495 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=15.0
2024-08-19 15:50:04,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4539590.0, ans=0.1
2024-08-19 15:50:10,908 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.63 vs. limit=22.5
2024-08-19 15:50:32,922 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 29 from Vox, 26 from AS
2024-08-19 15:50:35,122 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 9400, loss[loss=0.09903, beats_loss=0.0103, ecapa_loss=0.0001541, whisper_loss=0.08718, over 18659.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01045, ecapa_loss=0.0001411, whisper_loss=0.09023, over 3895771.63 frames. ], batch size: 77, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 15:50:37,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4539890.0, ans=0.125
2024-08-19 15:50:43,562 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.95 vs. limit=15.0
2024-08-19 15:50:51,611 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.97 vs. limit=15.0
2024-08-19 15:51:24,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4540190.0, ans=0.1
2024-08-19 15:51:26,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4540190.0, ans=0.0
2024-08-19 15:51:30,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4540190.0, ans=0.125
2024-08-19 15:51:39,321 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.311e+01 2.569e+01 2.722e+01 4.090e+01, threshold=5.139e+01, percent-clipped=0.0
2024-08-19 15:51:42,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4540290.0, ans=0.1
2024-08-19 15:51:51,718 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.31 vs. limit=15.0
2024-08-19 15:51:52,093 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 9450, loss[loss=0.1104, beats_loss=0.009395, ecapa_loss=0.0001814, whisper_loss=0.0992, over 21297.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01036, ecapa_loss=0.0001426, whisper_loss=0.09049, over 3872501.42 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 15:52:04,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4540390.0, ans=0.0
2024-08-19 15:52:04,423 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.32 vs. limit=15.0
2024-08-19 15:52:05,013 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 17 from Vox, 45 from AS
2024-08-19 15:52:32,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4540590.0, ans=0.1
2024-08-19 15:52:33,673 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 26 from LS+wenet, 30 from Vox, 40 from AS
2024-08-19 15:52:46,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4540690.0, ans=0.125
2024-08-19 15:53:08,027 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 26 from Vox, 31 from AS
2024-08-19 15:53:09,871 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.74 vs. limit=22.5
2024-08-19 15:53:11,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4540790.0, ans=0.05
2024-08-19 15:53:13,527 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 9500, loss[loss=0.107, beats_loss=0.01011, ecapa_loss=0.0001715, whisper_loss=0.09515, over 22510.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01036, ecapa_loss=0.0001424, whisper_loss=0.09069, over 3887940.08 frames. ], batch size: 92, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 15:53:16,525 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 17 from Vox, 38 from AS
2024-08-19 15:53:24,508 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 25 from LS+wenet, 10 from Vox, 19 from AS
2024-08-19 15:53:40,162 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 from AS
2024-08-19 15:53:40,543 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.40 vs. limit=12.0
2024-08-19 15:53:48,580 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.89 vs. limit=22.5
2024-08-19 15:54:05,233 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 14 from LS+wenet, 15 from Vox, 35 from AS
2024-08-19 15:54:18,128 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 from AS
2024-08-19 15:54:31,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4541290.0, ans=0.0
2024-08-19 15:54:37,966 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.373e+01 2.637e+01 2.974e+01 3.781e+01, threshold=5.275e+01, percent-clipped=0.0
2024-08-19 15:54:51,063 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 9550, loss[loss=0.1156, beats_loss=0.009416, ecapa_loss=0.0001156, whisper_loss=0.105, over 20867.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01036, ecapa_loss=0.0001429, whisper_loss=0.09044, over 3891461.27 frames. ], batch size: 78, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:55:00,054 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 19 from Vox, 34 from AS
2024-08-19 15:55:12,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4541390.0, ans=0.125
2024-08-19 15:55:34,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4541590.0, ans=0.0
2024-08-19 15:55:36,117 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.20 vs. limit=15.0
2024-08-19 15:55:47,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4541590.0, ans=0.05
2024-08-19 15:56:19,810 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 15 from Vox, 26 from AS
2024-08-19 15:56:23,798 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.79 vs. limit=15.0
2024-08-19 15:56:24,923 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 from AS
2024-08-19 15:56:31,197 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 21 from Vox, 34 from AS
2024-08-19 15:56:31,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4541790.0, ans=0.125
2024-08-19 15:56:34,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4541790.0, ans=0.125
2024-08-19 15:56:38,070 INFO [train_multi_KD3.py:1116] (2/4) Epoch 31, batch 9600, loss[loss=0.1254, beats_loss=0.008919, ecapa_loss=0.00016, whisper_loss=0.1149, over 21127.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01036, ecapa_loss=0.0001421, whisper_loss=0.09028, over 3878886.80 frames. ], batch size: 87, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:56:41,409 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 24 from Vox, 47 from AS
2024-08-19 15:56:41,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4541890.0, ans=0.125
2024-08-19 15:56:48,010 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 21 from Vox, 24 from AS
2024-08-19 15:57:02,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4541990.0, ans=0.1
2024-08-19 15:57:14,500 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 24 from Vox, 27 from AS
2024-08-19 15:57:16,986 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 22 from Vox, 32 from AS